[jira] [Commented] (CASSANDRA-4316) Compaction Throttle too bursty with large rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645837#comment-13645837 ] Yuki Morishita commented on CASSANDRA-4316: --- +1 on 1.2 patch. You need license header for new classes though. Compaction Throttle too bursty with large rows -- Key: CASSANDRA-4316 URL: https://issues.apache.org/jira/browse/CASSANDRA-4316 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.8.0 Reporter: Wayne Lewis Assignee: Jonathan Ellis Fix For: 1.2.5 Attachments: 4316-1.2.txt, 4316-1.2.txt, 4316-1.2-v2.txt, 4316-v3.txt In org.apache.cassandra.db.compaction.CompactionIterable the check for compaction throttling occurs once every 1000 rows. In our workload this is much too large as we have many large rows (16 - 100 MB). With a 100 MB row, about 100 GB is read (and possibly written) before the compaction throttle sleeps. This causes bursts of essentially unthrottled compaction IO followed by a long sleep which yields inconsistence performance and high error rates during the bursts. We applied a workaround to check throttle every row which solved our performance and error issues: line 116 in org.apache.cassandra.db.compaction.CompactionIterable: if ((row++ % 1000) == 0) replaced with if ((row++ % 1) == 0) I think the better solution is to calculate how often throttle should be checked based on the throttle rate to apply sleeps more consistently. E.g. if 16MB/sec is the limit then check for sleep after every 16MB is read so sleeps are spaced out about every second. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4316) Compaction Throttle too bursty with large rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643558#comment-13643558 ] Jonathan Ellis commented on CASSANDRA-4316: --- Reminder from earlier: bq. we [should] open a new ticket to move FST to RateLimiter and get rid of Throttle entirely Compaction Throttle too bursty with large rows -- Key: CASSANDRA-4316 URL: https://issues.apache.org/jira/browse/CASSANDRA-4316 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.8.0 Reporter: Wayne Lewis Assignee: Jonathan Ellis Fix For: 1.2.5 Attachments: 4316-1.2.txt, 4316-1.2-v2.txt, 4316-v3.txt In org.apache.cassandra.db.compaction.CompactionIterable the check for compaction throttling occurs once every 1000 rows. In our workload this is much too large as we have many large rows (16 - 100 MB). With a 100 MB row, about 100 GB is read (and possibly written) before the compaction throttle sleeps. This causes bursts of essentially unthrottled compaction IO followed by a long sleep which yields inconsistence performance and high error rates during the bursts. We applied a workaround to check throttle every row which solved our performance and error issues: line 116 in org.apache.cassandra.db.compaction.CompactionIterable: if ((row++ % 1000) == 0) replaced with if ((row++ % 1) == 0) I think the better solution is to calculate how often throttle should be checked based on the throttle rate to apply sleeps more consistently. E.g. if 16MB/sec is the limit then check for sleep after every 16MB is read so sleeps are spaced out about every second. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4316) Compaction Throttle too bursty with large rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643528#comment-13643528 ] Jonathan Ellis commented on CASSANDRA-4316: --- I was hoping we could get rid of RandomAccessReader, but that's not realistic; we need to be able to re-use readers for multiple operations for performance (CASSANDRA-4942), which implies being able to seek, which basically leaves us with RAR. Also, if we throttle at the buffering level, throttling RAR is not really any more complex than a SequentialReader would be. Compaction Throttle too bursty with large rows -- Key: CASSANDRA-4316 URL: https://issues.apache.org/jira/browse/CASSANDRA-4316 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.8.0 Reporter: Wayne Lewis Assignee: Yuki Morishita Fix For: 2.0 Attachments: 4316-1.2.txt, 4316-1.2-v2.txt In org.apache.cassandra.db.compaction.CompactionIterable the check for compaction throttling occurs once every 1000 rows. In our workload this is much too large as we have many large rows (16 - 100 MB). With a 100 MB row, about 100 GB is read (and possibly written) before the compaction throttle sleeps. This causes bursts of essentially unthrottled compaction IO followed by a long sleep which yields inconsistence performance and high error rates during the bursts. We applied a workaround to check throttle every row which solved our performance and error issues: line 116 in org.apache.cassandra.db.compaction.CompactionIterable: if ((row++ % 1000) == 0) replaced with if ((row++ % 1) == 0) I think the better solution is to calculate how often throttle should be checked based on the throttle rate to apply sleeps more consistently. E.g. if 16MB/sec is the limit then check for sleep after every 16MB is read so sleeps are spaced out about every second. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4316) Compaction Throttle too bursty with large rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561320#comment-13561320 ] Jonathan Ellis commented on CASSANDRA-4316: --- Not thrilled about pushing this down into RAR. Feels like we're violating encapsulation too much. What if we did this? # Make compaction truly single-pass (there's a ticket for this somewhere) # Introduce a SequentialReader to replace RAR # Subclass SR to CompactionReader and add throttling there (I suppose we could subclass RAR already for CompactionReader, but since I want to do #1 and #2 anyway, it would be less work to do it in this order... :) Compaction Throttle too bursty with large rows -- Key: CASSANDRA-4316 URL: https://issues.apache.org/jira/browse/CASSANDRA-4316 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.8.0 Reporter: Wayne Lewis Assignee: Yuki Morishita Fix For: 1.2.1 Attachments: 4316-1.2.txt, 4316-1.2-v2.txt In org.apache.cassandra.db.compaction.CompactionIterable the check for compaction throttling occurs once every 1000 rows. In our workload this is much too large as we have many large rows (16 - 100 MB). With a 100 MB row, about 100 GB is read (and possibly written) before the compaction throttle sleeps. This causes bursts of essentially unthrottled compaction IO followed by a long sleep which yields inconsistence performance and high error rates during the bursts. We applied a workaround to check throttle every row which solved our performance and error issues: line 116 in org.apache.cassandra.db.compaction.CompactionIterable: if ((row++ % 1000) == 0) replaced with if ((row++ % 1) == 0) I think the better solution is to calculate how often throttle should be checked based on the throttle rate to apply sleeps more consistently. E.g. if 16MB/sec is the limit then check for sleep after every 16MB is read so sleeps are spaced out about every second. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4316) Compaction Throttle too bursty with large rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13559979#comment-13559979 ] Yuki Morishita commented on CASSANDRA-4316: --- If we are compaction 32 SSTables at once, we are currently throttling every 32 rows read. If we are using LazilyCompactedRow for wide rows, the row is read twice to write out compacted row (for now). And I don't know if that N = 1MB / average row size above fits well for various row sizes. 1MB sounds like the same as 1000 in current % 1000 approach. So instead pushing down throttling at SSTableScanner level, we can throttle more accurately. Maybe we can go further and use RateLimiter in RandomAccessReader for more accuracy. Compaction Throttle too bursty with large rows -- Key: CASSANDRA-4316 URL: https://issues.apache.org/jira/browse/CASSANDRA-4316 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.8.0 Reporter: Wayne Lewis Assignee: Yuki Morishita Fix For: 1.2.1 Attachments: 4316-1.2.txt, 4316-1.2-v2.txt In org.apache.cassandra.db.compaction.CompactionIterable the check for compaction throttling occurs once every 1000 rows. In our workload this is much too large as we have many large rows (16 - 100 MB). With a 100 MB row, about 100 GB is read (and possibly written) before the compaction throttle sleeps. This causes bursts of essentially unthrottled compaction IO followed by a long sleep which yields inconsistence performance and high error rates during the bursts. We applied a workaround to check throttle every row which solved our performance and error issues: line 116 in org.apache.cassandra.db.compaction.CompactionIterable: if ((row++ % 1000) == 0) replaced with if ((row++ % 1) == 0) I think the better solution is to calculate how often throttle should be checked based on the throttle rate to apply sleeps more consistently. E.g. if 16MB/sec is the limit then check for sleep after every 16MB is read so sleeps are spaced out about every second. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4316) Compaction Throttle too bursty with large rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13554841#comment-13554841 ] Jonathan Ellis commented on CASSANDRA-4316: --- Hmm... taking a step back for a minute. Since the minimum unit of throttling is still a row, do we win anything with the extra complexity of pushing it down into SSTableScanner? Why not just leave it in CompactionIterable loop, w/ adaptive checking based on row size as above? Compaction Throttle too bursty with large rows -- Key: CASSANDRA-4316 URL: https://issues.apache.org/jira/browse/CASSANDRA-4316 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.8.0 Reporter: Wayne Lewis Assignee: Yuki Morishita Fix For: 1.2.1 Attachments: 4316-1.2.txt, 4316-1.2-v2.txt In org.apache.cassandra.db.compaction.CompactionIterable the check for compaction throttling occurs once every 1000 rows. In our workload this is much too large as we have many large rows (16 - 100 MB). With a 100 MB row, about 100 GB is read (and possibly written) before the compaction throttle sleeps. This causes bursts of essentially unthrottled compaction IO followed by a long sleep which yields inconsistence performance and high error rates during the bursts. We applied a workaround to check throttle every row which solved our performance and error issues: line 116 in org.apache.cassandra.db.compaction.CompactionIterable: if ((row++ % 1000) == 0) replaced with if ((row++ % 1) == 0) I think the better solution is to calculate how often throttle should be checked based on the throttle rate to apply sleeps more consistently. E.g. if 16MB/sec is the limit then check for sleep after every 16MB is read so sleeps are spaced out about every second. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4316) Compaction Throttle too bursty with large rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549228#comment-13549228 ] Jonathan Ellis commented on CASSANDRA-4316: --- Looks like mayThrottle (both implementations) is missing conversion from bytes - MB. (Actually looking at the RateLimiter creation, it looks like the name {{dataSizeInMB}} is misleading since it is actually still bytes.) Does StandaloneScrubber ratelimit? It probably shouldn't. Scanner in cleanup compaction needs a withRateLimit. Is looping over scanner.getCurrentPosition for each row compacted going to eat CPU? Maybe every N rows would be better, with N = 1MB / average row size. Quite possibly it's not actually a problem and I'm prematurely complexifying. Nits: - // throttle if needed comment is redundant - maybeThrottle is a better method name than mayThrottle - May be able to simplify getCompactionRateLimiter by creating a default limiter even if we are unthrottled (since if we are unthrottled we ignore it anyway), so you don't need to worry about == null checks Rest LGTM. Should we open a new ticket to move FST to RateLimiter and get rid of Throttle entirely? Compaction Throttle too bursty with large rows -- Key: CASSANDRA-4316 URL: https://issues.apache.org/jira/browse/CASSANDRA-4316 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.8.0 Reporter: Wayne Lewis Assignee: Yuki Morishita Fix For: 1.2.1 Attachments: 4316-1.2.txt In org.apache.cassandra.db.compaction.CompactionIterable the check for compaction throttling occurs once every 1000 rows. In our workload this is much too large as we have many large rows (16 - 100 MB). With a 100 MB row, about 100 GB is read (and possibly written) before the compaction throttle sleeps. This causes bursts of essentially unthrottled compaction IO followed by a long sleep which yields inconsistence performance and high error rates during the bursts. We applied a workaround to check throttle every row which solved our performance and error issues: line 116 in org.apache.cassandra.db.compaction.CompactionIterable: if ((row++ % 1000) == 0) replaced with if ((row++ % 1) == 0) I think the better solution is to calculate how often throttle should be checked based on the throttle rate to apply sleeps more consistently. E.g. if 16MB/sec is the limit then check for sleep after every 16MB is read so sleeps are spaced out about every second. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4316) Compaction Throttle too bursty with large rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13537464#comment-13537464 ] Alex Tang commented on CASSANDRA-4316: -- This bug is marked as fixed in 1.2.1, however I don't see any fixes being applied in the current trunk (as of 2012-12-20) for this bug. Also, the code that the OP put into the issue (CompactionIterable:116) doesn't seem to have been implemented either. Can someone explain how this was fixed? Thanks. Compaction Throttle too bursty with large rows -- Key: CASSANDRA-4316 URL: https://issues.apache.org/jira/browse/CASSANDRA-4316 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.8.0 Reporter: Wayne Lewis Assignee: Yuki Morishita Fix For: 1.2.1 In org.apache.cassandra.db.compaction.CompactionIterable the check for compaction throttling occurs once every 1000 rows. In our workload this is much too large as we have many large rows (16 - 100 MB). With a 100 MB row, about 100 GB is read (and possibly written) before the compaction throttle sleeps. This causes bursts of essentially unthrottled compaction IO followed by a long sleep which yields inconsistence performance and high error rates during the bursts. We applied a workaround to check throttle every row which solved our performance and error issues: line 116 in org.apache.cassandra.db.compaction.CompactionIterable: if ((row++ % 1000) == 0) replaced with if ((row++ % 1) == 0) I think the better solution is to calculate how often throttle should be checked based on the throttle rate to apply sleeps more consistently. E.g. if 16MB/sec is the limit then check for sleep after every 16MB is read so sleeps are spaced out about every second. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4316) Compaction Throttle too bursty with large rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13537538#comment-13537538 ] Alex Tang commented on CASSANDRA-4316: -- facepalm. Sorry about that. I misread it. Thanks. BTW, do you have an idea on what the method of fix is? Is it going to be the OP's suggestion to change how often the compaction happens or to perform a more complex calculation to make sleeps more consistent? Compaction Throttle too bursty with large rows -- Key: CASSANDRA-4316 URL: https://issues.apache.org/jira/browse/CASSANDRA-4316 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.8.0 Reporter: Wayne Lewis Assignee: Yuki Morishita Fix For: 1.2.1 In org.apache.cassandra.db.compaction.CompactionIterable the check for compaction throttling occurs once every 1000 rows. In our workload this is much too large as we have many large rows (16 - 100 MB). With a 100 MB row, about 100 GB is read (and possibly written) before the compaction throttle sleeps. This causes bursts of essentially unthrottled compaction IO followed by a long sleep which yields inconsistence performance and high error rates during the bursts. We applied a workaround to check throttle every row which solved our performance and error issues: line 116 in org.apache.cassandra.db.compaction.CompactionIterable: if ((row++ % 1000) == 0) replaced with if ((row++ % 1) == 0) I think the better solution is to calculate how often throttle should be checked based on the throttle rate to apply sleeps more consistently. E.g. if 16MB/sec is the limit then check for sleep after every 16MB is read so sleeps are spaced out about every second. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4316) Compaction Throttle too bursty with large rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13537558#comment-13537558 ] Yuki Morishita commented on CASSANDRA-4316: --- I'm thinking of doing throttling in other place(possibly LazilyCompactedRow) especially for wide rows. Compaction Throttle too bursty with large rows -- Key: CASSANDRA-4316 URL: https://issues.apache.org/jira/browse/CASSANDRA-4316 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.8.0 Reporter: Wayne Lewis Assignee: Yuki Morishita Fix For: 1.2.1 In org.apache.cassandra.db.compaction.CompactionIterable the check for compaction throttling occurs once every 1000 rows. In our workload this is much too large as we have many large rows (16 - 100 MB). With a 100 MB row, about 100 GB is read (and possibly written) before the compaction throttle sleeps. This causes bursts of essentially unthrottled compaction IO followed by a long sleep which yields inconsistence performance and high error rates during the bursts. We applied a workaround to check throttle every row which solved our performance and error issues: line 116 in org.apache.cassandra.db.compaction.CompactionIterable: if ((row++ % 1000) == 0) replaced with if ((row++ % 1) == 0) I think the better solution is to calculate how often throttle should be checked based on the throttle rate to apply sleeps more consistently. E.g. if 16MB/sec is the limit then check for sleep after every 16MB is read so sleeps are spaced out about every second. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira