[jira] [Commented] (CASSANDRA-4316) Compaction Throttle too bursty with large rows

2013-04-30 Thread Yuki Morishita (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645837#comment-13645837 ]

Yuki Morishita commented on CASSANDRA-4316:
---

+1 on the 1.2 patch. You need a license header for the new classes, though.

 Compaction Throttle too bursty with large rows
 --

 Key: CASSANDRA-4316
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4316
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.0
Reporter: Wayne Lewis
Assignee: Jonathan Ellis
 Fix For: 1.2.5

 Attachments: 4316-1.2.txt, 4316-1.2.txt, 4316-1.2-v2.txt, 4316-v3.txt


 In org.apache.cassandra.db.compaction.CompactionIterable the check for 
 compaction throttling occurs once every 1000 rows. In our workload this 
 interval is much too coarse, as we have many large rows (16 - 100 MB).
 With 100 MB rows, about 100 GB is read (and possibly written) before the 
 compaction throttle sleeps. This causes bursts of essentially unthrottled 
 compaction IO followed by a long sleep, which yields inconsistent performance 
 and high error rates during the bursts.
 We applied a workaround to check the throttle on every row, which solved our 
 performance and error issues. At line 116 in 
 org.apache.cassandra.db.compaction.CompactionIterable,
 if ((row++ % 1000) == 0)
 was replaced with
 if ((row++ % 1) == 0)
 I think the better solution is to calculate how often the throttle should be 
 checked based on the throttle rate, so that sleeps are applied more 
 consistently. E.g. if 16 MB/sec is the limit, check for sleep after every 
 16 MB read so sleeps are spaced out roughly once per second.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4316) Compaction Throttle too bursty with large rows

2013-04-27 Thread Jonathan Ellis (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643558#comment-13643558 ]

Jonathan Ellis commented on CASSANDRA-4316:
---

Reminder from earlier:

bq. we [should] open a new ticket to move FST to RateLimiter and get rid of 
Throttle entirely



[jira] [Commented] (CASSANDRA-4316) Compaction Throttle too bursty with large rows

2013-04-26 Thread Jonathan Ellis (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643528#comment-13643528 ]

Jonathan Ellis commented on CASSANDRA-4316:
---

I was hoping we could get rid of RandomAccessReader, but that's not realistic; 
we need to be able to re-use readers for multiple operations for performance 
(CASSANDRA-4942), which implies being able to seek, which basically leaves us 
with RAR.

Also, if we throttle at the buffering level, throttling RAR is not really any 
more complex than a SequentialReader would be.
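For illustration only, a rough sketch of what buffering-level throttling could look 
like with Guava's RateLimiter; ThrottledReader and reBuffer() below are hypothetical 
stand-ins, not the actual RandomAccessReader API:

{code:java}
import com.google.common.util.concurrent.RateLimiter;

import java.io.IOException;
import java.io.RandomAccessFile;

// Every refill of the internal buffer pays for the bytes it pulls from disk,
// so throttling is independent of row boundaries and works the same whether
// the caller reads sequentially or seeks first.
class ThrottledReader
{
    private final RandomAccessFile file;
    private final byte[] buffer = new byte[64 * 1024];
    private final RateLimiter limiter;           // permits are bytes per second

    ThrottledReader(RandomAccessFile file, double bytesPerSecond)
    {
        this.file = file;
        this.limiter = RateLimiter.create(bytesPerSecond);
    }

    // Analogue of a reBuffer() step: charge the rate limiter for the bytes read.
    int reBuffer() throws IOException
    {
        int read = file.read(buffer);
        if (read > 0)
            limiter.acquire(read);
        return read;
    }
}
{code}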



[jira] [Commented] (CASSANDRA-4316) Compaction Throttle too bursty with large rows

2013-01-23 Thread Jonathan Ellis (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561320#comment-13561320 ]

Jonathan Ellis commented on CASSANDRA-4316:
---

Not thrilled about pushing this down into RAR.  Feels like we're violating 
encapsulation too much.

What if we did this?
# Make compaction truly single-pass (there's a ticket for this somewhere)
# Introduce a SequentialReader to replace RAR
# Subclass SR to CompactionReader and add throttling there

(I suppose we could subclass RAR already for CompactionReader, but since I want 
to do #1 and #2 anyway, it would be less work to do it in this order... :)



[jira] [Commented] (CASSANDRA-4316) Compaction Throttle too bursty with large rows

2013-01-22 Thread Yuki Morishita (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559979#comment-13559979 ]

Yuki Morishita commented on CASSANDRA-4316:
---

If we are compacting 32 SSTables at once, we are currently throttling every 32 
rows read. If we are using LazilyCompactedRow for wide rows, the row is read 
twice to write out the compacted row (for now). And I don't know if the N = 1MB / 
average row size above fits well for various row sizes; 1MB sounds about the 
same as the 1000 in the current % 1000 approach.
So by instead pushing throttling down to the SSTableScanner level, we can 
throttle more accurately.
Maybe we can go further and use RateLimiter in RandomAccessReader for even more 
accuracy.
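As a rough sketch of the scanner-level idea (the limiter wiring and position 
tracking below are assumptions, not the actual SSTableScanner code), each row is 
charged for the bytes it actually occupied on disk:

{code:java}
import com.google.common.util.concurrent.RateLimiter;

// A 100 MB row pays for 100 MB and a 1 KB row pays for 1 KB, so the sleep pattern
// stays smooth regardless of row size.
class ThrottledScanner
{
    private final RateLimiter limiter;   // bytes per second
    private long lastPosition = 0;

    ThrottledScanner(double bytesPerSecond)
    {
        this.limiter = RateLimiter.create(bytesPerSecond);
    }

    // Called after each row; currentPosition would come from something like
    // scanner.getCurrentPosition().
    void throttleRow(long currentPosition)
    {
        long bytesRead = currentPosition - lastPosition;
        if (bytesRead > 0)
            limiter.acquire((int) bytesRead);   // int cast is fine for row-sized reads
        lastPosition = currentPosition;
    }
}
{code}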



[jira] [Commented] (CASSANDRA-4316) Compaction Throttle too bursty with large rows

2013-01-16 Thread Jonathan Ellis (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554841#comment-13554841 ]

Jonathan Ellis commented on CASSANDRA-4316:
---

Hmm... taking a step back for a minute.  Since the minimum unit of throttling 
is still a row, do we win anything with the extra complexity of pushing it down 
into SSTableScanner?  Why not just leave it in the CompactionIterable loop, with 
adaptive checking based on row size as above?



[jira] [Commented] (CASSANDRA-4316) Compaction Throttle too bursty with large rows

2013-01-09 Thread Jonathan Ellis (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549228#comment-13549228 ]

Jonathan Ellis commented on CASSANDRA-4316:
---

Looks like mayThrottle (both implementations) is missing the conversion from 
bytes -> MB.  (Actually, looking at the RateLimiter creation, it looks like the 
name {{dataSizeInMB}} is misleading since it is actually still bytes.)

Does StandaloneScrubber ratelimit?  It probably shouldn't.

Scanner in cleanup compaction needs a withRateLimit.

Is looping over scanner.getCurrentPosition for each row compacted going to eat 
CPU?  Maybe every N rows would be better, with N = 1MB / average row size.  
Quite possibly it's not actually a problem and I'm prematurely complexifying.

Nits:
- // throttle if needed comment is redundant
- maybeThrottle is a better method name than mayThrottle
- May be able to simplify getCompactionRateLimiter by creating a default 
limiter even if we are unthrottled (since if we are unthrottled we ignore it 
anyway), so you don't need to worry about == null checks

Rest LGTM.  Should we open a new ticket to move FST to RateLimiter and get rid 
of Throttle entirely?
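To make the unit point concrete, a hedged sketch of the limiter creation (the method 
name and the 0-means-unthrottled convention below are illustrative assumptions, not 
the patch itself): the configured rate is in MB/sec, but the limiter must hand out 
permits in the same unit callers acquire in, i.e. bytes.

{code:java}
import com.google.common.util.concurrent.RateLimiter;

final class CompactionRateLimiters
{
    static RateLimiter create(int throughputMbPerSec)
    {
        // Convert MB/sec from the config into bytes/sec for the limiter, so callers
        // can simply acquire(bytesRead) with raw byte counts and never mix units.
        double bytesPerSec = throughputMbPerSec <= 0
                           ? Double.MAX_VALUE                     // effectively unthrottled
                           : throughputMbPerSec * 1024.0 * 1024.0;

        // Always return a limiter, even when unthrottled, so callers never need
        // == null checks (per the nit above).
        return RateLimiter.create(bytesPerSec);
    }
}
{code}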




[jira] [Commented] (CASSANDRA-4316) Compaction Throttle too bursty with large rows

2012-12-20 Thread Alex Tang (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537464#comment-13537464 ]

Alex Tang commented on CASSANDRA-4316:
--

This bug is marked as fixed in 1.2.1; however, I don't see any fix applied in 
the current trunk (as of 2012-12-20) for this bug.  Also, the change that the OP 
put into the issue (CompactionIterable:116) doesn't seem to have been 
implemented either.  Can someone explain how this was fixed?

Thanks.



[jira] [Commented] (CASSANDRA-4316) Compaction Throttle too bursty with large rows

2012-12-20 Thread Alex Tang (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537538#comment-13537538 ]

Alex Tang commented on CASSANDRA-4316:
--

facepalm.  Sorry about that. I misread it. Thanks.

BTW, do you have an idea of what the fix will be?  Is it going to be the OP's 
suggestion to change how often the throttle check happens, or a more complex 
calculation to make the sleeps more consistent?



[jira] [Commented] (CASSANDRA-4316) Compaction Throttle too bursty with large rows

2012-12-20 Thread Yuki Morishita (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537558#comment-13537558 ]

Yuki Morishita commented on CASSANDRA-4316:
---

I'm thinking of doing the throttling in another place (possibly 
LazilyCompactedRow), especially for wide rows.
