[jira] [Issue Comment Edited] (CASSANDRA-2191) Multithread across compaction buckets

Stu Hood (JIRA) Thu, 31 Mar 2011 00:18:47 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013837#comment-13013837 ]


Stu Hood edited comment on CASSANDRA-2191 at 3/31/11 7:17 AM:
--------------------------------------------------------------

1. Added {{if (max < min || max < 1) return null;}} check
2. Switched to LinkedHashSet to preserve order
3. Fixed, thanks
4. The idea is that the throttling from CASSANDRA-2156 is a sufficient 
preventative measure for this, but it will be important to enable it by 
default, hopefully using the metrics from CASSANDRA-2171. I'd prefer not to 
have two tunables, but it's worth discussing.
5. CompactionManagerMBean already exposes getPendingTasks and getCompletedTasks 
with entirely different meanings, so I'm not sure we'd gain anything by this.
6. (see next comment)
7. Yes, it probably would, but I think that is an issue for a separate ticket. 
Compacting the smallest bucket first, which we already do, is usually the 
biggest win because those files are likely to be hot in cache. It will be very 
rare for higher buckets to be more important.
8. This would probably be a good idea, for example, if you have more than 
{{max}} sstables in the minimum bucket and not enough active threads to 
parallelize the bucket. Opened CASSANDRA-2407
9. I can't think of an easy way to do this, but if you can, I'm willing.
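
For readers following items 1 and 2, here is a minimal Java sketch of how a threshold guard and an order-preserving set could fit together. It is not the code from the attached patches: the class name BucketSelectionSketch, the method selectCandidates, and the rule of returning null when fewer than {{min}} items are available are all assumptions made for illustration. Only the guard expression and the use of LinkedHashSet come from the comment above. Presumably the guard moved from {{max < 0}} to {{max < 1}} because a maximum of zero would select nothing.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class BucketSelectionSketch {
    /**
     * Hypothetical helper: picks at most max items from a bucket, keeping the
     * bucket's iteration order. Returns null when the thresholds cannot be met.
     */
    static <T> Set<T> selectCandidates(List<T> bucket, int min, int max) {
        // Guard quoted in item 1: reject inconsistent or empty limits.
        if (max < min || max < 1) return null;

        // LinkedHashSet (item 2) iterates in insertion order, unlike HashSet.
        Set<T> chosen = new LinkedHashSet<>();
        for (T item : bucket) {
            if (chosen.size() >= max) break;
            chosen.add(item);
        }

        // Assumption for this sketch: too few candidates means nothing to do.
        return chosen.size() < min ? null : chosen;
    }
}
```

With a plain HashSet the iteration order of the selected items would be unspecified, which is why the order-preserving set matters when the input is already sorted.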

      was (Author: stuhood):
1. Added {{if (max < min || max < 0) return null;}} check
2. Switched to LinkedHashSet to preserve order
3. Fixed, thanks
4. The idea is that the throttling from CASSANDRA-2156 is a sufficient 
preventative measure for this, but it will be important to enable it by 
default, hopefully using the metrics from CASSANDRA-2171. I'd prefer not to 
have two tunables, but it's worth discussing.
5. CompactionManagerMBean exposes getPendingTasks and getCompletedTasks with 
entirely different meanings, so I'm not sure we'd gain anything by this
6. (see next comment)
7. Yes, it probably would, but I think that is an issue for a separate ticket. 
Usually compacting the smallest bucket first (since the files are likely to be 
hot in cache) is the biggest win (which we do): it will be very rare for higher 
buckets to be more important
8. This would probably be a good idea, for example, if you have more than 
{{max}} sstables in the minimum bucket and not enough active threads to 
parallelize the bucket. Opened CASSANDRA-2407
9. I can't think of an easy way to do this, but if you can, I'm willing
  
> Multithread across compaction buckets
> -------------------------------------
>
>                 Key: CASSANDRA-2191
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2191
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Priority: Critical
>              Labels: compaction
>             Fix For: 0.8
>
>         Attachments: 0001-Add-a-compacting-set-to-sstabletracker.txt, 
> 0002-Use-the-compacting-set-of-sstables-to-schedule-multith.txt, 
> 0003-Expose-multiple-compactions-via-JMX-and-deprecate-sing.txt
>
>
> This ticket overlaps with CASSANDRA-1876 to a degree, but the approaches and 
> reasoning are different enough to open a separate issue.
> The problem is that a compaction currently operates only on the set of 
> sstables that existed at the moment it started. For longer-running 
> compactions (even when running as fast as the hardware allows), a very large 
> number of new sstables might be created in the meantime. We have observed 
> this proliferation of sstables killing performance during 
> major/high-bucketed compactions.
> One approach would be to pause compactions in upper buckets (containing 
> larger files) when compactions in lower buckets become possible. While this 
> would likely solve the problem with read performance, it does not actually 
> help us perform compaction any faster, which is a reasonable requirement for 
> other situations.
> Instead, we need to be able to perform any compactions that are currently 
> required in parallel, independent of what bucket they might be in.
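
The attachment names suggest that a "compacting set" of sstables is used to schedule compactions on several threads. The Java sketch below illustrates that general idea only. Every name in it (ParallelBucketSketch, markCompacting, unmarkCompacting, compactAll) is hypothetical, sstables are represented by plain strings, and the merge step is left as a comment, so it should not be read as the implementation in the patches.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelBucketSketch {
    // Hypothetical tracker: names of sstables currently claimed by a compaction.
    private final Set<String> compacting = new HashSet<>();

    /** Claims all candidates atomically, or none if any is already claimed. */
    synchronized boolean markCompacting(Set<String> candidates) {
        for (String name : candidates) {
            if (compacting.contains(name)) return false;
        }
        compacting.addAll(candidates);
        return true;
    }

    /** Releases a previous claim so later compactions can use the sstables. */
    synchronized void unmarkCompacting(Set<String> candidates) {
        compacting.removeAll(candidates);
    }

    /** Submits one task per bucket and returns how many buckets were compacted. */
    int compactAll(List<Set<String>> buckets, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (Set<String> bucket : buckets) {
                results.add(pool.submit(() -> {
                    if (!markCompacting(bucket)) return false; // overlap: skip
                    try {
                        // The actual merge of the bucket would happen here.
                        return true;
                    } finally {
                        unmarkCompacting(bucket);
                    }
                }));
            }
            int done = 0;
            for (Future<Boolean> result : results) {
                if (result.get()) done++;
            }
            return done;
        } catch (Exception e) {
            throw new IllegalStateException("compaction task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each task claims its inputs before doing any work, buckets that share no sstables can proceed independently, regardless of which bucket they belong to.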

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
