[jira] [Commented] (CASSANDRA-2191) Multithread across compaction buckets

Stu Hood (JIRA) Wed, 30 Mar 2011 22:41:50 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013838#comment-13013838
 ]


Stu Hood commented on CASSANDRA-2191:
-------------------------------------

Regarding 6.: there is a behavioural issue here; thank you for pointing it 
out. As implemented in this patch, triggering a major compaction starts it 
immediately with whatever sstables aren't already being compacted. This means 
that triggering a major compaction while another compaction is running will 
not result in a true major compaction, since the sstables held by the running 
compaction are left out. In the past, this wouldn't have been kosher, because 
major compactions were required to clean up tombstones.

But now that we have the bloomfilter checking optimization for compaction, the 
new behaviour is probably sufficient: when there ''aren't'' any other 
compactions active, the "major" flag is an optimization that makes the 
bloomfilter checks unnecessary; when there ''are'' other compactions active, 
the bloomfilter check works as usual.
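
To make the two cases concrete, here is a minimal sketch of the behaviour described above. It is not code from the attached patches: the class, the method names, the use of plain strings for sstables, and the Predicate standing in for the bloom filter check are all hypothetical and only illustrate the logic.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration only; none of these names come from the patch.
final class MajorCompactionSketch {

    /** The sstables chosen for a triggered compaction, and whether it covers everything. */
    static final class Plan {
        final Set<String> candidates;
        final boolean major;

        Plan(Set<String> candidates, boolean major) {
            this.candidates = candidates;
            this.major = major;
        }
    }

    /**
     * Start immediately with whatever sstables are not already compacting.
     * The run only counts as "major" if nothing had to be left out.
     */
    static Plan planMajorCompaction(Set<String> allSSTables, Set<String> compacting) {
        Set<String> candidates = new HashSet<>(allSSTables);
        candidates.removeAll(compacting);
        boolean major = !candidates.isEmpty() && candidates.containsAll(allSSTables);
        return new Plan(candidates, major);
    }

    /**
     * A tombstone for a key can be dropped if the compaction is major (no
     * sstable is left out, so no check is needed), or if the bloom filter
     * stand-in reports that the key cannot exist in any sstable outside
     * this compaction.
     */
    static boolean canPurgeTombstone(Plan plan, String key,
                                     Predicate<String> mayExistOutsideCompaction) {
        return plan.major || !mayExistOutsideCompaction.test(key);
    }
}
```

With no other compaction active, planMajorCompaction selects every sstable, the flag is set, and the predicate is never consulted. With another compaction holding some sstables, the flag is not set and each tombstone falls back to the per-key check.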

Thoughts?

> Multithread across compaction buckets
> -------------------------------------
>
>                 Key: CASSANDRA-2191
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2191
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Priority: Critical
>              Labels: compaction
>             Fix For: 0.8
>
>         Attachments: 0001-Add-a-compacting-set-to-sstabletracker.txt, 
> 0002-Use-the-compacting-set-of-sstables-to-schedule-multith.txt, 
> 0003-Expose-multiple-compactions-via-JMX-and-deprecate-sing.txt
>
>
> This ticket overlaps with CASSANDRA-1876 to a degree, but the approaches and 
> reasoning are different enough to open a separate issue.
> The problem with compactions currently is that they compact the set of 
> sstables that existed the moment the compaction started. This means that for 
> longer running compactions (even when running as fast as possible on the 
> hardware), a very large number of new sstables might be created in the 
> meantime. We have observed this proliferation of sstables killing performance 
> during major/high-bucketed compactions.
> One approach would be to pause compactions in upper buckets (containing 
> larger files) when compactions in lower buckets become possible. While this 
> would likely solve the problem with read performance, it does not actually 
> help us perform compaction any faster, which is a reasonable requirement for 
> other situations.
> Instead, we need to be able to perform any compactions that are currently 
> required in parallel, independent of what bucket they might be in.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
