[ https://issues.apache.org/jira/browse/CASSANDRA-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan King updated CASSANDRA-1083:
---------------------------------

    Description: 
We've discovered that we can't tune compaction the way we want for our 
production cluster. I think the current algorithm does worse here than it 
could: it doesn't sort the sstables by size before bucketing them, so the 
tuning parameters have unpredictable results.

I looked at CASSANDRA-792, but it seems like overkill. Here's an alternative 
proposal:

config options:
 minimumCompactionThreshold
 maximumCompactionThreshold
 targetSSTableCount

The first two keep their current meaning: the lower and upper bounds on how 
many sstables to compact in one compaction operation. The third is a target 
for how many SSTables you'd like to have.

Pseudocode for deciding whether to do a minor compaction:

{noformat} 
if sstables.length + minimumCompactionThreshold - 1 > targetSSTableCount
  sort sstables from smallest to largest
  compact up to maximumCompactionThreshold of the smallest sstables
{noformat} 
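A minimal Java sketch of the proposed check, assuming each sstable is modeled only by its on-disk size in bytes (a `List<Long>`) rather than Cassandra's real sstable classes. The class and method names (`MinorCompactionSketch`, `selectForMinorCompaction`) are hypothetical, not Cassandra's API. The guard that skips compaction when fewer than minimumCompactionThreshold sstables exist is also an assumption, added to honor the lower bound described above; the pseudocode does not spell it out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MinorCompactionSketch {
    /**
     * Hypothetical sketch of the proposed minor-compaction check.
     * SSTables are modeled only by their sizes in bytes (an assumption).
     * Returns the sizes of the sstables to compact, smallest first,
     * or an empty list when no minor compaction should run.
     */
    public static List<Long> selectForMinorCompaction(List<Long> sstableSizes,
                                                      int minimumCompactionThreshold,
                                                      int maximumCompactionThreshold,
                                                      int targetSSTableCount) {
        // Trigger condition taken from the proposal's pseudocode.
        if (sstableSizes.size() + minimumCompactionThreshold - 1 <= targetSSTableCount) {
            return Collections.emptyList();
        }
        // Sort a copy from smallest to largest so the caller's list is untouched.
        List<Long> sorted = new ArrayList<>(sstableSizes);
        Collections.sort(sorted);
        // Assumption: never compact fewer than minimumCompactionThreshold sstables.
        if (sorted.size() < minimumCompactionThreshold) {
            return Collections.emptyList();
        }
        // Take up to maximumCompactionThreshold of the smallest sstables.
        int count = Math.min(maximumCompactionThreshold, sorted.size());
        return new ArrayList<>(sorted.subList(0, count));
    }

    public static void main(String[] args) {
        List<Long> sizes = List.of(500L, 20L, 300L, 10L, 40L);
        // 5 + 2 - 1 = 6 > 4, so compact up to 3 of the smallest sstables.
        System.out.println(selectForMinorCompaction(sizes, 2, 3, 4)); // [10, 20, 40]
    }
}
```

Sorting first is what makes the thresholds predictable: the same set of sstable sizes always yields the same selection, regardless of the order in which the sstables happen to be listed.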


  was:
We've discovered that we can't tune compaction the way we want for our 
production cluster. I think the current algorithm does worse here than it 
could: it doesn't sort the sstables by size before bucketing them, so the 
tuning parameters have unpredictable results.

I looked at CASSANDRA-792, but it seems like overkill. Here's an alternative 
proposal:

config options:
 minimumCompactionThreshold
 maximumCompactionThreshold
 targetSSTableCount

The first two keep their current meaning: the lower and upper bounds on how 
many sstables to compact in one compaction operation. The third is a target 
for how many SSTables you'd like to have.

Pseudocode for deciding whether to do a minor compaction:

{noformat} 
if sstables.length > targetSSTableCount
  sort sstables from smallest to largest
  if sstables.length + minimumCompactionThreshold - 1 > targetSSTableCount
    compact up to maximumCompactionThreshold of the smallest sstables
{noformat} 
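Comparing the two versions' trigger conditions shows what the revision changes. Assuming minimumCompactionThreshold is at least 1, the earlier version's inner check is implied by its outer one (if length > target, then length + min - 1 > target), so it effectively fires only when the sstable count exceeds targetSSTableCount. The revised check fires once the count exceeds targetSSTableCount - minimumCompactionThreshold + 1. The class and method names below are hypothetical, used only for illustration.

```java
public class TriggerComparison {
    // Earlier pseudocode: outer and inner checks, as written.
    public static boolean oldTrigger(int sstableCount, int minimumCompactionThreshold,
                                     int targetSSTableCount) {
        return sstableCount > targetSSTableCount
                && sstableCount + minimumCompactionThreshold - 1 > targetSSTableCount;
    }

    // Revised pseudocode: a single check.
    public static boolean newTrigger(int sstableCount, int minimumCompactionThreshold,
                                     int targetSSTableCount) {
        return sstableCount + minimumCompactionThreshold - 1 > targetSSTableCount;
    }

    public static void main(String[] args) {
        int min = 4, target = 10;
        // Print which sstable counts trigger a minor compaction under each version.
        for (int count = 5; count <= 12; count++) {
            System.out.printf("count=%d old=%b new=%b%n", count,
                    oldTrigger(count, min, target), newTrigger(count, min, target));
        }
    }
}
```

With minimumCompactionThreshold = 4 and targetSSTableCount = 10, the earlier check fires only at 11 or more sstables, while the revised check fires at 8 or more.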



> Improvement to CompactionManager's submitMinorIfNeeded
> ------------------------------------------------------
>
>                 Key: CASSANDRA-1083
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1083
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Ryan King
>            Assignee: Ryan King
>            Priority: Minor

-- 
This message is automatically generated by JIRA.