[jira] [Created] (CASSANDRA-7203) Flush (and Compact) High Traffic Partitions Separately

Benedict (JIRA) Thu, 15 May 2014 01:03:07 -0700

Benedict created CASSANDRA-7203:
-----------------------------------

             Summary: Flush (and Compact) High Traffic Partitions Separately
                 Key: CASSANDRA-7203
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7203
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Benedict



An idea possibly worth exploring is the use of streaming count-min sketches to 
collect data over the uptime of a server in order to estimate the velocity of 
different partitions, so that high-volume partitions can be flushed separately, 
on the assumption that they will be much smaller in number. This would reduce 
write amplification by permitting them to be compacted independently of any 
low-velocity data.
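
As a rough illustration of the idea above, here is a minimal count-min sketch in Python that counts writes per partition key and then splits the keys of a memtable into "hot" and "cold" groups at flush time. Everything in it is an assumption made for illustration: the names CountMinSketch and split_for_flush, the width and depth, the BLAKE2b-based hashing, and the fixed threshold are hypothetical and are not taken from Cassandra. It also tracks raw write counts only; a real velocity estimate would need to divide by elapsed time or decay old counts.

```python
import hashlib


class CountMinSketch:
    """Fixed-size approximate counter; estimates never fall below the true count."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # One independent hash per row, derived by salting BLAKE2b with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row, col in self._cells(key):
            self.rows[row][col] += count

    def estimate(self, key):
        # Collisions only inflate counters, so the minimum is the tightest bound.
        return min(self.rows[row][col] for row, col in self._cells(key))


def split_for_flush(sketch, partition_keys, threshold):
    """Partition keys into (hot, cold) by estimated write count (hypothetical policy)."""
    hot, cold = [], []
    for key in partition_keys:
        (hot if sketch.estimate(key) >= threshold else cold).append(key)
    return hot, cold


if __name__ == "__main__":
    sketch = CountMinSketch()
    sketch.add("hot-1", 5000)            # one high-traffic partition
    keys = ["hot-1"]
    for i in range(1000):                # many low-traffic partitions
        key = f"cold-{i}"
        sketch.add(key)
        keys.append(key)
    hot, cold = split_for_flush(sketch, keys, threshold=2500)
    print(len(hot), "hot partition(s),", len(cold), "cold partition(s)")
```

Because a count-min sketch can only overestimate, a genuinely hot partition is never misclassified as cold under this policy; the failure mode is an occasional cold partition being flushed with the hot set, which costs some efficiency but not correctness.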

Whilst the idea is reasonably straightforward, it seems that the biggest 
problem here will be defining a success metric. Obviously any workload 
following an exponential/Zipf/extreme distribution is likely to benefit from 
such an approach, but whether or not that would translate into real-world 
gains is another matter.
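
To give a feel for how skewed such workloads can be, the following sketch computes the share of writes that would land on the hottest partitions if write frequency followed a Zipf law, with the partition at rank r receiving weight 1/r^s. The function name top_share, the partition count, the 1% cut-off and the exponents are all hypothetical choices for illustration. This is not a proposed success metric, only a way to see how much traffic a small hot set might absorb under an assumed distribution.

```python
def top_share(num_partitions, hot_fraction, exponent):
    """Fraction of writes hitting the hottest partitions under a Zipf law."""
    # Weight of the partition at a given rank is 1 / rank**exponent.
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_partitions + 1)]
    hot_count = max(1, round(num_partitions * hot_fraction))
    return sum(weights[:hot_count]) / sum(weights)


if __name__ == "__main__":
    # exponent 0.0 is the uniform case, where the top 1% gets exactly 1% of writes
    for s in (0.0, 0.5, 1.0, 1.5):
        share = top_share(10_000, 0.01, s)
        print(f"exponent={s}: top 1% of partitions receive {share:.1%} of writes")
```

The larger the share absorbed by the hot set, the more there is to gain from compacting it separately; for a near-uniform workload the split would buy little.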



--
This message was sent by Atlassian JIRA
(v6.2#6252)
