Benedict created CASSANDRA-7203:
-----------------------------------

             Summary: Flush (and Compact) High Traffic Partitions Separately
                 Key: CASSANDRA-7203
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7203
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Benedict


An idea possibly worth exploring is the use of streaming count-min sketches to 
collect data over the up-time of a server in order to estimate the velocity of 
different partitions, so that high-volume partitions can be flushed separately, 
on the assumption that they will be much smaller in number. This would reduce 
write amplification by permitting them to be compacted independently of any 
low-velocity data.
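
As a rough illustration of the tracking step, here is a minimal count-min 
sketch in Python. It is only a sketch under stated assumptions, not a proposed 
implementation: the class name, the `hot_partitions` helper, the width/depth 
values and the byte-count threshold are all hypothetical, and it accumulates 
total bytes written rather than a true velocity (a real version would need 
some form of decay or windowing).

```python
import hashlib


class CountMinSketch:
    """Approximate per-key counters in fixed memory; estimates never undercount."""

    def __init__(self, width=2048, depth=4):
        # width and depth are illustrative values, not tuned recommendations.
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _indexes(self, key):
        # Derive one hash per row by salting BLAKE2b with the row number.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key, amount=1):
        # Record `amount` (e.g. bytes written) against the partition key.
        for row, col in self._indexes(key):
            self.rows[row][col] += amount

    def estimate(self, key):
        # The minimum across rows is the estimate least inflated by collisions.
        return min(self.rows[row][col] for row, col in self._indexes(key))


def hot_partitions(sketch, candidate_keys, threshold):
    # Hypothetical helper: pick the keys whose estimated volume reaches the
    # threshold, i.e. the ones that would be flushed separately.
    return [k for k in candidate_keys if sketch.estimate(k) >= threshold]
```

At flush time, the partition keys in the memtable could be passed through 
something like `hot_partitions` to decide which ones go to their own sstable. 
Because the sketch only overestimates, a cold partition may occasionally be 
treated as hot, but a hot one will not be missed.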

Whilst the idea is reasonably straightforward, it seems that the biggest 
problem here will be defining a success metric. Any workload following an 
exponential/zipf/extreme distribution is likely to benefit from such an 
approach, but whether or not that would translate into real-world gains is 
another matter.
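
To give a feel for how skewed such workloads can be, the following Python 
snippet computes the share of writes that would land on the most popular 
partitions under an idealised Zipf distribution. The function name and the 
parameters are assumptions made purely for illustration. It says nothing about 
any real workload, nor about the resulting write amplification.

```python
def zipf_top_share(num_partitions, top_k, exponent=1.0):
    # Under Zipf, the partition of a given rank receives traffic
    # proportional to 1 / rank**exponent.
    weights = [1.0 / rank ** exponent for rank in range(1, num_partitions + 1)]
    return sum(weights[:top_k]) / sum(weights)


# With 1000 partitions and exponent 1, the top 10 partitions (1% of keys)
# receive about 39% of the writes.
share = zipf_top_share(1000, 10)
```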



--
This message was sent by Atlassian JIRA
(v6.2#6252)
