Benedict created CASSANDRA-7203:
-----------------------------------

             Summary: Flush (and Compact) High Traffic Partitions Separately
                 Key: CASSANDRA-7203
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7203
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Benedict

An idea possibly worth exploring is the use of streaming count-min sketches to collect data over the uptime of a server and estimate the velocity of different partitions, so that high-volume partitions can be flushed separately, on the assumption that they will be much smaller in number. This would reduce write amplification by permitting them to be compacted independently of any low-velocity data.

Whilst the idea is reasonably straightforward, it seems that the biggest problem here will be defining a success metric. Obviously any workload following an exponential/Zipf/extreme distribution is likely to benefit from such an approach, but whether or not that would translate into real-world gains is another matter.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
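
To make the proposal concrete, here is a minimal sketch of how a count-min sketch could track per-partition write counts and split partitions into hot and cold groups at flush time. It is an illustration only, not Cassandra code: the class `CountMinSketch`, the helper `split_for_flush`, the `width`/`depth` sizes, and the `hot_threshold` value are all hypothetical, and "velocity" is simplified here to a raw write count since the sketch was created.

```python
import hashlib


class CountMinSketch:
    """Fixed-memory approximate counters; estimates may overcount, never undercount."""

    def __init__(self, width=2048, depth=4):
        # width and depth are illustrative sizes, not tuned values.
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _slots(self, key):
        # One hash per row, derived by salting BLAKE2b with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key, digest_size=8, salt=row.to_bytes(2, "big")
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, key, amount=1):
        """Record `amount` writes against a partition key (bytes)."""
        for row, col in self._slots(key):
            self.rows[row][col] += amount

    def estimate(self, key):
        """Return the smallest counter across rows, the least-inflated estimate."""
        return min(self.rows[row][col] for row, col in self._slots(key))


def split_for_flush(sketch, partition_keys, hot_threshold):
    """Hypothetical helper: group keys into (hot, cold) by estimated write count."""
    hot, cold = [], []
    for key in partition_keys:
        if sketch.estimate(key) >= hot_threshold:
            hot.append(key)
        else:
            cold.append(key)
    return hot, cold


# Skewed toy workload: one partition takes 1000 writes, 100 others take one each.
sketch = CountMinSketch()
keys = [b"hot"] + [b"cold-%d" % i for i in range(100)]
for _ in range(1000):
    sketch.add(b"hot")
for key in keys[1:]:
    sketch.add(key)

hot, cold = split_for_flush(sketch, keys, hot_threshold=500)
```

Under this assumption the `hot` group would be flushed to its own sstable and compacted on its own schedule. Because the sketch only overestimates, a cold partition can occasionally be misclassified as hot, but a genuinely hot one cannot be missed.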