[ https://issues.apache.org/jira/browse/CASSANDRA-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232045#comment-14232045 ]
Benedict commented on CASSANDRA-7203:
-------------------------------------

It wasn't intended to be an immediate focus; I just wanted an idea of whether such data distributions occur, to see if it might _ever_ be worth investigating. But I can see I'm fighting a losing battle!

> Flush (and Compact) High Traffic Partitions Separately
> ------------------------------------------------------
>
>                 Key: CASSANDRA-7203
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7203
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>              Labels: compaction, performance
>
> An idea possibly worth exploring is the use of streaming count-min sketches
> to collect data over the up-time of a server to estimate the velocity of
> different partitions, so that high-volume partitions can be flushed
> separately on the assumption that they will be much smaller in number, thus
> reducing write amplification by permitting compaction independently of any
> low-velocity data.
> Whilst the idea is reasonably straightforward, it seems that the biggest
> problem here will be defining any success metric. Obviously any workload
> following an exponential/zipf/extreme distribution is likely to benefit from
> such an approach, but whether or not that would translate in real terms is
> another matter.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
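
For readers unfamiliar with the count-min sketch mentioned in the issue description, the following is a minimal Python sketch of how per-partition write volume might be tracked and high-velocity partitions picked out. It is illustrative only and is not Cassandra code: the class name CountMinSketch, the helper hot_partitions, the width/depth defaults, and the fixed threshold are all assumptions made for this example, and the issue does not specify how the flush split would actually be decided.

```python
import hashlib


class CountMinSketch:
    """Approximate per-key counters in fixed memory (hypothetical helper).

    Estimates never undercount; hash collisions can only inflate them.
    """

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # One independent hash per row, derived by salting BLAKE2b with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key, amount=1):
        # Record `amount` (e.g. bytes written) against a partition key.
        for row, col in self._cells(key):
            self.rows[row][col] += amount

    def estimate(self, key):
        # The minimum across rows is the least collision-inflated counter.
        return min(self.rows[row][col] for row, col in self._cells(key))


def hot_partitions(sketch, keys, threshold):
    """Return the keys whose estimated volume meets an assumed threshold."""
    return {key for key in keys if sketch.estimate(key) >= threshold}
```

In this sketch, each write would call add() with the partition key and the number of bytes written, and a flush could consult hot_partitions() to decide which partitions to write out separately. Because the estimate can only overcount, a low-velocity partition may occasionally be misclassified as hot, but a genuinely hot partition is never missed.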