[ 
https://issues.apache.org/jira/browse/CASSANDRA-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232045#comment-14232045
 ] 

Benedict commented on CASSANDRA-7203:
-------------------------------------

It wasn't intended to be an immediate focus; I just wanted an idea of whether 
such data distributions occur, to see if it might _ever_ be worth 
investigating. But I can see I'm fighting a losing battle!

> Flush (and Compact) High Traffic Partitions Separately
> ------------------------------------------------------
>
>                 Key: CASSANDRA-7203
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7203
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>              Labels: compaction, performance
>
> An idea possibly worth exploring is the use of streaming count-min sketches 
> to collect data over the uptime of a server to estimate the velocity of 
> different partitions, so that high-volume partitions can be flushed 
> separately on the assumption that they will be much smaller in number, thus 
> reducing write amplification by permitting compaction independently of any 
> low-velocity data.
> Whilst the idea is reasonably straightforward, it seems that the biggest 
> problem here will be defining any success metric. Obviously any workload 
> following an exponential/zipf/extreme distribution is likely to benefit from 
> such an approach, but whether or not that would translate in real terms is 
> another matter.
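
As a rough illustration of the count-min sketch idea in the description above, 
here is a minimal sketch in Java. It is not taken from the Cassandra code base: 
the class PartitionVelocitySketch, its methods, the String partition keys and 
the byte threshold are all hypothetical names and choices made for this 
example. It only shows how per-partition write volume could be estimated in 
fixed memory and compared against a threshold to pick out high-traffic 
partitions.

```java
import java.util.Random;

/**
 * Minimal count-min sketch for estimating bytes written per partition.
 * Hypothetical illustration only; none of these names are Cassandra APIs.
 */
final class PartitionVelocitySketch {
    private final long[][] counts; // one row of counters per hash function
    private final int[] seeds;     // per-row seed, so rows hash independently
    private final int width;

    PartitionVelocitySketch(int depth, int width, long seed) {
        if (depth < 1 || width < 1)
            throw new IllegalArgumentException("depth and width must be >= 1");
        this.counts = new long[depth][width];
        this.seeds = new int[depth];
        this.width = width;
        Random rnd = new Random(seed);
        for (int i = 0; i < depth; i++)
            seeds[i] = rnd.nextInt();
    }

    /** Maps a key to a counter index in the given row. */
    private int bucket(int row, String key) {
        int h = key.hashCode() ^ seeds[row];
        // Simple integer bit mixing to spread the seeded hash.
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        h *= 0xc2b2ae35;
        h ^= (h >>> 16);
        return Math.floorMod(h, width);
    }

    /** Adds the size of one write to every row's counter for this key. */
    void recordWrite(String partitionKey, long bytes) {
        for (int row = 0; row < counts.length; row++)
            counts[row][bucket(row, partitionKey)] += bytes;
    }

    /** Returns the minimum counter across rows; may overestimate, never underestimates. */
    long estimate(String partitionKey) {
        long min = Long.MAX_VALUE;
        for (int row = 0; row < counts.length; row++)
            min = Math.min(min, counts[row][bucket(row, partitionKey)]);
        return min;
    }

    /** Assumed policy: a partition is "high traffic" once its estimate reaches the threshold. */
    boolean isHighTraffic(String partitionKey, long thresholdBytes) {
        return estimate(partitionKey) >= thresholdBytes;
    }
}
```

A flush path could consult isHighTraffic for each partition and route the hot 
ones to a separate sstable. How the threshold is chosen, and whether counters 
should decay over time so that the estimate tracks velocity rather than total 
volume, is left open here, as it is in the description.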



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
