[ https://issues.apache.org/jira/browse/CASSANDRA-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232032#comment-14232032 ]
Jason Brown commented on CASSANDRA-7203:
----------------------------------------

bq. I was mostly hoping to get your and sankalp kohli's views on if those workload skews occur

I think it's completely dependent upon an organization's systems' implementation as to what traffic actually goes to a database vs. cache vs. whatever, and I think trying to be incredibly clever here is not worth the implementation costs. Again, I'll reiterate, we have much bigger fish to fry than this.

> Flush (and Compact) High Traffic Partitions Separately
> ------------------------------------------------------
>
>                 Key: CASSANDRA-7203
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7203
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>              Labels: compaction, performance
>
> An idea possibly worth exploring is the use of streaming count-min sketches
> to collect data over the up-time of a server to estimate the velocity of
> different partitions, so that high-volume partitions can be flushed
> separately on the assumption that they will be much smaller in number, thus
> reducing write amplification by permitting compaction independently of any
> low-velocity data.
> Whilst the idea is reasonably straightforward, it seems that the biggest
> problem here will be defining any success metric. Obviously any workload
> following an exponential/zipf/extreme distribution is likely to benefit from
> such an approach, but whether or not that would translate in real terms is
> another matter.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
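
For readers unfamiliar with the data structure named in the ticket description, here is a minimal sketch of a count-min sketch used to flag high-write partitions. It is illustrative only and is not taken from Cassandra. The names `CountMinSketch` and `hot_partitions`, the width/depth defaults, and the threshold are all assumptions, and "velocity" is approximated here as a plain write count with no time decay.

```python
import hashlib


class CountMinSketch:
    """Fixed-memory frequency estimator. Estimates may overcount, never undercount."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # Derive one hash function per row by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            key.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        # Record `count` writes against the partition key in every row.
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def estimate(self, key):
        # The minimum across rows is the estimate least inflated by collisions.
        return min(
            self.rows[row][self._index(row, key)] for row in range(self.depth)
        )


def hot_partitions(sketch, candidate_keys, threshold):
    """Return the candidate keys whose estimated write count meets the threshold."""
    return [k for k in candidate_keys if sketch.estimate(k) >= threshold]


if __name__ == "__main__":
    sketch = CountMinSketch()
    for _ in range(1000):
        sketch.add("hot-partition")
    for i in range(200):
        sketch.add(f"cold-{i}")
    print(hot_partitions(sketch, ["hot-partition", "cold-0", "cold-1"], 500))
```

The sketch only estimates per-key counts. How a flush path would use the resulting set of hot keys, and whether doing so actually reduces write amplification, is exactly the open question raised in the ticket.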