[ https://issues.apache.org/jira/browse/CASSANDRA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347096#comment-14347096 ]
Sylvain Lebresne commented on CASSANDRA-8731: --------------------------------------------- I've created CASSANDRA-6936. I did mentioned the idea as I had it however and I'm pretending it's full proof, nor that I have it all figured out. I just thing there may in general be a way to put the metadata we have to reduce the amount of comparisons in some cases. > Optimise merges involving multiple clustering columns > ----------------------------------------------------- > > Key: CASSANDRA-8731 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8731 > Project: Cassandra > Issue Type: Improvement > Components: Core > Reporter: Benedict > Labels: performance > Fix For: 3.0 > > > Since the new storage format is dead in the water for the moment, we should > do our best to optimise current behaviour. When merging data from multiple > sstables with multiple clustering columns, currently we must incur the full > costs of comparison for the entire matching prefix, and must heapify every > cell in our PriorityQueue, incurring lg(N) of these costlier comparisons for > every cell we merge, where N is the number of sources we're merging. > Essentially I'm proposing a trie-based merge approach as a replacement for > the ManyToOne MergeIterator, wherein we treat each clustering component as a > tree underwhich all Cells with a common prefix occur. We then perform a tree > merge, rather than a flat merge. For byte-order fields this trie can even be > a full binary-trie (although built on the fly). The advantage here is that we > rapidly prune merges involving disjoint ranges, so that instead of always > incurring lg(N) costs on each new record, we may often incur O(1) costs. For > timeseries data, for instance, we could merge dozens of files and so long as > they were non-overlapping our CPU burden would be little more than reading > from a single file. > On top of this, we no longer incur any of the shared prefix repetition costs, > since we compare each prefix piece-wise, and only once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)