[ 
https://issues.apache.org/jira/browse/CASSANDRA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305522#comment-14305522
 ] 

Sylvain Lebresne commented on CASSANDRA-8731:
---------------------------------------------

bq.  any two adjacent sstables will overlap a little (possibly a lot), and so 
you have to do something to unpick them

This may not be the right ticket to discuss this after all, but again, I think 
you're assuming too much. We don't absolutely have to unpick them: while two 
adjacent sstable might overlap, you'll still have many (non-adjacent) ones that 
don't overlap and this can still be used to reduce comparisons. Besides, 
provided the merge iterator knows it deals with sstable iterators (which isn't 
too hard), we could alternatively use index blocks instead of global sstable 
min/max stats. My general point is, we could use informations we have from 
sstables to limit the number of comparison done, and that might not be terribly 
hard. Anyway, I'll create a ticket. 

> Optimise merges involving multiple clustering columns
> -----------------------------------------------------
>
>                 Key: CASSANDRA-8731
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8731
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>              Labels: performance
>             Fix For: 3.0
>
>
> Since the new storage format is dead in the water for the moment, we should 
> do our best to optimise current behaviour. When merging data from multiple 
> sstables with multiple clustering columns, currently we must incur the full 
> costs of comparison for the entire matching prefix, and must heapify every 
> cell in our PriorityQueue, incurring lg(N) of these costlier comparisons for 
> every cell we merge, where N is the number of sources we're merging.
> Essentially I'm proposing a trie-based merge approach as a replacement for 
> the ManyToOne MergeIterator, wherein we treat each clustering component as a 
> tree underwhich all Cells with a common prefix occur. We then perform a tree 
> merge, rather than a flat merge. For byte-order fields this trie can even be 
> a full binary-trie (although built on the fly). The advantage here is that we 
> rapidly prune merges involving disjoint ranges, so that instead of always 
> incurring lg(N) costs on each new record, we may often incur O(1) costs. For 
> timeseries data, for instance, we could merge dozens of files and so long as 
> they were non-overlapping our CPU burden would be little more than reading 
> from a single file.
> On top of this, we no longer incur any of the shared prefix repetition costs, 
> since we compare each prefix piece-wise, and only once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to