[jira] [Commented] (CASSANDRA-8731) Optimise merges involving multiple clustering columns

2016-02-25 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167206#comment-15167206
 ] 

Sylvain Lebresne commented on CASSANDRA-8731:
-

To sum up this, one "relatively" simple thing we could try here is to change 
our merge iterator so it's clustering aware. Or more precisely, to add a new 
merge iterator specialized for when the input is a {{ClusteringPrefix}}), that 
would keep one priority queue for each level of clustering, avoiding comparing 
equal prefixes more than necessary. I suggest we start there.

> Optimise merges involving multiple clustering columns
> -
>
> Key: CASSANDRA-8731
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8731
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>  Labels: compaction, performance
> Fix For: 3.x
>
>
> Since the new storage format is dead in the water for the moment, we should 
> do our best to optimise current behaviour. When merging data from multiple 
> sstables with multiple clustering columns, currently we must incur the full 
> costs of comparison for the entire matching prefix, and must heapify every 
> cell in our PriorityQueue, incurring lg(N) of these costlier comparisons for 
> every cell we merge, where N is the number of sources we're merging.
> Essentially I'm proposing a trie-based merge approach as a replacement for 
> the ManyToOne MergeIterator, wherein we treat each clustering component as a 
> tree underwhich all Cells with a common prefix occur. We then perform a tree 
> merge, rather than a flat merge. For byte-order fields this trie can even be 
> a full binary-trie (although built on the fly). The advantage here is that we 
> rapidly prune merges involving disjoint ranges, so that instead of always 
> incurring lg(N) costs on each new record, we may often incur O(1) costs. For 
> timeseries data, for instance, we could merge dozens of files and so long as 
> they were non-overlapping our CPU burden would be little more than reading 
> from a single file.
> On top of this, we no longer incur any of the shared prefix repetition costs, 
> since we compare each prefix piece-wise, and only once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8731) Optimise merges involving multiple clustering columns

2015-03-04 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347096#comment-14347096
 ] 

Sylvain Lebresne commented on CASSANDRA-8731:
-

I've created CASSANDRA-6936. I did mentioned the idea as I had it however and 
I'm pretending it's full proof, nor that I have it all figured out. I just 
thing there may in general be a way to put the metadata we have to reduce the 
amount of comparisons in some cases.

 Optimise merges involving multiple clustering columns
 -

 Key: CASSANDRA-8731
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8731
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0


 Since the new storage format is dead in the water for the moment, we should 
 do our best to optimise current behaviour. When merging data from multiple 
 sstables with multiple clustering columns, currently we must incur the full 
 costs of comparison for the entire matching prefix, and must heapify every 
 cell in our PriorityQueue, incurring lg(N) of these costlier comparisons for 
 every cell we merge, where N is the number of sources we're merging.
 Essentially I'm proposing a trie-based merge approach as a replacement for 
 the ManyToOne MergeIterator, wherein we treat each clustering component as a 
 tree underwhich all Cells with a common prefix occur. We then perform a tree 
 merge, rather than a flat merge. For byte-order fields this trie can even be 
 a full binary-trie (although built on the fly). The advantage here is that we 
 rapidly prune merges involving disjoint ranges, so that instead of always 
 incurring lg(N) costs on each new record, we may often incur O(1) costs. For 
 timeseries data, for instance, we could merge dozens of files and so long as 
 they were non-overlapping our CPU burden would be little more than reading 
 from a single file.
 On top of this, we no longer incur any of the shared prefix repetition costs, 
 since we compare each prefix piece-wise, and only once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8731) Optimise merges involving multiple clustering columns

2015-03-04 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347056#comment-14347056
 ] 

Jonathan Ellis commented on CASSANDRA-8731:
---

Sylvain, can you link your ticket to this one?

 Optimise merges involving multiple clustering columns
 -

 Key: CASSANDRA-8731
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8731
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0


 Since the new storage format is dead in the water for the moment, we should 
 do our best to optimise current behaviour. When merging data from multiple 
 sstables with multiple clustering columns, currently we must incur the full 
 costs of comparison for the entire matching prefix, and must heapify every 
 cell in our PriorityQueue, incurring lg(N) of these costlier comparisons for 
 every cell we merge, where N is the number of sources we're merging.
 Essentially I'm proposing a trie-based merge approach as a replacement for 
 the ManyToOne MergeIterator, wherein we treat each clustering component as a 
 tree underwhich all Cells with a common prefix occur. We then perform a tree 
 merge, rather than a flat merge. For byte-order fields this trie can even be 
 a full binary-trie (although built on the fly). The advantage here is that we 
 rapidly prune merges involving disjoint ranges, so that instead of always 
 incurring lg(N) costs on each new record, we may often incur O(1) costs. For 
 timeseries data, for instance, we could merge dozens of files and so long as 
 they were non-overlapping our CPU burden would be little more than reading 
 from a single file.
 On top of this, we no longer incur any of the shared prefix repetition costs, 
 since we compare each prefix piece-wise, and only once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8731) Optimise merges involving multiple clustering columns

2015-02-04 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305401#comment-14305401
 ] 

Sylvain Lebresne commented on CASSANDRA-8731:
-

bq. For timeseries data, for instance, we could merge dozens of files and so 
long as they were non-overlapping our CPU burden would be little more than 
reading from a single file.

On that specific problem, we could also use the per-sstable min/max clustering 
stats we have to avoid bothering merging cells form 2 sstable that we know 
don't overlap. It's certainly not as general as what's described above, but I 
suspect it's reasonably simple, while the more general description above feels 
a bit more involved (even though I'll admit haven't spend too much time 
thinking about how to implement it in practice). 

 Optimise merges involving multiple clustering columns
 -

 Key: CASSANDRA-8731
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8731
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0


 Since the new storage format is dead in the water for the moment, we should 
 do our best to optimise current behaviour. When merging data from multiple 
 sstables with multiple clustering columns, currently we must incur the full 
 costs of comparison for the entire matching prefix, and must heapify every 
 cell in our PriorityQueue, incurring lg(N) of these costlier comparisons for 
 every cell we merge, where N is the number of sources we're merging.
 Essentially I'm proposing a trie-based merge approach as a replacement for 
 the ManyToOne MergeIterator, wherein we treat each clustering component as a 
 tree underwhich all Cells with a common prefix occur. We then perform a tree 
 merge, rather than a flat merge. For byte-order fields this trie can even be 
 a full binary-trie (although built on the fly). The advantage here is that we 
 rapidly prune merges involving disjoint ranges, so that instead of always 
 incurring lg(N) costs on each new record, we may often incur O(1) costs. For 
 timeseries data, for instance, we could merge dozens of files and so long as 
 they were non-overlapping our CPU burden would be little more than reading 
 from a single file.
 On top of this, we no longer incur any of the shared prefix repetition costs, 
 since we compare each prefix piece-wise, and only once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8731) Optimise merges involving multiple clustering columns

2015-02-04 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305522#comment-14305522
 ] 

Sylvain Lebresne commented on CASSANDRA-8731:
-

bq.  any two adjacent sstables will overlap a little (possibly a lot), and so 
you have to do something to unpick them

This may not be the right ticket to discuss this after all, but again, I think 
you're assuming too much. We don't absolutely have to unpick them: while two 
adjacent sstable might overlap, you'll still have many (non-adjacent) ones that 
don't overlap and this can still be used to reduce comparisons. Besides, 
provided the merge iterator knows it deals with sstable iterators (which isn't 
too hard), we could alternatively use index blocks instead of global sstable 
min/max stats. My general point is, we could use informations we have from 
sstables to limit the number of comparison done, and that might not be terribly 
hard. Anyway, I'll create a ticket. 

 Optimise merges involving multiple clustering columns
 -

 Key: CASSANDRA-8731
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8731
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0


 Since the new storage format is dead in the water for the moment, we should 
 do our best to optimise current behaviour. When merging data from multiple 
 sstables with multiple clustering columns, currently we must incur the full 
 costs of comparison for the entire matching prefix, and must heapify every 
 cell in our PriorityQueue, incurring lg(N) of these costlier comparisons for 
 every cell we merge, where N is the number of sources we're merging.
 Essentially I'm proposing a trie-based merge approach as a replacement for 
 the ManyToOne MergeIterator, wherein we treat each clustering component as a 
 tree underwhich all Cells with a common prefix occur. We then perform a tree 
 merge, rather than a flat merge. For byte-order fields this trie can even be 
 a full binary-trie (although built on the fly). The advantage here is that we 
 rapidly prune merges involving disjoint ranges, so that instead of always 
 incurring lg(N) costs on each new record, we may often incur O(1) costs. For 
 timeseries data, for instance, we could merge dozens of files and so long as 
 they were non-overlapping our CPU burden would be little more than reading 
 from a single file.
 On top of this, we no longer incur any of the shared prefix repetition costs, 
 since we compare each prefix piece-wise, and only once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8731) Optimise merges involving multiple clustering columns

2015-02-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305410#comment-14305410
 ] 

Benedict commented on CASSANDRA-8731:
-

Correct. I plan(ned) to replace the CF objects in the new storage format. 8099 
does make a partial change to CF objects inviting, but may be a bit ambitious 
for 3.0.

 Optimise merges involving multiple clustering columns
 -

 Key: CASSANDRA-8731
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8731
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0


 Since the new storage format is dead in the water for the moment, we should 
 do our best to optimise current behaviour. When merging data from multiple 
 sstables with multiple clustering columns, currently we must incur the full 
 costs of comparison for the entire matching prefix, and must heapify every 
 cell in our PriorityQueue, incurring lg(N) of these costlier comparisons for 
 every cell we merge, where N is the number of sources we're merging.
 Essentially I'm proposing a trie-based merge approach as a replacement for 
 the ManyToOne MergeIterator, wherein we treat each clustering component as a 
 tree underwhich all Cells with a common prefix occur. We then perform a tree 
 merge, rather than a flat merge. For byte-order fields this trie can even be 
 a full binary-trie (although built on the fly). The advantage here is that we 
 rapidly prune merges involving disjoint ranges, so that instead of always 
 incurring lg(N) costs on each new record, we may often incur O(1) costs. For 
 timeseries data, for instance, we could merge dozens of files and so long as 
 they were non-overlapping our CPU burden would be little more than reading 
 from a single file.
 On top of this, we no longer incur any of the shared prefix repetition costs, 
 since we compare each prefix piece-wise, and only once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8731) Optimise merges involving multiple clustering columns

2015-02-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305419#comment-14305419
 ] 

Benedict commented on CASSANDRA-8731:
-

Well, the simplest implementation is very trivial: Instead of one PQ, you have 
a PQ for each level of clustering column. Really not very advanced at all.

If you want to split intra-clustering-component to make byte-order comparable 
fields more efficient, that is more involved.

That said, there is no harm in having both. It seems to me that it may be more 
involved to do what you suggest, though, especially for sstables that _do_ 
overlap, but don't overlap for their entirety. At the moment we just lump all 
our iterators together; slicing them and merging them independently actually 
sounds to me to be much more fiddly and tough to get right than the simplest 
approach outlined here.

 Optimise merges involving multiple clustering columns
 -

 Key: CASSANDRA-8731
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8731
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0


 Since the new storage format is dead in the water for the moment, we should 
 do our best to optimise current behaviour. When merging data from multiple 
 sstables with multiple clustering columns, currently we must incur the full 
 costs of comparison for the entire matching prefix, and must heapify every 
 cell in our PriorityQueue, incurring lg(N) of these costlier comparisons for 
 every cell we merge, where N is the number of sources we're merging.
 Essentially I'm proposing a trie-based merge approach as a replacement for 
 the ManyToOne MergeIterator, wherein we treat each clustering component as a 
 tree underwhich all Cells with a common prefix occur. We then perform a tree 
 merge, rather than a flat merge. For byte-order fields this trie can even be 
 a full binary-trie (although built on the fly). The advantage here is that we 
 rapidly prune merges involving disjoint ranges, so that instead of always 
 incurring lg(N) costs on each new record, we may often incur O(1) costs. For 
 timeseries data, for instance, we could merge dozens of files and so long as 
 they were non-overlapping our CPU burden would be little more than reading 
 from a single file.
 On top of this, we no longer incur any of the shared prefix repetition costs, 
 since we compare each prefix piece-wise, and only once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8731) Optimise merges involving multiple clustering columns

2015-02-04 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305480#comment-14305480
 ] 

Sylvain Lebresne commented on CASSANDRA-8731:
-

bq. If you want to split intra-clustering-component to make byte-order 
comparable fields more efficient, that is more involved.

That's really the part I was refering to. I'm totally fine with having a 
specialized mergeIterator for clustering prefixes that knows not to compare the 
same components multiple times.

bq. especially for sstables that do overlap, but don't overlap for their 
entirety

I wasn't planning to handle them. For time series withh 
DateTieredCompactionStrategy, most of your sstables will be non-overlapping so 
it would still be a win.

bq. slicing them and merging them independently actually sounds to me to be 
much more fiddly and tough to get right than the simplest approach outlined here

If by simplest approach you mean the one PQ per level of clustering, then I 
think that's really orthogonal. What I'm suggesting is really not specific to 
multiple clustering level, while the PQ-per-level-of-clustering only help those 
case. What I can agree on is that doing CASSANDRA-6936 + doing a 
trie-for-byte-order-comparable-fields would likely make my suggestion less 
useful (even though there is cases where it would still be more efficient), but 
my initial guess is that doing both those thing is more involved that what I'm 
suggesting. But not trying to shoot your idea, just mentioning a potential 
optimization that imo is not really very involved (in particular, note that 
doing what I suggest only for single partition queries would be fine imo if 
that makes it simpler).

Anyway, I would suggest limiting the scope of this issue to introduce a 
smarter merge iterator for multi-clustering (because it's easy and I doubt 
it's contentious) and open other tickets for more involved byte-fiddling 
suggestions. 

 Optimise merges involving multiple clustering columns
 -

 Key: CASSANDRA-8731
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8731
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0


 Since the new storage format is dead in the water for the moment, we should 
 do our best to optimise current behaviour. When merging data from multiple 
 sstables with multiple clustering columns, currently we must incur the full 
 costs of comparison for the entire matching prefix, and must heapify every 
 cell in our PriorityQueue, incurring lg(N) of these costlier comparisons for 
 every cell we merge, where N is the number of sources we're merging.
 Essentially I'm proposing a trie-based merge approach as a replacement for 
 the ManyToOne MergeIterator, wherein we treat each clustering component as a 
 tree underwhich all Cells with a common prefix occur. We then perform a tree 
 merge, rather than a flat merge. For byte-order fields this trie can even be 
 a full binary-trie (although built on the fly). The advantage here is that we 
 rapidly prune merges involving disjoint ranges, so that instead of always 
 incurring lg(N) costs on each new record, we may often incur O(1) costs. For 
 timeseries data, for instance, we could merge dozens of files and so long as 
 they were non-overlapping our CPU burden would be little more than reading 
 from a single file.
 On top of this, we no longer incur any of the shared prefix repetition costs, 
 since we compare each prefix piece-wise, and only once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8731) Optimise merges involving multiple clustering columns

2015-02-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305501#comment-14305501
 ] 

Benedict commented on CASSANDRA-8731:
-

bq. Anyway, I would suggest limiting the scope of this issue to introduce a 
smarter merge iterator for multi-clustering (because it's easy and I doubt 
it's contentious) and open other tickets for more involved byte-fiddling 
suggestions.

Yup, that seems eminently sensible.

bq. but my initial guess is that doing both those thing is more involved that 
what I'm suggesting

Potentially, yes. Although a simple binary-trie based merge would not likely be 
very hard; certainly not as hard as implementing an actual binary-trie. The 
question is probably mostly if we get byte-order comparability for common 
fields.

bq. then I think that's really orthogonal

Close enough to agreed that I won't split hairs, and you're right, your 
suggestion would work for single clustering columns as well, so long as they 
are actually disjoint. The problem is that this is still likely to be fiddlier 
than it seems you're suggesting. DTCS is not likely to be non-overlapping; any 
two adjacent sstables will overlap a little (possibly a lot), and so you have 
to do something to unpick them. You could perform separate merges for the 
overlapping ranges, and have a rolling window, but this is also non-trivial. If 
we went for implementing the DTCS++ I suggested in the original ticket, of 
course (which would be quite achievable to deliver for 3.0) then this would 
guarantee non-overlapping ranges, and this could be made significantly simpler 
as a result. Either way, we should file a separate ticket for this and discuss 
it there.



 Optimise merges involving multiple clustering columns
 -

 Key: CASSANDRA-8731
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8731
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0


 Since the new storage format is dead in the water for the moment, we should 
 do our best to optimise current behaviour. When merging data from multiple 
 sstables with multiple clustering columns, currently we must incur the full 
 costs of comparison for the entire matching prefix, and must heapify every 
 cell in our PriorityQueue, incurring lg(N) of these costlier comparisons for 
 every cell we merge, where N is the number of sources we're merging.
 Essentially I'm proposing a trie-based merge approach as a replacement for 
 the ManyToOne MergeIterator, wherein we treat each clustering component as a 
 tree underwhich all Cells with a common prefix occur. We then perform a tree 
 merge, rather than a flat merge. For byte-order fields this trie can even be 
 a full binary-trie (although built on the fly). The advantage here is that we 
 rapidly prune merges involving disjoint ranges, so that instead of always 
 incurring lg(N) costs on each new record, we may often incur O(1) costs. For 
 timeseries data, for instance, we could merge dozens of files and so long as 
 they were non-overlapping our CPU burden would be little more than reading 
 from a single file.
 On top of this, we no longer incur any of the shared prefix repetition costs, 
 since we compare each prefix piece-wise, and only once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8731) Optimise merges involving multiple clustering columns

2015-02-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305342#comment-14305342
 ] 

Benedict commented on CASSANDRA-8731:
-

Regrettably I've lost all of the discussion I had with candidates about this 
approach, so don't have the mathematical analysis to hand (although it gets 
pretty complex).

[~blambov] may have the discussion I had with him; he was better at the math 
than me anyway.

 Optimise merges involving multiple clustering columns
 -

 Key: CASSANDRA-8731
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8731
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0


 Since the new storage format is dead in the water for the moment, we should 
 do our best to optimise current behaviour. When merging data from multiple 
 sstables with multiple clustering columns, currently we must incur the full 
 costs of comparison for the entire matching prefix, and must heapify every 
 cell in our PriorityQueue, incurring lg(N) of these costlier comparisons for 
 every cell we merge, where N is the number of sources we're merging.
 Essentially I'm proposing a trie-based merge approach as a replacement for 
 the ManyToOne MergeIterator, wherein we treat each clustering component as a 
 tree underwhich all Cells with a common prefix occur. We then perform a tree 
 merge, rather than a flat merge. For byte-order fields this trie can even be 
 a full binary-trie (although built on the fly). The advantage here is that we 
 rapidly prune merges involving disjoint ranges, so that instead of always 
 incurring lg(N) costs on each new record, we may often incur O(1) costs. For 
 timeseries data, for instance, we could merge dozens of files and so long as 
 they were non-overlapping our CPU burden would be little more than reading 
 from a single file.
 On top of this, we no longer incur any of the shared prefix repetition costs, 
 since we compare each prefix piece-wise, and only once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8731) Optimise merges involving multiple clustering columns

2015-02-04 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305361#comment-14305361
 ] 

Jonathan Ellis commented on CASSANDRA-8731:
---

So you're just proposing a change to the MergeIterator, not the CF objects?

 Optimise merges involving multiple clustering columns
 -

 Key: CASSANDRA-8731
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8731
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0


 Since the new storage format is dead in the water for the moment, we should 
 do our best to optimise current behaviour. When merging data from multiple 
 sstables with multiple clustering columns, currently we must incur the full 
 costs of comparison for the entire matching prefix, and must heapify every 
 cell in our PriorityQueue, incurring lg(N) of these costlier comparisons for 
 every cell we merge, where N is the number of sources we're merging.
 Essentially I'm proposing a trie-based merge approach as a replacement for 
 the ManyToOne MergeIterator, wherein we treat each clustering component as a 
 tree underwhich all Cells with a common prefix occur. We then perform a tree 
 merge, rather than a flat merge. For byte-order fields this trie can even be 
 a full binary-trie (although built on the fly). The advantage here is that we 
 rapidly prune merges involving disjoint ranges, so that instead of always 
 incurring lg(N) costs on each new record, we may often incur O(1) costs. For 
 timeseries data, for instance, we could merge dozens of files and so long as 
 they were non-overlapping our CPU burden would be little more than reading 
 from a single file.
 On top of this, we no longer incur any of the shared prefix repetition costs, 
 since we compare each prefix piece-wise, and only once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8731) Optimise merges involving multiple clustering columns

2015-02-03 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304604#comment-14304604
 ] 

Jonathan Ellis commented on CASSANDRA-8731:
---

I'm optimistic that the scope is limited but I could also use more description. 
:)

 Optimise merges involving multiple clustering columns
 -

 Key: CASSANDRA-8731
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8731
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8731) Optimise merges involving multiple clustering columns

2015-02-03 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14303812#comment-14303812
 ] 

Aleksey Yeschenko commented on CASSANDRA-8731:
--

It's a little bit vague. What exactly does this issue imply?

 Optimise merges involving multiple clustering columns
 -

 Key: CASSANDRA-8731
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8731
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)