[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array

2015-08-06 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659927#comment-14659927
 ] 

Branimir Lambov commented on CASSANDRA-9471:


+1, code looks good.

> Columns should be backed by a BTree, not an array
> -
>
> Key: CASSANDRA-9471
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9471
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 3.0 beta 1
>
>
> Follow up to 8099. 
> We have pretty terrible lookup performance as the number of columns grows 
> (linear). In at least one location, this results in quadratic performance. 
> We don't however want this structure to be either any more expensive to 
> build, nor to store. Some small modifications to BTree will permit it to 
> serve here, by permitting efficient lookup by index, and calculation _of_ 
> index for a given key.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array

2015-08-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651724#comment-14651724
 ] 

Benedict commented on CASSANDRA-9471:
-

I've rebased this and introduced unit tests to cover the majority of 
functionality. All of which was already present but not covered.

> Columns should be backed by a BTree, not an array
> -
>
> Key: CASSANDRA-9471
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9471
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 3.0 beta 1
>
>
> Follow up to 8099. 
> We have pretty terrible lookup performance as the number of columns grows 
> (linear). In at least one location, this results in quadratic performance. 
> We don't however want this structure to be either any more expensive to 
> build, nor to store. Some small modifications to BTree will permit it to 
> serve here, by permitting efficient lookup by index, and calculation _of_ 
> index for a given key.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array

2015-07-30 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648337#comment-14648337
 ] 

Benedict commented on CASSANDRA-9471:
-

[~aweisberg] said:
{quote}
It would be nice to see a microbenchmark and benchmark demonstrating that the 
use of BTree is a win. I am wondering how many additional instructions and 
branches are hidden inside BTree that you pay for in the common case of small 
Columns set. A BTree is more instructions for more cache locality, but in the 
common case will there be enough columns to benefit?
{quote}

If we want to go down that avenue, we should IMO simply look into specialising 
the btree iterator for the situation where we have just one leaf, and this is 
something I've considered. That's pretty much the only likely win. We don't 
really have time to do the kind of microbenchmarking you're talking about 
though, and it is of limited use anyway, since one of the main benefits 
_performancewise_ (in the common case) is improved icache\* occupancy, which 
cannot be accounted for in a microbenchmark. In the uncommon case this also 
gives us better lower bounds on behaviour, and access to a more powerfully 
expressive toolkit. For instance, it permits us to trivially deliver a more 
efficient multi-way merge (which we do very often), and that can be improved 
further still with less trivial enhancements.

Mostly, though, this patch simply guarantees us a good lower bound on 
performance, with improved code clarity. Right now I would very much expect 
performance in the common case to be a wash, especially on any of our existing 
workloads.

> Columns should be backed by a BTree, not an array
> -
>
> Key: CASSANDRA-9471
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9471
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 3.0 beta 1
>
>
> Follow up to 8099. 
> We have pretty terrible lookup performance as the number of columns grows 
> (linear). In at least one location, this results in quadratic performance. 
> We don't however want this structure to be either any more expensive to 
> build, nor to store. Some small modifications to BTree will permit it to 
> serve here, by permitting efficient lookup by index, and calculation _of_ 
> index for a given key.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array

2015-07-29 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646593#comment-14646593
 ] 

Benedict commented on CASSANDRA-9471:
-

Patch available [here|https://github.com/belliottsmith/cassandra/tree/9471]

> Columns should be backed by a BTree, not an array
> -
>
> Key: CASSANDRA-9471
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9471
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 3.0 beta 1
>
>
> Follow up to 8099. 
> We have pretty terrible lookup performance as the number of columns grows 
> (linear). In at least one location, this results in quadratic performance. 
> We don't however want this structure to be either any more expensive to 
> build, nor to store. Some small modifications to BTree will permit it to 
> serve here, by permitting efficient lookup by index, and calculation _of_ 
> index for a given key.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array

2015-07-03 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613137#comment-14613137
 ] 

Sylvain Lebresne commented on CASSANDRA-9471:
-

I'm sorry for the back and forth, but if you haven't started working on that 
rebase, actually disregard my previous comment (might be safer to leave Columns 
alone until CASSANDRA-9705).

> Columns should be backed by a BTree, not an array
> -
>
> Key: CASSANDRA-9471
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9471
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 3.0 beta 1
>
>
> Follow up to 8099. 
> We have pretty terrible lookup performance as the number of columns grows 
> (linear). In at least one location, this results in quadratic performance. 
> We don't however want this structure to be either any more expensive to 
> build, nor to store. Some small modifications to BTree will permit it to 
> serve here, by permitting efficient lookup by index, and calculation _of_ 
> index for a given key.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array

2015-07-03 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613122#comment-14613122
 ] 

Sylvain Lebresne commented on CASSANDRA-9471:
-

Side note: if the changes to {{Columns}} are not hard to rebase, I'd personally 
be fine with just rebasing that ticket as is (without bothering splitting it in 
2 tickets) for the sake of saving you some time. At least for CASSANDRA-9705, I 
don't plan on having much of {{Columns}} going obsolete (the indexability will 
be most likely much less used but will still be handy, and we'll still rely 
heavily-ish on {{contains}} which is currently not terribly efficient). And of 
course, that still doesn't precludes from consider other implementation of 
{{Columns}} later.

Anyway, fine with whatever way you prefer, but just to say that if splitting 
into 2 tickets takes you the same time than just rebasing the whole patch, I'd 
personally just go with the second option.

> Columns should be backed by a BTree, not an array
> -
>
> Key: CASSANDRA-9471
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9471
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 3.0 beta 1
>
>
> Follow up to 8099. 
> We have pretty terrible lookup performance as the number of columns grows 
> (linear). In at least one location, this results in quadratic performance. 
> We don't however want this structure to be either any more expensive to 
> build, nor to store. Some small modifications to BTree will permit it to 
> serve here, by permitting efficient lookup by index, and calculation _of_ 
> index for a given key.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array

2015-07-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613103#comment-14613103
 ] 

Benedict commented on CASSANDRA-9471:
-

bq. but ending up doing something less efficient just because it's not there

You're right, this can happen frustratingly often. OK. I'm convinced :)

I'll split out the btree-only stuff into a separate ticket.

> Columns should be backed by a BTree, not an array
> -
>
> Key: CASSANDRA-9471
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9471
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 3.0 beta 1
>
>
> Follow up to 8099. 
> We have pretty terrible lookup performance as the number of columns grows 
> (linear). In at least one location, this results in quadratic performance. 
> We don't however want this structure to be either any more expensive to 
> build, nor to store. Some small modifications to BTree will permit it to 
> serve here, by permitting efficient lookup by index, and calculation _of_ 
> index for a given key.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array

2015-07-03 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613093#comment-14613093
 ] 

Sylvain Lebresne commented on CASSANDRA-9471:
-

bq. for the normal worries of code atrophy

Yes, that is something to take into account. However, it's also a utility 
class, one that is meant to be used a lot in the codebase. And the indexability 
code both already written. So if it "doesn't introduce significant complexity", 
it does feels like a relatively good deal. Basically, I would hate to spend 
more time pulling the already written functionality out to maybe end up someday 
having a good use of this, but ending up doing something less efficient just 
because it's not there. Besides, it's totally possible it will be used by 
{{Columns}} in the end :)

Anyway, I don't want to sound insistent, it's not that I absolutely want it. 
Just offering that maybe simply rebasing that ticket now would avoid pushing 
that to when we might be even shorter on resources than we are, doesn't 
precludes considering better alternative for {{Columns}} later, and won't waste 
all that much work if we do end up changing {{Columns}} but keep the 
indexability as "generally useful". 

> Columns should be backed by a BTree, not an array
> -
>
> Key: CASSANDRA-9471
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9471
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 3.0 beta 1
>
>
> Follow up to 8099. 
> We have pretty terrible lookup performance as the number of columns grows 
> (linear). In at least one location, this results in quadratic performance. 
> We don't however want this structure to be either any more expensive to 
> build, nor to store. Some small modifications to BTree will permit it to 
> serve here, by permitting efficient lookup by index, and calculation _of_ 
> index for a given key.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array

2015-07-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613081#comment-14613081
 ] 

Benedict commented on CASSANDRA-9471:
-

Well, performance-wise the difference is negligible. There is an extra lg(N/32) 
cost for the current implementation, which amortizes to imperceptible (and 
literally zero for small sets). The fact we don't use higher/lower/ceil/floor 
very commonly means I'm confident this extra cost is better to incur for the 
simplicity of implementation. 

The reason I say "better" is exclusively because there is a more direct 
implementation for the inequality lookups. If we don't have _another_ reason 
for indexing it seems better practice to implement that directly, and leave out 
the indexing feature.

The indexability is actually surprisingly simple, and doesn't introduce 
significant complexity IMO. I'm just a little wary of introducing features we 
don't use _directly_ (even if I have an attachment to it), for the normal 
worries of code atrophy. I certainly won't argue against its inclusion, though, 
as I agree it seems like it _should_ be more generally useful. I'm just not yet 
aware of another place for it.

> Columns should be backed by a BTree, not an array
> -
>
> Key: CASSANDRA-9471
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9471
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 3.0 beta 1
>
>
> Follow up to 8099. 
> We have pretty terrible lookup performance as the number of columns grows 
> (linear). In at least one location, this results in quadratic performance. 
> We don't however want this structure to be either any more expensive to 
> build, nor to store. Some small modifications to BTree will permit it to 
> serve here, by permitting efficient lookup by index, and calculation _of_ 
> index for a given key.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array

2015-07-03 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613065#comment-14613065
 ] 

Sylvain Lebresne commented on CASSANDRA-9471:
-

bq.  If we choose not to include this feature, it would be better to implement 
these directly

How much better? Thinking out loud here, but we're a database, we're dealing 
with sorted stuff all the time. So even outside of its use (or not) by 
{{Columns}}, having a more capable {{BtreeSet}} implementation, one that can 
act more like an efficient sorted list, feels to me like something that would 
be useful to have in our tool belt. Meaning by that it sounds from you comments 
that the  indexability does add much complexity to the 
implementation(disclaimer: I haven't looked at the patch) , so if its cost is 
really small, maybe it's worth getting the flexibility?

> Columns should be backed by a BTree, not an array
> -
>
> Key: CASSANDRA-9471
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9471
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 3.0 beta 1
>
>
> Follow up to 8099. 
> We have pretty terrible lookup performance as the number of columns grows 
> (linear). In at least one location, this results in quadratic performance. 
> We don't however want this structure to be either any more expensive to 
> build, nor to store. Some small modifications to BTree will permit it to 
> serve here, by permitting efficient lookup by index, and calculation _of_ 
> index for a given key.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array

2015-07-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613058#comment-14613058
 ] 

Benedict commented on CASSANDRA-9471:
-

Well, the decision does ultimately affect how certain features within the btree 
are implemented - or at least the cost/benefit analysis (for the reviewer as 
much as myself). Right now I've used the indexability feature to make a trivial 
implementation of lower/higher/floor/ceil, because it permits you to treat the 
whole btree as though it were an array for indexing, using binarySearch 
semantics and positional access. If we choose not to include this feature, it 
would be better to implement these directly - not onerous, of course, but I 
want to avoid burdening branimir with unnecessary review. There's also some 
intertwining on testing (using higher features to help test lower ones).

However you make a good point, and I will see what minimal set of changes I can 
extract to get the ball rolling. It's probably still pretty significant and 
helpful.

> Columns should be backed by a BTree, not an array
> -
>
> Key: CASSANDRA-9471
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9471
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 3.0 beta 1
>
>
> Follow up to 8099. 
> We have pretty terrible lookup performance as the number of columns grows 
> (linear). In at least one location, this results in quadratic performance. 
> We don't however want this structure to be either any more expensive to 
> build, nor to store. Some small modifications to BTree will permit it to 
> serve here, by permitting efficient lookup by index, and calculation _of_ 
> index for a given key.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array

2015-07-03 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613046#comment-14613046
 ] 

Sylvain Lebresne commented on CASSANDRA-9471:
-

bq.  at the very least, the improved iterator, improved tests, and wider 
deployment of the btree are all worth incorporating.

What about moving those changes to a separate ticket (i.e. one that is not 
concerned by {{Columns}})? It's useful to trunk anyway as you says, and the 
less stuff we delay, the better. Splitting the changes related to {{Columns}} 
from the other is also more incremental in a way :)

> Columns should be backed by a BTree, not an array
> -
>
> Key: CASSANDRA-9471
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9471
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 3.0 beta 1
>
>
> Follow up to 8099. 
> We have pretty terrible lookup performance as the number of columns grows 
> (linear). In at least one location, this results in quadratic performance. 
> We don't however want this structure to be either any more expensive to 
> build, nor to store. Some small modifications to BTree will permit it to 
> serve here, by permitting efficient lookup by index, and calculation _of_ 
> index for a given key.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array

2015-07-02 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612250#comment-14612250
 ] 

Benedict commented on CASSANDRA-9471:
-

So, on further consideration, I think I will postpone this patch until the 
in-progress refactoring is done. The reason being that it may be there is a 
better, slightly wider-viewed, approach to improving this. If so, these 
improvements to the BTree won't go completely to waste: at the very least, the 
improved iterator, improved tests, and wider deployment of the btree are all 
worth incorporating. However the indexability - whilst very nifty, and 
relatively simple - is only really helpful while there is a distinction between 
Columns and RowDataBlock, which may not be necessary to maintain.

I have a feeling we can both create a more compact and efficient Columns 
representation. I will consider it all a bit more once the lay of the land has 
settled down.

> Columns should be backed by a BTree, not an array
> -
>
> Key: CASSANDRA-9471
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9471
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 3.0 beta 1
>
>
> Follow up to 8099. 
> We have pretty terrible lookup performance as the number of columns grows 
> (linear). In at least one location, this results in quadratic performance. 
> We don't however want this structure to be either any more expensive to 
> build, nor to store. Some small modifications to BTree will permit it to 
> serve here, by permitting efficient lookup by index, and calculation _of_ 
> index for a given key.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array

2015-07-02 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612146#comment-14612146
 ] 

Benedict commented on CASSANDRA-9471:
-

[~blambov]: just letting you know that I'm rebasing this, which may take a 
little while. The btree-only changes (i.e. the majority of the work) are still 
good for review though.

> Columns should be backed by a BTree, not an array
> -
>
> Key: CASSANDRA-9471
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9471
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 3.0 beta 1
>
>
> Follow up to 8099. 
> We have pretty terrible lookup performance as the number of columns grows 
> (linear). In at least one location, this results in quadratic performance. 
> We don't however want this structure to be either any more expensive to 
> build, nor to store. Some small modifications to BTree will permit it to 
> serve here, by permitting efficient lookup by index, and calculation _of_ 
> index for a given key.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array

2015-06-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585005#comment-14585005
 ] 

Benedict commented on CASSANDRA-9471:
-

I've pushed a branch [here|https://github.com/belliottsmith/cassandra/tree/9471]

This patch does quite a few things, since I thought we had better get them out 
of the way sooner than later:

# Introduces "indexing" to the BTree class, permitting lookup by positional 
index, and binarySearch semantics (so inequality lookups yield the position a 
binarySearch of the equivalent array would)
#* The size modulo remainder of a Leaf/Branch are flipped, so that a Branch has 
a spare slot at the end for an int[] index:
#** if missing, this occupies at most 1 bit per entry in the set (often 1/32), 
if present, it occupies at most 2.5 bits, and typically around 1 bit. Sets with 
<= 32 items incur no cost.
#* This index just contains the cumulative size of the tree preceding each 
branch key, rooted at the node in question
# Reintroduces BTreeSet, with extra methods for accessing by index\*, and 
simple implementations of missing NavigableMap methods using this new 
functionality
# Introduces a simple BTreeSet.Builder, and replaces all suitable uses of 
TreeMap in the codebase with a BTreeSet.Builder -> BTreeSet
# Rewrites btree.Cursor to make it far easier to understand, and to introduced 
IndexedSearchIterator, exposing the new indexing
# Reintroduces LongBTreeTest, with cleaned up more exhaustive coverage of 
iterator functionality, and new NavigableMap methods

It's more code than I expected, as I didn't plan on rewriting btree.Cursor, but 
it was frankly a bit of a mess. 8099 also extends it to cover descending 
ranges, but with no test coverage, and nor did I want to reason about (my own!) 
code when inverted in this way to corroborate its behaviour. The new 
implementation is far more declarative, and I hope should be easy for just 
about anybody to read and review, and crucially to extend/modify in future.

The Cursor rewrite was also sparked by the introduction of 
IndexedSearchIterator, which was designed to drop into 
SimpleRowDataBlock.CellWriter. Unfortunately, the comments here _suggesting_ 
columns would be provided in-order was optimistic. It seems that the call-sites 
would need to at least specify if this is the case. This means that the Cursor 
rewrite is not helpful for this _yet_, but hopefully will be soon, and I think 
it was warranted by our dependence on new functionality and by its existing 
ugliness.

It is possible we could split this ticket into (1, 5.1); (2, 5.2); 3; (4, 5.3), 
but this all seemed naturally grouped together, and given it's a follow on to 
8099...

I was hoping [~blambov] could review the changes to BTree, BTreeSet and Cursor 
in particular? If [~slebresne] or [~blerer] could review the more modest places 
it touches the 8099 code that would be great.

\*A simple follow up to this ticket would be to make BTreeSet implement both 
{{Set}} _and_ {{List}}

> Columns should be backed by a BTree, not an array
> -
>
> Key: CASSANDRA-9471
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9471
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 3.0 beta 1
>
>
> Follow up to 8099. 
> We have pretty terrible lookup performance as the number of columns grows 
> (linear). In at least one location, this results in quadratic performance. 
> We don't however want this structure to be either any more expensive to 
> build, nor to store. Some small modifications to BTree will permit it to 
> serve here, by permitting efficient lookup by index, and calculation _of_ 
> index for a given key.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)