[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-08-30 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597244#comment-16597244
 ] 

Benedict commented on CASSANDRA-9989:
-

So, when I suggested pre-computing, I meant pre-computing the split point (not 
still performing a separate split for the last two nodes, but having an if 
statement in the loop, that decrements the childSize by 1 on a selected node)

But, I'm happy to stop here - we can leave a TODO over the size calculation in 
the loop, to make revisiting easier should this method ever become a hotspot.

It's a huge improvement over the current state, so let's get it in before the 
freeze.  Thanks for bearing with all my review feedback!

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-08-29 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597080#comment-16597080
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

I rebased and squashed the commits:
|Branch|uTest|
| [9989-rebased|https://github.com/cooldoger/cassandra/tree/9989-rebased] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/9989-rebased.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/9989-rebased]
 |


> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-08-29 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596997#comment-16596997
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

There's a little bit improvement when we pre-compute the child size and split 
the left values to the last 2 nodes: 
[{{9989-2}}|https://github.com/cooldoger/cassandra/tree/9989-2]. Here is my 
benchmark test results (it should only impact the large bTree build):

{noformat}
== calculate child size every round:
 [java] Benchmark  (dataSize)   Mode  Cnt   Score   
Error   Units
 [java] BTreeBuildBench.buildTreeTest   1  thrpt   16  124595.864 ? 
10133.336  ops/ms
 [java] BTreeBuildBench.buildTreeTest   2  thrpt   16  120228.601 ? 
12859.617  ops/ms
 [java] BTreeBuildBench.buildTreeTest   5  thrpt   16  103881.001 ? 
 8136.400  ops/ms
 [java] BTreeBuildBench.buildTreeTest  10  thrpt   16   89141.480 ? 
 7716.011  ops/ms
 [java] BTreeBuildBench.buildTreeTest  20  thrpt   16   67390.602 ? 
 8057.348  ops/ms
 [java] BTreeBuildBench.buildTreeTest  40  thrpt   16   19633.234 ? 
 1545.773  ops/ms
 [java] BTreeBuildBench.buildTreeTest 100  thrpt   16   10334.557 ? 
 1027.898  ops/ms
 [java] BTreeBuildBench.buildTreeTest1000  thrpt   161239.163 ? 
  173.303  ops/ms
 [java] BTreeBuildBench.buildTreeTest   1  thrpt   16 104.024 ? 
   12.069  ops/ms
 [java] BTreeBuildBench.buildTreeTest  10  thrpt   16  10.259 ? 
1.088  ops/ms

== pre-calculate child size and split the left values to the last 2 nodes:
 [java] Benchmark  (dataSize)   Mode  Cnt   Score   
Error   Units
 [java] BTreeBuildBench.buildTreeTest   1  thrpt   16  122030.330 ? 
10528.782  ops/ms
 [java] BTreeBuildBench.buildTreeTest   2  thrpt   16  121939.935 ? 
12627.014  ops/ms
 [java] BTreeBuildBench.buildTreeTest   5  thrpt   16  104694.942 ? 
 9031.935  ops/ms
 [java] BTreeBuildBench.buildTreeTest  10  thrpt   16   87687.949 ? 
 9029.432  ops/ms
 [java] BTreeBuildBench.buildTreeTest  20  thrpt   16   67941.722 ? 
 7099.874  ops/ms
 [java] BTreeBuildBench.buildTreeTest  40  thrpt   16   19468.380 ? 
 1640.993  ops/ms
 [java] BTreeBuildBench.buildTreeTest 100  thrpt   16   10503.954 ? 
  980.228  ops/ms
 [java] BTreeBuildBench.buildTreeTest1000  thrpt   161374.558 ? 
  167.329  ops/ms
 [java] BTreeBuildBench.buildTreeTest   1  thrpt   16 111.364 ? 
8.896  ops/ms
 [java] BTreeBuildBench.buildTreeTest  10  thrpt   16  10.728 ? 
1.107  ops/ms
{noformat}

I would prefer the clearer code.

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-08-29 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596964#comment-16596964
 ] 

Benedict commented on CASSANDRA-9989:
-

Much nicer, conceptually and aesthetically.  It's a shame it means an extra 
integer division on each iteration; the latency for this varies between 14-30 
cycles on latest CPUs.  Does it have a measurable impact on the performance of 
the patch?

Ordinarily I'd try to avoid it, but it makes the code a lot clearer.  The 
alternative is to precompute an index we will switch from rounding-up to 
rounding-down.  This will definitely be cheaper, but slightly less clean.

If the microbenchmarks show no big downside, I'm +1 the patch whichever way you 
prefer.

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-08-29 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596957#comment-16596957
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

Nice catch. I updated the branch to split the left values to the left child 
nodes. Please review again:

|branch|[9989|https://github.com/cooldoger/cassandra/commits/9989]|

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-08-29 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596229#comment-16596229
 ] 

Benedict commented on CASSANDRA-9989:
-

Thanks Jay.

Unfortunately this breaks the {{isWellFormed}} test, and I've added a special 
case to {{LongBTreeTest}} to check this.  In the case of a tree of size 20, we 
produce a child node that is smaller than {{FAN_FACTOR / 2}}.  To avoid this, 
we need to calculate a mid-point where we decrement the size, which is just a 
cleaner generalisation of splitting the final two nodes (and instead of two 
loops, we can have a single conditional move inside the loop, to subtract on 
the correct child)

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-08-28 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595399#comment-16595399
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

Thanks [~benedict].

{quote}
1. It might be nice to rename pos to index for consistency with indexOffsets
{quote}
Changed.

{quote}
2. It might be nicer to split even more evenly, as far as possible - if only 
from a code perspective. The situation you're accounting for of a single key in 
the final child could be resolved by decrementing some K from every other node, 
I think.
{quote}
Make sense to me. It could be done by:
{noformat}
childSize = (size - 1) / childNum;
{noformat}
instead of
{noformat}
childSize = size / childNum;
{noformat}
The code is also simpler.

Please review:
|branch|[9989|https://github.com/cooldoger/cassandra/commits/9989]|

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-08-28 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594817#comment-16594817
 ] 

Benedict commented on CASSANDRA-9989:
-

Thanks Jay, looks good.  A couple more follow-up comments:

# It might be nice to rename {{pos}} to {{index}} for consistency with 
{{indexOffsets}}
# It might be nicer to split even more evenly, as far as possible - if only 
from a code perspective.  The situation you're accounting for of a single key 
in the final child could be resolved by decrementing some K from every other 
node, I think.

This second is only a suggestion, as I haven't tried to implement it, but I 
feel it would be a little more consistent to have {{[N-k,N-k,…,N-k,remainder]}} 
than to have {{[N,N...,N,remainder/2, remainder/2]}}.  It would also mean a 
single clearly explained bit of upfront logic to decide if we need to adjust, 
and the loop can apply to all nodes but the last.

As a simple ‘proof’, there will be max {{BF+1}} children, so a max of {{BF}} 
neighbours.  So deducting 1 from all of them, in the event that the right-hand 
node is going to have a solitary key, will increase its size by at most 1 leaf 
node.  This is guaranteed to fit unless the node was already a leaf, in which 
case we didn’t need to do anything anyway.  The only scenario I can imagine 
being a problem would be a deep tree, for which we probably want to calculate 
{{MIN_TREE_SIZE}}, as we might need to deduct more than 1.

WDYT?

bq. LongBTreeTest.java is timing out even for trunk, I'm increasing timeout and 
run through the test.

You can run this with its {{main()}} method

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-08-27 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594239#comment-16594239
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

Sorry, I misunderstood the last part. I updated the patch to evenly split the 
values to all child nodes. The TREE_SIZE is starting from index 0 instead of 1. 
{{left}} is replaced with an incrementing counter, please review:
|branch | [9989|https://github.com/cooldoger/cassandra/tree/9989]|

{{LongBTreeTest.java}} is timing out even for trunk, I'm increasing timeout and 
run through the test.

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-08-24 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591256#comment-16591256
 ] 

Benedict commented on CASSANDRA-9989:
-

bq. We can make it more clear with: {{(size / (childSize + 1)) + 1}}

Ha. Doh.

bq. It's also used here

I guess we should decide on our split strategy then, as this code is new, and 
you added it AFAICT because of miscommunication on my part.  My view is that 
either your prior code (that is *conceptually* consistent with the old 
approach), or an even split between all of the child nodes, is preferable.  I 
think the latter is actually superior, as it provides more opportunities on 
update to copy an unadulterated sub-tree (with full nodes we have to split, 
spilling up to the root).

Either way, if we remove the legacy replication, this grandchild calculation 
goes away, and we can clarify TREE_SIZE?

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-08-23 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591169#comment-16591169
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

Sorry, I missed {{point 1.}} (I thought it's something else), but you already 
get it:
{noformat}
 int childrenNum = (size + childSize + 1) / (childSize +1)
==>  int childrenNum = size / (childSize + 1) + (childSize + 1) / (childSize + 
1)
==>  int childrenNum = size / (childSize + 1) + 1
{noformat}
We can make it more clear with: {{(size / (childSize + 1)) + 1}}.

 
{quote}I think it might be nicer in this case to make the TREE_SIZE logically 
more obvious, so that its accessors (which are rather more complicated) are 
simplified, rather than its calculation? I don't think this is very tricky 
anyway - just set TREE_SIZE[0] = FAN_FACTOR, and leave the loop as it is, I 
think?
{quote}
It's also used here: 
[{{TREE_SIZE[level-2]}}|https://github.com/cooldoger/cassandra/commit/8369dc8b7be3ccf8d1972e9c8cff95adb3493005#diff-4b911b7d0959c6219175e2349968f3cdR196].
 Which needs to be changed to: {{int grandchildSize = level == 1 ? 0 : 
TREE_SIZE[level - 2];}}. I prefer avoiding this check while building every node 
(will add a comment that leaf node is level 1). But I'm fine with either way.

 
{quote}I think this is also more easily done by an incrementing counter, rather 
than decrementing?
{quote}
Sure, I'll update the patch for that.

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-08-23 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591004#comment-16591004
 ] 

Benedict commented on CASSANDRA-9989:
-

{quote}Just to make the initial calculation easier
{quote}
I think it might be nicer in this case to make the {{TREE_SIZE}} logically more 
obvious, so that its accessors (which are rather more complicated) are 
simplified, rather than its calculation? I don't think this is very tricky 
anyway - just set {{TREE_SIZE[0] = FAN_FACTOR}}, and leave the loop as it is, I 
think?
{quote}It's used to update indexOffsets[I].
{quote}
I think this is also more easily done by an incrementing counter, rather than 
decrementing?
{quote}I updated the code and added an unittest to make sure the BTree is 
exactly the same as before (with NodeBuilder()).
{quote}
I didn't mean to suggest that we exactly duplicate the old logic. In fact, 
arguably, the {{NodeBuilder}} code is not only very ugly, it is also 
"incorrect" - it leaves a space vacant at the end of a full {{Leaf}} node. I 
think this was a conscious tradeoff when implementing the {{indexOffsets}} 
feature, but still (note, please maintain this in your patch, as the old code 
is not able to handle a "full" Leaf).

This is not something easily fixed, and I think it should be addressed by 
rewriting the {{NodeBuilder}} entirely. Obviously, not in this patch, but I 
would not want to bind all future implementations by this older not-so-great 
one. I was musing that conceptually, splitting the final two nodes (if not 
identically) is consistent, so easily justified.

I just wonder if a more even splitting of the contents might be more useful - 
for instance, it might provide more opportunities for cheap inserts into the 
tree. I don't think this is terribly important, however.
{quote}Fixed.
{quote}
I may have missed it, but didn't see a response to point (1), around the 
{{childrenNum}} calculation? For instance, my algebra came up with this 
calculation:
{code:java}
/**
 *   (size - (childNum - 1)) / childSize <= childNum
 * ==>  size / childSize <= childNum + 
((childNum - 1) / childSize)
 * ==>  size <= childSize * 
childNum + childNum - 1
 * ==>  size + 1 <= (childSize + 1) * 
childNum
 * ==>ceil((size + 1) / (childSize + 1)) == childNum
 * ==>  (size + childSize + 1) / (childSize + 1) == childNum
 */
int childrenNum = (size + childSize + 1) / (childSize + 1);
{code}
Of course, this is very close to {{size / (childSize + 1) + 1}}, so perhaps it 
is easy to demonstrate that it is always safe to increment by one. I haven't 
attempted, but it would be nice to do so, to be sure this value is always a 
safe {{childrenNum}} calculation

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-08-23 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590983#comment-16590983
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

[~benedict], thank you very much for the review.

I updated the branch based on your comments: [{{branch: 
9989}}|https://github.com/cooldoger/cassandra/commits/9989].
{quote}1. How did you arrive at your childrenNum calculation, and are we 
certain it is correct? This is pretty critical for correctness, and hard to 
test fully, so it would be nice to have some comments justifying it.
 4. It would be nice if we removed MAX_DEPTH, and just truncated TREE_SIZE to 
the correct maximum in our static block
{quote}
Fixed. Now it auto calculates the max height of the tree that we could build.
{quote}2. Why decrement left instead of just counting up the number of values 
written?
{quote}
It's used to update 
[{{indexOffsets\[i\]}}|https://github.com/cooldoger/cassandra/commit/0c1a9d11d6540ac7b233c400e0d8b1a56e647d5f#diff-4b911b7d0959c6219175e2349968f3cdR179].
{quote}3. Why is TREE_SIZE indexed from 1, not 0?
{quote}
Just to make the initial calculation easier: 
[{{TREE_SIZE\[i-1\]}}|https://github.com/cooldoger/cassandra/commit/0c1a9d11d6540ac7b233c400e0d8b1a56e647d5f#diff-4b911b7d0959c6219175e2349968f3cdR84].
 With the new patch, it's also used here: to get 
[{{grandchildSize}}|https://github.com/cooldoger/cassandra/commit/8369dc8b7be3ccf8d1972e9c8cff95adb3493005#diff-4b911b7d0959c6219175e2349968f3cdR196].
{quote}I'm also torn on the splitting of the last two nodes - this is 
consistent with the current NodeBuilder logic, but it does complicate the code 
a little versus evenly splitting the remaining size amongst all the children.
{quote}
I was thinking to make the tree a little bit more balanced by splitting it 
equally to the last 2 nodes. But yes, it also makes sense to make it the same 
as before. I updated the code and added an unittest to make sure the BTree is 
[exactly the 
same|https://github.com/cooldoger/cassandra/commit/8369dc8b7be3ccf8d1972e9c8cff95adb3493005#diff-cb7b127243f861292899bad7305217dbR592]
 as before (with 
{{[NodeBuilder()|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/NodeBuilder.java]}}).

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-08-21 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587507#comment-16587507
 ] 

Benedict commented on CASSANDRA-9989:
-

Thanks Jay.  The patch looks good overall, but I have few improvement comments, 
and some questions:

# How did you arrive at your {{childrenNum}} calculation, and are we certain it 
is correct?  This is pretty critical for correctness, and hard to test fully, 
so it would be nice to have some comments justifying it.
# Why decrement {{left}} instead of just counting up the number of values 
written?
# Why is TREE_SIZE indexed from 1, not 0?
# It would be nice if we removed MAX_DEPTH, and just truncated TREE_SIZE to the 
correct maximum in our static block

I'm also torn on the splitting of the last two nodes - this is consistent with 
the current {{NodeBuilder}} logic, but it does complicate the code a little 
versus evenly splitting the remaining size amongst all the children.

I've pushed a patch 
[here|https://github.com/belliottsmith/cassandra/tree/9989-suggest] with some 
tweaks to the {{LongBTreeTest}} to stress the new code paths more more, and it 
would be great if we could run this against the final patch for a while, with a 
tweak to the parameters to increase further the ratio of tests that use this 
code path.

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-06-12 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510282#comment-16510282
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

[~jasobrown] Here is the latest rebased code:
| Branch | uTest |
| [9989|https://github.com/cooldoger/cassandra/tree/9989] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/9989.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/9989]
 |

[A benchmark 
test|https://github.com/cooldoger/cassandra/commit/048f465d9872f5645a809666aefee503f9331736]
 is added first, so we could run the same test ({{$ ant microbench 
-Dbenchmark.name=BTreeBuildBench.buildTreeTest}}) before and after the patch. 
Here is the test result on my host, basically it improves the bTree build 
({{2x-4x}}) for a non-leaf tree ({{>32 elements}}), and no impact on leaf tree 
({{<=32 elements}}) build, which is just an array, it's already optimized:
{noformat}
Without Fix
 [java] Benchmark  (dataSize)   Mode  Cnt   Score   
   Error   Units
 [java] BTreeBuildBench.buildTreeTest   1  thrpt   16  140871.759 ± 
5077.103  ops/ms
 [java] BTreeBuildBench.buildTreeTest   2  thrpt   16  135774.492 ± 
6064.639  ops/ms
 [java] BTreeBuildBench.buildTreeTest   5  thrpt   16  126986.466 ± 
3699.703  ops/ms
 [java] BTreeBuildBench.buildTreeTest  10  thrpt   16  101731.894 ± 
3567.127  ops/ms
 [java] BTreeBuildBench.buildTreeTest  20  thrpt   16   70327.305 ± 
2503.299  ops/ms
 [java] BTreeBuildBench.buildTreeTest  40  thrpt   168623.271 ± 
 986.412  ops/ms
 [java] BTreeBuildBench.buildTreeTest 100  thrpt   161681.114 ± 
 128.078  ops/ms
 [java] BTreeBuildBench.buildTreeTest1000  thrpt   16 412.908 ± 
  32.097  ops/ms
 [java] BTreeBuildBench.buildTreeTest   1  thrpt   16  27.509 ± 
  14.482  ops/ms
 [java] BTreeBuildBench.buildTreeTest  10  thrpt   16   4.615 ± 
   0.187  ops/ms
{noformat}
With Fix:
{noformat}
 [java] Benchmark  (dataSize)   Mode  Cnt   
Score  Error   Units
 [java] BTreeBuildBench.buildTreeTest   1  thrpt   16  
147053.344 ± 6292.209  ops/ms
 [java] BTreeBuildBench.buildTreeTest   2  thrpt   16  
135013.312 ± 4265.301  ops/ms
 [java] BTreeBuildBench.buildTreeTest   5  thrpt   16  
122254.600 ± 3937.228  ops/ms
 [java] BTreeBuildBench.buildTreeTest  10  thrpt   16  
102739.551 ± 1937.640  ops/ms
 [java] BTreeBuildBench.buildTreeTest  20  thrpt   16   
71638.531 ± 2005.118  ops/ms
 [java] BTreeBuildBench.buildTreeTest  40  thrpt   16   
21514.998 ±  985.831  ops/ms
 [java] BTreeBuildBench.buildTreeTest 100  thrpt   16   
11495.212 ±  526.143  ops/ms
 [java] BTreeBuildBench.buildTreeTest1000  thrpt   16
1469.110 ±   57.081  ops/ms
 [java] BTreeBuildBench.buildTreeTest   1  thrpt   16 
114.110 ±4.330  ops/ms
 [java] BTreeBuildBench.buildTreeTest  10  thrpt   16  
11.910 ±0.502  ops/ms
{noformat}

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-06-12 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509969#comment-16509969
 ] 

Jason Brown commented on CASSANDRA-9989:


have you rebased? does it need to be rebased? Also, can you run the circleci 
tests again, just for sanity?

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-06-12 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509939#comment-16509939
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

[~jasobrown] gently ping :)

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-02-12 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361739#comment-16361739
 ] 

Jason Brown commented on CASSANDRA-9989:


You are in my queue, so hopefully in the next week or sooner.

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-02-08 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358028#comment-16358028
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

Hi [~jasobrown], thank you for volunteering to be the reviewer. Please let me 
know if there's any feedback.

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-01-31 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347709#comment-16347709
 ] 

Jason Brown commented on CASSANDRA-9989:


If Alex can't get to it by next week, I can probably do it.

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-01-31 Thread Nate McCall (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347671#comment-16347671
 ] 

Nate McCall commented on CASSANDRA-9989:


[~ifesdjeen] You have done some work on the BTree impl, and chance you have 
time for a review?

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-01-31 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347657#comment-16347657
 ] 

Ariel Weisberg commented on CASSANDRA-9989:
---

Sorry I haven't done this yet. I did sit down to review but didn't get to the 
point of understanding the b-tree implementation where I was confident enough 
to +1. I have some hard deadlines in between now and beginning of March so I am 
unlikely to get to it before then.

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-01-31 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347647#comment-16347647
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

Hi [~aweisberg], I just rebased the code and simplified the benchmark tests 
(also split the commit into 2 parts benchmark and the fix, so it's easier run 
the benchmark with and without the fix):
|Branch|uTest|
|[9989|https://github.com/cooldoger/cassandra/tree/9989]|[!https://circleci.com/gh/cooldoger/cassandra/tree/9989.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/9989]|

Here is the test result without the fix:
{noformat}
 [java] Benchmark  (dataSize)  (treeBuilder)   Mode 
 Cnt  Score   Error   Units
 [java] BTreeBuildBench.treeBuildTest   1   btreeBuilder  thrpt 
   6  25567.300 ? 11696.493  ops/ms
 [java] BTreeBuildBench.treeBuildTest   1  treeBuilderAddAll  thrpt 
   6   8180.040 ?  1917.026  ops/ms
 [java] BTreeBuildBench.treeBuildTest   1 treeBuilderAdd  thrpt 
   6   9666.550 ?  3550.584  ops/ms
 [java] BTreeBuildBench.treeBuildTest  10   btreeBuilder  thrpt 
   6  18129.297 ?  4100.356  ops/ms
 [java] BTreeBuildBench.treeBuildTest  10  treeBuilderAddAll  thrpt 
   6   4693.460 ?  1626.681  ops/ms
 [java] BTreeBuildBench.treeBuildTest  10 treeBuilderAdd  thrpt 
   6   5309.851 ?  1837.909  ops/ms
 [java] BTreeBuildBench.treeBuildTest  33   btreeBuilder  thrpt 
   6   3008.138 ?  1216.708  ops/ms
 [java] BTreeBuildBench.treeBuildTest  33  treeBuilderAddAll  thrpt 
   6   1393.265 ?   328.780  ops/ms
 [java] BTreeBuildBench.treeBuildTest  33 treeBuilderAdd  thrpt 
   6   1722.419 ?   218.660  ops/ms
 [java] BTreeBuildBench.treeBuildTest1000   btreeBuilder  thrpt 
   6117.055 ?67.364  ops/ms
 [java] BTreeBuildBench.treeBuildTest1000  treeBuilderAddAll  thrpt 
   6 57.689 ?11.197  ops/ms
 [java] BTreeBuildBench.treeBuildTest1000 treeBuilderAdd  thrpt 
   6 53.959 ? 7.916  ops/ms
 [java] BTreeBuildBench.treeBuildTest5000   btreeBuilder  thrpt 
   6 21.463 ? 1.595  ops/ms
 [java] BTreeBuildBench.treeBuildTest5000  treeBuilderAddAll  thrpt 
   6 10.972 ? 2.476  ops/ms
 [java] BTreeBuildBench.treeBuildTest5000 treeBuilderAdd  thrpt 
   6  9.951 ? 2.194  ops/ms
 [java] BTreeBuildBench.treeBuildTest 100   btreeBuilder  thrpt 
   6  0.118 ? 0.054  ops/ms
 [java] BTreeBuildBench.treeBuildTest 100  treeBuilderAddAll  thrpt 
   6  0.048 ? 0.016  ops/ms
 [java] BTreeBuildBench.treeBuildTest 100 treeBuilderAdd  thrpt 
   6  0.045 ? 0.011  ops/ms
{noformat}
With fix:
{noformat}
 [java] Benchmark  (dataSize)  (treeBuilder)   Mode 
 Cnt  Score  Error   Units
 [java] Benchmark  (dataSize)  (treeBuilder)   Mode 
 Cnt  Score  Error   Units
 [java] BTreeBuildBench.treeBuildTest   1   btreeBuilder  thrpt 
   6  24546.947 ? 4740.490  ops/ms
 [java] BTreeBuildBench.treeBuildTest   1  treeBuilderAddAll  thrpt 
   6   8129.421 ? 2000.655  ops/ms
 [java] BTreeBuildBench.treeBuildTest   1 treeBuilderAdd  thrpt 
   6   9277.822 ? 3106.155  ops/ms
 [java] BTreeBuildBench.treeBuildTest  10   btreeBuilder  thrpt 
   6  18385.382 ? 5597.276  ops/ms
 [java] BTreeBuildBench.treeBuildTest  10  treeBuilderAddAll  thrpt 
   6   4213.489 ? 1055.968  ops/ms
 [java] BTreeBuildBench.treeBuildTest  10 treeBuilderAdd  thrpt 
   6   5507.714 ? 2305.731  ops/ms
 [java] BTreeBuildBench.treeBuildTest  33   btreeBuilder  thrpt 
   6   4396.962 ? 1882.824  ops/ms
 [java] BTreeBuildBench.treeBuildTest  33  treeBuilderAddAll  thrpt 
   6   1630.346 ?  228.321  ops/ms
 [java] BTreeBuildBench.treeBuildTest  33 treeBuilderAdd  thrpt 
   6   2182.774 ?  545.746  ops/ms
 [java] BTreeBuildBench.treeBuildTest1000   btreeBuilder  thrpt 
   6304.032 ?  144.663  ops/ms
 [java] BTreeBuildBench.treeBuildTest1000  treeBuilderAddAll  thrpt 
   6108.038 ?   32.746  ops/ms
 [java] BTreeBuildBench.treeBuildTest1000 treeBuilderAdd  thrpt 
   6 86.093 ?   18.264  ops/ms
 [java] BTreeBuildBench.treeBuildTest5000   btreeBuilder  thrpt 
   6 51.912 ?   31.749  ops/ms
 [java] BTreeBuildBench.treeBuildTest5000  treeBuilderAddAll  thrpt 
   6 15.545 ?4.912  ops/ms
 [java] BTreeBuildBench.treeBuildTest5000 treeBuilderAdd  thrpt 
   6 14.569 ?3.418  ops/ms
 [java] BTreeBuildBench.treeBuildTest 100   btr

[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2017-12-14 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291542#comment-16291542
 ] 

Ariel Weisberg commented on CASSANDRA-9989:
---

Sure. It's entirely greek to me at this point so it will be a little while. 
I'll probably look at it before end of year though.

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2017-12-14 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291221#comment-16291221
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

[~aweisberg], are you interested in reviewing it?

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2017-12-06 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280609#comment-16280609
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

Hi [~Anthony Grasso], are you still interested in review the patch?

I rebased the code to the latest trunk:
| Branch | uTest |
| 
[9989-trunk-onecommit|https://github.com/cooldoger/cassandra/tree/9989-trunk-onecommit]
 | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/9989-trunk-onecommit.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/9989-trunk-onecommit]
 |

Here is the microbench without the fix:
[9989-trunk-nofix|https://github.com/cooldoger/cassandra/tree/9989-trunk-nofix]

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2017-08-11 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123714#comment-16123714
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

[~Anthony Grasso] Would you please review the patch?

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2017-06-21 Thread Nate McCall (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058528#comment-16058528
 ] 

Nate McCall commented on CASSANDRA-9989:


We have a few spare cycles coming up. Assigning to [~Anthony Grasso] for 
review. 

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Anthony Grasso
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2017-04-12 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966913#comment-15966913
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

Rebased the [patch | 
https://github.com/cooldoger/cassandra/commit/99795034c4a9685bbe427cce19e0d23a31cf3516].
 Please review: [9989-trunk-onecommit-update| 
https://github.com/apache/cassandra/compare/trunk...cooldoger:9989-trunk-onecommit-update?expand=1]

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2017-02-25 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884366#comment-15884366
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

[~slebresne] Would you please review this 
[patch|https://github.com/cooldoger/cassandra/commit/bf6bc14a130dae64cb859e81ad54b21d5434d46a]?

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2017-01-31 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847778#comment-15847778
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

Thanks Benedict for the information. I agree that BTree.Builder could be 
further improved, currently it copys all the data in array and build the tree.

I did some improvements for 
[{{BTree.buildInternal()}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/BTree.java#L128],
 which is used by both {{BTree.Builder}} and {{BTree.build()}}. It improves the 
{{build()}} performance by about {{+190%}}. And it benefits Builder too (about 
{{+50%}}). The 
[patch|https://github.com/cooldoger/cassandra/commit/bf6bc14a130dae64cb859e81ad54b21d5434d46a]
 is attached.

Here are the JMH benchmark test results:
|| 
[TestCode|https://github.com/cooldoger/cassandra/commit/bf6bc14a130dae64cb859e81ad54b21d5434d46a#diff-b91288be52d83bf95ac5eb819f9fe3d4]
 || Before the patch || After the patch || Performance increased: {{(after - 
before)/before}} ||
| buildLeafTreeTest | 5274.457 | 13096.541 | +148% |
| buildLeafTreeBuilderTest | 1763.279 | 2223.307 | +26% |
| build1KValuesTreeTest | 117.937 | 371.845 | +215% |
| build1KValuesTreeBuilderTest | 68.103 | 102.884 | +51% |
| build5KValuesTreeTest | 19.463 | 59.202 | +204% |
| build5KValuesTreeBuilderTest | 11.349 | 19.209 | +69% |
| build1MValuesTreeTest | 0.104 | 0.296 | +185% |
| build1MValuesTreeBuilderTest | 0.049 | 0.080 | +63% |
| transformAndFilter1KTest | 29.502 | 37.432 | +27% |
| transformAndFilter1MTest | 0.031 | 0.036 | +16% |

{{BTree.build()}} is about 1 time faster than {{BTree.Builder}}, as Builder 
copies the data and then calls {{build()}} internally 
(https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/BTree.java#L1088).
 I guess that could be further improved too.

Benchmark test details:
No fix:
{noformat}
 [java] Benchmark Mode  Cnt  
Score  Error   Units
 [java] BTreeBuildBench.build1KValuesTreeBuilderTest thrpt   40 
68.103 ?2.694  ops/ms
 [java] BTreeBuildBench.build1KValuesTreeTestthrpt   40
117.937 ?   12.761  ops/ms
 [java] BTreeBuildBench.build1MValuesTreeBuilderTest thrpt   40  
0.049 ?0.002  ops/ms
 [java] BTreeBuildBench.build1MValuesTreeTestthrpt   40  
0.104 ?0.007  ops/ms
 [java] BTreeBuildBench.build5KValuesTreeBuilderTest thrpt   40 
11.349 ?0.743  ops/ms
 [java] BTreeBuildBench.build5KValuesTreeTestthrpt   40 
19.463 ?1.489  ops/ms
 [java] BTreeBuildBench.buildLeafTreeBuilderTest thrpt   40   
1763.279 ?  138.734  ops/ms
 [java] BTreeBuildBench.buildLeafTreeTestthrpt   40   
5274.457 ?  391.289  ops/ms
 [java] BTreeBuildBench.transformAndFilter1KTest thrpt   40 
29.502 ?1.548  ops/ms
 [java] BTreeBuildBench.transformAndFilter1MTest thrpt   40  
0.031 ?0.001  ops/ms
{noformat}

With Fix:
{noformat}
 [java] Benchmark Mode  Cnt  
Score  Error   Units
 [java] BTreeBuildBench.build1KValuesTreeBuilderTest thrpt   40
102.884 ?6.888  ops/ms
 [java] BTreeBuildBench.build1KValuesTreeTestthrpt   40
371.845 ?   52.153  ops/ms
 [java] BTreeBuildBench.build1MValuesTreeBuilderTest thrpt   40  
0.080 ?0.004  ops/ms
 [java] BTreeBuildBench.build1MValuesTreeTestthrpt   40  
0.296 ?0.037  ops/ms
 [java] BTreeBuildBench.build5KValuesTreeBuilderTest thrpt   40 
19.209 ?1.587  ops/ms
 [java] BTreeBuildBench.build5KValuesTreeTestthrpt   40 
59.202 ?7.439  ops/ms
 [java] BTreeBuildBench.buildLeafTreeBuilderTest thrpt   40   
2223.307 ?  179.095  ops/ms
 [java] BTreeBuildBench.buildLeafTreeTestthrpt   40  
13096.541 ? 1879.591  ops/ms
 [java] BTreeBuildBench.transformAndFilter1KTest thrpt   40 
37.432 ?2.371  ops/ms
 [java] BTreeBuildBench.transformAndFilter1MTest thrpt   40  
0.036 ?0.002  ops/ms
{noformat}

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.

[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2017-01-27 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15843558#comment-15843558
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

Can I take this one?

I guess it could be done by implementing bulkloading 
https://en.wikipedia.org/wiki/B-tree#Initial_construction instead of adding 
value one by one 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/TreeBuilder.java#L135
 Is that the right direction?

[~benedict] Sorry to ping you again, it's appreciated if you could give me more 
information.

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Priority: Minor
> Fix For: 3.x
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)