[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597244#comment-16597244 ] Benedict commented on CASSANDRA-9989: - So, when I suggested pre-computing, I meant pre-computing the split point (not still performing a separate split for the last two nodes, but having an if statement in the loop, that decrements the childSize by 1 on a selected node) But, I'm happy to stop here - we can leave a TODO over the size calculation in the loop, to make revisiting easier should this method ever become a hotspot. It's a huge improvement over the current state, so let's get it in before the freeze. Thanks for bearing with all my review feedback! > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597080#comment-16597080 ] Jay Zhuang commented on CASSANDRA-9989: --- I rebased and squashed the commits: |Branch|uTest| | [9989-rebased|https://github.com/cooldoger/cassandra/tree/9989-rebased] | [!https://circleci.com/gh/cooldoger/cassandra/tree/9989-rebased.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/9989-rebased] | > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596997#comment-16596997 ] Jay Zhuang commented on CASSANDRA-9989: --- There's a little bit improvement when we pre-compute the child size and split the left values to the last 2 nodes: [{{9989-2}}|https://github.com/cooldoger/cassandra/tree/9989-2]. Here is my benchmark test results (it should only impact the large bTree build): {noformat} == calculate child size every round: [java] Benchmark (dataSize) Mode Cnt Score Error Units [java] BTreeBuildBench.buildTreeTest 1 thrpt 16 124595.864 ? 10133.336 ops/ms [java] BTreeBuildBench.buildTreeTest 2 thrpt 16 120228.601 ? 12859.617 ops/ms [java] BTreeBuildBench.buildTreeTest 5 thrpt 16 103881.001 ? 8136.400 ops/ms [java] BTreeBuildBench.buildTreeTest 10 thrpt 16 89141.480 ? 7716.011 ops/ms [java] BTreeBuildBench.buildTreeTest 20 thrpt 16 67390.602 ? 8057.348 ops/ms [java] BTreeBuildBench.buildTreeTest 40 thrpt 16 19633.234 ? 1545.773 ops/ms [java] BTreeBuildBench.buildTreeTest 100 thrpt 16 10334.557 ? 1027.898 ops/ms [java] BTreeBuildBench.buildTreeTest1000 thrpt 161239.163 ? 173.303 ops/ms [java] BTreeBuildBench.buildTreeTest 1 thrpt 16 104.024 ? 12.069 ops/ms [java] BTreeBuildBench.buildTreeTest 10 thrpt 16 10.259 ? 1.088 ops/ms == pre-calculate child size and split the left values to the last 2 nodes: [java] Benchmark (dataSize) Mode Cnt Score Error Units [java] BTreeBuildBench.buildTreeTest 1 thrpt 16 122030.330 ? 10528.782 ops/ms [java] BTreeBuildBench.buildTreeTest 2 thrpt 16 121939.935 ? 12627.014 ops/ms [java] BTreeBuildBench.buildTreeTest 5 thrpt 16 104694.942 ? 9031.935 ops/ms [java] BTreeBuildBench.buildTreeTest 10 thrpt 16 87687.949 ? 9029.432 ops/ms [java] BTreeBuildBench.buildTreeTest 20 thrpt 16 67941.722 ? 7099.874 ops/ms [java] BTreeBuildBench.buildTreeTest 40 thrpt 16 19468.380 ? 1640.993 ops/ms [java] BTreeBuildBench.buildTreeTest 100 thrpt 16 10503.954 ? 980.228 ops/ms [java] BTreeBuildBench.buildTreeTest1000 thrpt 161374.558 ? 167.329 ops/ms [java] BTreeBuildBench.buildTreeTest 1 thrpt 16 111.364 ? 8.896 ops/ms [java] BTreeBuildBench.buildTreeTest 10 thrpt 16 10.728 ? 1.107 ops/ms {noformat} I would prefer the clearer code. > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596964#comment-16596964 ] Benedict commented on CASSANDRA-9989: - Much nicer, conceptually and aesthetically. It's a shame it means an extra integer division on each iteration; the latency for this varies between 14-30 cycles on latest CPUs. Does it have a measurable impact on the performance of the patch? Ordinarily I'd try to avoid it, but it makes the code a lot clearer. The alternative is to precompute an index we will switch from rounding-up to rounding-down. This will definitely be cheaper, but slightly less clean. If the microbenchmarks show no big downside, I'm +1 the patch whichever way you prefer. > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596957#comment-16596957 ] Jay Zhuang commented on CASSANDRA-9989: --- Nice catch. I updated the branch to split the left values to the left child nodes. Please review again: |branch|[9989|https://github.com/cooldoger/cassandra/commits/9989]| > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596229#comment-16596229 ] Benedict commented on CASSANDRA-9989: - Thanks Jay. Unfortunately this breaks the {{isWellFormed}} test, and I've added a special case to {{LongBTreeTest}} to check this. In the case of a tree of size 20, we produce a child node that is smaller than {{FAN_FACTOR / 2}}. To avoid this, we need to calculate a mid-point where we decrement the size, which is just a cleaner generalisation of splitting the final two nodes (and instead of two loops, we can have a single conditional move inside the loop, to subtract on the correct child) > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595399#comment-16595399 ] Jay Zhuang commented on CASSANDRA-9989: --- Thanks [~benedict]. {quote} 1. It might be nice to rename pos to index for consistency with indexOffsets {quote} Changed. {quote} 2. It might be nicer to split even more evenly, as far as possible - if only from a code perspective. The situation you're accounting for of a single key in the final child could be resolved by decrementing some K from every other node, I think. {quote} Make sense to me. It could be done by: {noformat} childSize = (size - 1) / childNum; {noformat} instead of {noformat} childSize = size / childNum; {noformat} The code is also simpler. Please review: |branch|[9989|https://github.com/cooldoger/cassandra/commits/9989]| > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594817#comment-16594817 ] Benedict commented on CASSANDRA-9989: - Thanks Jay, looks good. A couple more follow-up comments: # It might be nice to rename {{pos}} to {{index}} for consistency with {{indexOffsets}} # It might be nicer to split even more evenly, as far as possible - if only from a code perspective. The situation you're accounting for of a single key in the final child could be resolved by decrementing some K from every other node, I think. This second is only a suggestion, as I haven't tried to implement it, but I feel it would be a little more consistent to have {{[N-k,N-k,…,N-k,remainder]}} than to have {{[N,N...,N,remainder/2, remainder/2]}}. It would also mean a single clearly explained bit of upfront logic to decide if we need to adjust, and the loop can apply to all nodes but the last. As a simple ‘proof’, there will be max {{BF+1}} children, so a max of {{BF}} neighbours. So deducting 1 from all of them, in the event that the right-hand node is going to have a solitary key, will increase its size by at most 1 leaf node. This is guaranteed to fit unless the node was already a leaf, in which case we didn’t need to do anything anyway. The only scenario I can imagine being a problem would be a deep tree, for which we probably want to calculate {{MIN_TREE_SIZE}}, as we might need to deduct more than 1. WDYT? bq. LongBTreeTest.java is timing out even for trunk, I'm increasing timeout and run through the test. You can run this with its {{main()}} method > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594239#comment-16594239 ] Jay Zhuang commented on CASSANDRA-9989: --- Sorry, I misunderstood the last part. I updated the patch to evenly split the values to all child nodes. The TREE_SIZE is starting from index 0 instead of 1. {{left}} is replaced with an incrementing counter, please review: |branch | [9989|https://github.com/cooldoger/cassandra/tree/9989]| {{LongBTreeTest.java}} is timing out even for trunk, I'm increasing timeout and run through the test. > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591256#comment-16591256 ] Benedict commented on CASSANDRA-9989: - bq. We can make it more clear with: {{(size / (childSize + 1)) + 1}} Ha. Doh. bq. It's also used here I guess we should decide on our split strategy then, as this code is new, and you added it AFAICT because of miscommunication on my part. My view is that either your prior code (that is *conceptually* consistent with the old approach), or an even split between all of the child nodes, is preferable. I think the latter is actually superior, as it provides more opportunities on update to copy an unadulterated sub-tree (with full nodes we have to split, spilling up to the root). Either way, if we remove the legacy replication, this grandchild calculation goes away, and we can clarify TREE_SIZE? > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591169#comment-16591169 ] Jay Zhuang commented on CASSANDRA-9989: --- Sorry, I missed {{point 1.}} (I thought it's something else), but you already get it: {noformat} int childrenNum = (size + childSize + 1) / (childSize +1) ==> int childrenNum = size / (childSize + 1) + (childSize + 1) / (childSize + 1) ==> int childrenNum = size / (childSize + 1) + 1 {noformat} We can make it more clear with: {{(size / (childSize + 1)) + 1}}. {quote}I think it might be nicer in this case to make the TREE_SIZE logically more obvious, so that its accessors (which are rather more complicated) are simplified, rather than its calculation? I don't think this is very tricky anyway - just set TREE_SIZE[0] = FAN_FACTOR, and leave the loop as it is, I think? {quote} It's also used here: [{{TREE_SIZE[level-2]}}|https://github.com/cooldoger/cassandra/commit/8369dc8b7be3ccf8d1972e9c8cff95adb3493005#diff-4b911b7d0959c6219175e2349968f3cdR196]. Which needs to be changed to: {{int grandchildSize = level == 1 ? 0 : TREE_SIZE[level - 2];}}. I prefer avoiding this check while building every node (will add a comment that leaf node is level 1). But I'm fine with either way. {quote}I think this is also more easily done by an incrementing counter, rather than decrementing? {quote} Sure, I'll update the patch for that. > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591004#comment-16591004 ] Benedict commented on CASSANDRA-9989: - {quote}Just to make the initial calculation easier {quote} I think it might be nicer in this case to make the {{TREE_SIZE}} logically more obvious, so that its accessors (which are rather more complicated) are simplified, rather than its calculation? I don't think this is very tricky anyway - just set {{TREE_SIZE[0] = FAN_FACTOR}}, and leave the loop as it is, I think? {quote}It's used to update indexOffsets[I]. {quote} I think this is also more easily done by an incrementing counter, rather than decrementing? {quote}I updated the code and added an unittest to make sure the BTree is exactly the same as before (with NodeBuilder()). {quote} I didn't mean to suggest that we exactly duplicate the old logic. In fact, arguably, the {{NodeBuilder}} code is not only very ugly, it is also "incorrect" - it leaves a space vacant at the end of a full {{Leaf}} node. I think this was a conscious tradeoff when implementing the {{indexOffsets}} feature, but still (note, please maintain this in your patch, as the old code is not able to handle a "full" Leaf). This is not something easily fixed, and I think it should be addressed by rewriting the {{NodeBuilder}} entirely. Obviously, not in this patch, but I would not want to bind all future implementations by this older not-so-great one. I was musing that conceptually, splitting the final two nodes (if not identically) is consistent, so easily justified. I just wonder if a more even splitting of the contents might be more useful - for instance, it might provide more opportunities for cheap inserts into the tree. I don't think this is terribly important, however. {quote}Fixed. {quote} I may have missed it, but didn't see a response to point (1), around the {{childrenNum}} calculation? For instance, my algebra came up with this calculation: {code:java} /** * (size - (childNum - 1)) / childSize <= childNum * ==> size / childSize <= childNum + ((childNum - 1) / childSize) * ==> size <= childSize * childNum + childNum - 1 * ==> size + 1 <= (childSize + 1) * childNum * ==>ceil((size + 1) / (childSize + 1)) == childNum * ==> (size + childSize + 1) / (childSize + 1) == childNum */ int childrenNum = (size + childSize + 1) / (childSize + 1); {code} Of course, this is very close to {{size / (childSize + 1) + 1}}, so perhaps it is easy to demonstrate that it is always safe to increment by one. I haven't attempted, but it would be nice to do so, to be sure this value is always a safe {{childrenNum}} calculation > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590983#comment-16590983 ] Jay Zhuang commented on CASSANDRA-9989: --- [~benedict], thank you very much for the review. I updated the branch based on your comments: [{{branch: 9989}}|https://github.com/cooldoger/cassandra/commits/9989]. {quote}1. How did you arrive at your childrenNum calculation, and are we certain it is correct? This is pretty critical for correctness, and hard to test fully, so it would be nice to have some comments justifying it. 4. It would be nice if we removed MAX_DEPTH, and just truncated TREE_SIZE to the correct maximum in our static block {quote} Fixed. Now it auto calculates the max height of the tree that we could build. {quote}2. Why decrement left instead of just counting up the number of values written? {quote} It's used to update [{{indexOffsets\[i\]}}|https://github.com/cooldoger/cassandra/commit/0c1a9d11d6540ac7b233c400e0d8b1a56e647d5f#diff-4b911b7d0959c6219175e2349968f3cdR179]. {quote}3. Why is TREE_SIZE indexed from 1, not 0? {quote} Just to make the initial calculation easier: [{{TREE_SIZE\[i-1\]}}|https://github.com/cooldoger/cassandra/commit/0c1a9d11d6540ac7b233c400e0d8b1a56e647d5f#diff-4b911b7d0959c6219175e2349968f3cdR84]. With the new patch, it's also used here: to get [{{grandchildSize}}|https://github.com/cooldoger/cassandra/commit/8369dc8b7be3ccf8d1972e9c8cff95adb3493005#diff-4b911b7d0959c6219175e2349968f3cdR196]. {quote}I'm also torn on the splitting of the last two nodes - this is consistent with the current NodeBuilder logic, but it does complicate the code a little versus evenly splitting the remaining size amongst all the children. {quote} I was thinking to make the tree a little bit more balanced by splitting it equally to the last 2 nodes. But yes, it also makes sense to make it the same as before. I updated the code and added an unittest to make sure the BTree is [exactly the same|https://github.com/cooldoger/cassandra/commit/8369dc8b7be3ccf8d1972e9c8cff95adb3493005#diff-cb7b127243f861292899bad7305217dbR592] as before (with {{[NodeBuilder()|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/NodeBuilder.java]}}). > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587507#comment-16587507 ] Benedict commented on CASSANDRA-9989: - Thanks Jay. The patch looks good overall, but I have few improvement comments, and some questions: # How did you arrive at your {{childrenNum}} calculation, and are we certain it is correct? This is pretty critical for correctness, and hard to test fully, so it would be nice to have some comments justifying it. # Why decrement {{left}} instead of just counting up the number of values written? # Why is TREE_SIZE indexed from 1, not 0? # It would be nice if we removed MAX_DEPTH, and just truncated TREE_SIZE to the correct maximum in our static block I'm also torn on the splitting of the last two nodes - this is consistent with the current {{NodeBuilder}} logic, but it does complicate the code a little versus evenly splitting the remaining size amongst all the children. I've pushed a patch [here|https://github.com/belliottsmith/cassandra/tree/9989-suggest] with some tweaks to the {{LongBTreeTest}} to stress the new code paths more more, and it would be great if we could run this against the final patch for a while, with a tweak to the parameters to increase further the ratio of tests that use this code path. > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510282#comment-16510282 ] Jay Zhuang commented on CASSANDRA-9989: --- [~jasobrown] Here is the latest rebased code: | Branch | uTest | | [9989|https://github.com/cooldoger/cassandra/tree/9989] | [!https://circleci.com/gh/cooldoger/cassandra/tree/9989.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/9989] | [A benchmark test|https://github.com/cooldoger/cassandra/commit/048f465d9872f5645a809666aefee503f9331736] is added first, so we could run the same test ({{$ ant microbench -Dbenchmark.name=BTreeBuildBench.buildTreeTest}}) before and after the patch. Here is the test result on my host, basically it improves the bTree build ({{2x-4x}}) for a non-leaf tree ({{>32 elements}}), and no impact on leaf tree ({{<=32 elements}}) build, which is just an array, it's already optimized: {noformat} Without Fix [java] Benchmark (dataSize) Mode Cnt Score Error Units [java] BTreeBuildBench.buildTreeTest 1 thrpt 16 140871.759 ± 5077.103 ops/ms [java] BTreeBuildBench.buildTreeTest 2 thrpt 16 135774.492 ± 6064.639 ops/ms [java] BTreeBuildBench.buildTreeTest 5 thrpt 16 126986.466 ± 3699.703 ops/ms [java] BTreeBuildBench.buildTreeTest 10 thrpt 16 101731.894 ± 3567.127 ops/ms [java] BTreeBuildBench.buildTreeTest 20 thrpt 16 70327.305 ± 2503.299 ops/ms [java] BTreeBuildBench.buildTreeTest 40 thrpt 168623.271 ± 986.412 ops/ms [java] BTreeBuildBench.buildTreeTest 100 thrpt 161681.114 ± 128.078 ops/ms [java] BTreeBuildBench.buildTreeTest1000 thrpt 16 412.908 ± 32.097 ops/ms [java] BTreeBuildBench.buildTreeTest 1 thrpt 16 27.509 ± 14.482 ops/ms [java] BTreeBuildBench.buildTreeTest 10 thrpt 16 4.615 ± 0.187 ops/ms {noformat} With Fix: {noformat} [java] Benchmark (dataSize) Mode Cnt Score Error Units [java] BTreeBuildBench.buildTreeTest 1 thrpt 16 147053.344 ± 6292.209 ops/ms [java] BTreeBuildBench.buildTreeTest 2 thrpt 16 135013.312 ± 4265.301 ops/ms [java] BTreeBuildBench.buildTreeTest 5 thrpt 16 122254.600 ± 3937.228 ops/ms [java] BTreeBuildBench.buildTreeTest 10 thrpt 16 102739.551 ± 1937.640 ops/ms [java] BTreeBuildBench.buildTreeTest 20 thrpt 16 71638.531 ± 2005.118 ops/ms [java] BTreeBuildBench.buildTreeTest 40 thrpt 16 21514.998 ± 985.831 ops/ms [java] BTreeBuildBench.buildTreeTest 100 thrpt 16 11495.212 ± 526.143 ops/ms [java] BTreeBuildBench.buildTreeTest1000 thrpt 16 1469.110 ± 57.081 ops/ms [java] BTreeBuildBench.buildTreeTest 1 thrpt 16 114.110 ±4.330 ops/ms [java] BTreeBuildBench.buildTreeTest 10 thrpt 16 11.910 ±0.502 ops/ms {noformat} > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509969#comment-16509969 ] Jason Brown commented on CASSANDRA-9989: have you rebased? does it need to be rebased? Also, can you run the circleci tests again, just for sanity? > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509939#comment-16509939 ] Jay Zhuang commented on CASSANDRA-9989: --- [~jasobrown] gently ping :) > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361739#comment-16361739 ] Jason Brown commented on CASSANDRA-9989: You are in my queue, so hopefully in the next week or sooner. > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358028#comment-16358028 ] Jay Zhuang commented on CASSANDRA-9989: --- Hi [~jasobrown], thank you for volunteering to be the reviewer. Please let me know if there's any feedback. > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347709#comment-16347709 ] Jason Brown commented on CASSANDRA-9989: If Alex can't get to it by next week, I can probably do it. > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347671#comment-16347671 ] Nate McCall commented on CASSANDRA-9989: [~ifesdjeen] You have done some work on the BTree impl, and chance you have time for a review? > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347657#comment-16347657 ] Ariel Weisberg commented on CASSANDRA-9989: --- Sorry I haven't done this yet. I did sit down to review but didn't get to the point of understanding the b-tree implementation where I was confident enough to +1. I have some hard deadlines in between now and beginning of March so I am unlikely to get to it before then. > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347647#comment-16347647 ] Jay Zhuang commented on CASSANDRA-9989: --- Hi [~aweisberg], I just rebased the code and simplified the benchmark tests (also split the commit into 2 parts benchmark and the fix, so it's easier run the benchmark with and without the fix): |Branch|uTest| |[9989|https://github.com/cooldoger/cassandra/tree/9989]|[!https://circleci.com/gh/cooldoger/cassandra/tree/9989.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/9989]| Here is the test result without the fix: {noformat} [java] Benchmark (dataSize) (treeBuilder) Mode Cnt Score Error Units [java] BTreeBuildBench.treeBuildTest 1 btreeBuilder thrpt 6 25567.300 ? 11696.493 ops/ms [java] BTreeBuildBench.treeBuildTest 1 treeBuilderAddAll thrpt 6 8180.040 ? 1917.026 ops/ms [java] BTreeBuildBench.treeBuildTest 1 treeBuilderAdd thrpt 6 9666.550 ? 3550.584 ops/ms [java] BTreeBuildBench.treeBuildTest 10 btreeBuilder thrpt 6 18129.297 ? 4100.356 ops/ms [java] BTreeBuildBench.treeBuildTest 10 treeBuilderAddAll thrpt 6 4693.460 ? 1626.681 ops/ms [java] BTreeBuildBench.treeBuildTest 10 treeBuilderAdd thrpt 6 5309.851 ? 1837.909 ops/ms [java] BTreeBuildBench.treeBuildTest 33 btreeBuilder thrpt 6 3008.138 ? 1216.708 ops/ms [java] BTreeBuildBench.treeBuildTest 33 treeBuilderAddAll thrpt 6 1393.265 ? 328.780 ops/ms [java] BTreeBuildBench.treeBuildTest 33 treeBuilderAdd thrpt 6 1722.419 ? 218.660 ops/ms [java] BTreeBuildBench.treeBuildTest1000 btreeBuilder thrpt 6117.055 ?67.364 ops/ms [java] BTreeBuildBench.treeBuildTest1000 treeBuilderAddAll thrpt 6 57.689 ?11.197 ops/ms [java] BTreeBuildBench.treeBuildTest1000 treeBuilderAdd thrpt 6 53.959 ? 7.916 ops/ms [java] BTreeBuildBench.treeBuildTest5000 btreeBuilder thrpt 6 21.463 ? 1.595 ops/ms [java] BTreeBuildBench.treeBuildTest5000 treeBuilderAddAll thrpt 6 10.972 ? 2.476 ops/ms [java] BTreeBuildBench.treeBuildTest5000 treeBuilderAdd thrpt 6 9.951 ? 2.194 ops/ms [java] BTreeBuildBench.treeBuildTest 100 btreeBuilder thrpt 6 0.118 ? 0.054 ops/ms [java] BTreeBuildBench.treeBuildTest 100 treeBuilderAddAll thrpt 6 0.048 ? 0.016 ops/ms [java] BTreeBuildBench.treeBuildTest 100 treeBuilderAdd thrpt 6 0.045 ? 0.011 ops/ms {noformat} With fix: {noformat} [java] Benchmark (dataSize) (treeBuilder) Mode Cnt Score Error Units [java] Benchmark (dataSize) (treeBuilder) Mode Cnt Score Error Units [java] BTreeBuildBench.treeBuildTest 1 btreeBuilder thrpt 6 24546.947 ? 4740.490 ops/ms [java] BTreeBuildBench.treeBuildTest 1 treeBuilderAddAll thrpt 6 8129.421 ? 2000.655 ops/ms [java] BTreeBuildBench.treeBuildTest 1 treeBuilderAdd thrpt 6 9277.822 ? 3106.155 ops/ms [java] BTreeBuildBench.treeBuildTest 10 btreeBuilder thrpt 6 18385.382 ? 5597.276 ops/ms [java] BTreeBuildBench.treeBuildTest 10 treeBuilderAddAll thrpt 6 4213.489 ? 1055.968 ops/ms [java] BTreeBuildBench.treeBuildTest 10 treeBuilderAdd thrpt 6 5507.714 ? 2305.731 ops/ms [java] BTreeBuildBench.treeBuildTest 33 btreeBuilder thrpt 6 4396.962 ? 1882.824 ops/ms [java] BTreeBuildBench.treeBuildTest 33 treeBuilderAddAll thrpt 6 1630.346 ? 228.321 ops/ms [java] BTreeBuildBench.treeBuildTest 33 treeBuilderAdd thrpt 6 2182.774 ? 545.746 ops/ms [java] BTreeBuildBench.treeBuildTest1000 btreeBuilder thrpt 6304.032 ? 144.663 ops/ms [java] BTreeBuildBench.treeBuildTest1000 treeBuilderAddAll thrpt 6108.038 ? 32.746 ops/ms [java] BTreeBuildBench.treeBuildTest1000 treeBuilderAdd thrpt 6 86.093 ? 18.264 ops/ms [java] BTreeBuildBench.treeBuildTest5000 btreeBuilder thrpt 6 51.912 ? 31.749 ops/ms [java] BTreeBuildBench.treeBuildTest5000 treeBuilderAddAll thrpt 6 15.545 ?4.912 ops/ms [java] BTreeBuildBench.treeBuildTest5000 treeBuilderAdd thrpt 6 14.569 ?3.418 ops/ms [java] BTreeBuildBench.treeBuildTest 100 btr
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291542#comment-16291542 ] Ariel Weisberg commented on CASSANDRA-9989: --- Sure. It's entirely greek to me at this point so it will be a little while. I'll probably look at it before end of year though. > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291221#comment-16291221 ] Jay Zhuang commented on CASSANDRA-9989: --- [~aweisberg], are you interested in reviewing it? > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280609#comment-16280609 ] Jay Zhuang commented on CASSANDRA-9989: --- Hi [~Anthony Grasso], are you still interested in review the patch? I rebased the code to the latest trunk: | Branch | uTest | | [9989-trunk-onecommit|https://github.com/cooldoger/cassandra/tree/9989-trunk-onecommit] | [!https://circleci.com/gh/cooldoger/cassandra/tree/9989-trunk-onecommit.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/9989-trunk-onecommit] | Here is the microbench without the fix: [9989-trunk-nofix|https://github.com/cooldoger/cassandra/tree/9989-trunk-nofix] > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123714#comment-16123714 ] Jay Zhuang commented on CASSANDRA-9989: --- [~Anthony Grasso] Would you please review the patch? > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058528#comment-16058528 ] Nate McCall commented on CASSANDRA-9989: We have a few spare cycles coming up. Assigning to [~Anthony Grasso] for review. > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Anthony Grasso >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966913#comment-15966913 ] Jay Zhuang commented on CASSANDRA-9989: --- Rebased the [patch | https://github.com/cooldoger/cassandra/commit/99795034c4a9685bbe427cce19e0d23a31cf3516]. Please review: [9989-trunk-onecommit-update| https://github.com/apache/cassandra/compare/trunk...cooldoger:9989-trunk-onecommit-update?expand=1] > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884366#comment-15884366 ] Jay Zhuang commented on CASSANDRA-9989: --- [~slebresne] Would you please review this [patch|https://github.com/cooldoger/cassandra/commit/bf6bc14a130dae64cb859e81ad54b21d5434d46a]? > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847778#comment-15847778 ] Jay Zhuang commented on CASSANDRA-9989: --- Thanks Benedict for the information. I agree that BTree.Builder could be further improved, currently it copys all the data in array and build the tree. I did some improvements for [{{BTree.buildInternal()}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/BTree.java#L128], which is used by both {{BTree.Builder}} and {{BTree.build()}}. It improves the {{build()}} performance by about {{+190%}}. And it benefits Builder too (about {{+50%}}). The [patch|https://github.com/cooldoger/cassandra/commit/bf6bc14a130dae64cb859e81ad54b21d5434d46a] is attached. Here are the JMH benchmark test results: || [TestCode|https://github.com/cooldoger/cassandra/commit/bf6bc14a130dae64cb859e81ad54b21d5434d46a#diff-b91288be52d83bf95ac5eb819f9fe3d4] || Before the patch || After the patch || Performance increased: {{(after - before)/before}} || | buildLeafTreeTest | 5274.457 | 13096.541 | +148% | | buildLeafTreeBuilderTest | 1763.279 | 2223.307 | +26% | | build1KValuesTreeTest | 117.937 | 371.845 | +215% | | build1KValuesTreeBuilderTest | 68.103 | 102.884 | +51% | | build5KValuesTreeTest | 19.463 | 59.202 | +204% | | build5KValuesTreeBuilderTest | 11.349 | 19.209 | +69% | | build1MValuesTreeTest | 0.104 | 0.296 | +185% | | build1MValuesTreeBuilderTest | 0.049 | 0.080 | +63% | | transformAndFilter1KTest | 29.502 | 37.432 | +27% | | transformAndFilter1MTest | 0.031 | 0.036 | +16% | {{BTree.build()}} is about 1 time faster than {{BTree.Builder}}, as Builder copies the data and then calls {{build()}} internally (https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/BTree.java#L1088). I guess that could be further improved too. Benchmark test details: No fix: {noformat} [java] Benchmark Mode Cnt Score Error Units [java] BTreeBuildBench.build1KValuesTreeBuilderTest thrpt 40 68.103 ?2.694 ops/ms [java] BTreeBuildBench.build1KValuesTreeTestthrpt 40 117.937 ? 12.761 ops/ms [java] BTreeBuildBench.build1MValuesTreeBuilderTest thrpt 40 0.049 ?0.002 ops/ms [java] BTreeBuildBench.build1MValuesTreeTestthrpt 40 0.104 ?0.007 ops/ms [java] BTreeBuildBench.build5KValuesTreeBuilderTest thrpt 40 11.349 ?0.743 ops/ms [java] BTreeBuildBench.build5KValuesTreeTestthrpt 40 19.463 ?1.489 ops/ms [java] BTreeBuildBench.buildLeafTreeBuilderTest thrpt 40 1763.279 ? 138.734 ops/ms [java] BTreeBuildBench.buildLeafTreeTestthrpt 40 5274.457 ? 391.289 ops/ms [java] BTreeBuildBench.transformAndFilter1KTest thrpt 40 29.502 ?1.548 ops/ms [java] BTreeBuildBench.transformAndFilter1MTest thrpt 40 0.031 ?0.001 ops/ms {noformat} With Fix: {noformat} [java] Benchmark Mode Cnt Score Error Units [java] BTreeBuildBench.build1KValuesTreeBuilderTest thrpt 40 102.884 ?6.888 ops/ms [java] BTreeBuildBench.build1KValuesTreeTestthrpt 40 371.845 ? 52.153 ops/ms [java] BTreeBuildBench.build1MValuesTreeBuilderTest thrpt 40 0.080 ?0.004 ops/ms [java] BTreeBuildBench.build1MValuesTreeTestthrpt 40 0.296 ?0.037 ops/ms [java] BTreeBuildBench.build5KValuesTreeBuilderTest thrpt 40 19.209 ?1.587 ops/ms [java] BTreeBuildBench.build5KValuesTreeTestthrpt 40 59.202 ?7.439 ops/ms [java] BTreeBuildBench.buildLeafTreeBuilderTest thrpt 40 2223.307 ? 179.095 ops/ms [java] BTreeBuildBench.buildLeafTreeTestthrpt 40 13096.541 ? 1879.591 ops/ms [java] BTreeBuildBench.transformAndFilter1KTest thrpt 40 37.432 ?2.371 ops/ms [java] BTreeBuildBench.transformAndFilter1MTest thrpt 40 0.036 ?0.002 ops/ms {noformat} > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful.
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15843558#comment-15843558 ] Jay Zhuang commented on CASSANDRA-9989: --- Can I take this one? I guess it could be done by implementing bulkloading https://en.wikipedia.org/wiki/B-tree#Initial_construction instead of adding value one by one https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/TreeBuilder.java#L135 Is that the right direction? [~benedict] Sorry to ping you again, it's appreciated if you could give me more information. > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Priority: Minor > Fix For: 3.x > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)