[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

Benedict (JIRA) Thu, 23 Aug 2018 17:51:39 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591004#comment-16591004
 ]


Benedict commented on CASSANDRA-9989:
-------------------------------------

{quote}Just to make the initial calculation easier
{quote}
I think it might be nicer in this case to make the {{TREE_SIZE}} logically more 
obvious, so that its accessors (which are rather more complicated) are 
simplified, rather than its calculation? I don't think this is very tricky 
anyway - just set {{TREE_SIZE[0] = FAN_FACTOR}}, and leave the loop as it is, I 
think?
{quote}It's used to update indexOffsets[I].
{quote}
I think this is also more easily done by an incrementing counter, rather than 
decrementing?
{quote}I updated the code and added an unittest to make sure the BTree is 
exactly the same as before (with NodeBuilder()).
{quote}
I didn't mean to suggest that we exactly duplicate the old logic. In fact, 
arguably, the {{NodeBuilder}} code is not only very ugly, it is also 
"incorrect" - it leaves a space vacant at the end of a full {{Leaf}} node. I 
think this was a conscious tradeoff when implementing the {{indexOffsets}} 
feature, but still (note, please maintain this in your patch, as the old code 
is not able to handle a "full" Leaf).

This is not something easily fixed, and I think it should be addressed by 
rewriting the {{NodeBuilder}} entirely. Obviously, not in this patch, but I 
would not want to bind all future implementations by this older not-so-great 
one. I was musing that conceptually, splitting the final two nodes (if not 
identically) is consistent, so easily justified.

I just wonder if a more even splitting of the contents might be more useful - 
for instance, it might provide more opportunities for cheap inserts into the 
tree. I don't think this is terribly important, however.
{quote}Fixed.
{quote}
I may have missed it, but didn't see a response to point (1), around the 
{{childrenNum}} calculation? For instance, my algebra came up with this 
calculation:
{code:java}
        /**
         *           (size - (childNum - 1)) / childSize <= childNum
         * ==>                          size / childSize <= childNum + 
((childNum - 1) / childSize)
         * ==>                                      size <= childSize * 
childNum + childNum - 1
         * ==>                                  size + 1 <= (childSize + 1) * 
childNum
         * ==>        ceil((size + 1) / (childSize + 1)) == childNum
         * ==>  (size + childSize + 1) / (childSize + 1) == childNum
         */
        int childrenNum = (size + childSize + 1) / (childSize + 1);
{code}
Of course, this is very close to {{size / (childSize + 1) + 1}}, so perhaps it 
is easy to demonstrate that it is always safe to increment by one. I haven't 
attempted, but it would be nice to do so, to be sure this value is always a 
safe {{childrenNum}} calculation

> Optimise BTree.Buider
> ---------------------
>
>                 Key: CASSANDRA-9989
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Benedict
>            Assignee: Jay Zhuang
>            Priority: Minor
>             Fix For: 4.x
>
>         Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

Reply via email to