[ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16106073#comment-16106073
 ] 

Benedict commented on CASSANDRA-9988:
-------------------------------------

Assuming your implementation is correct, here is a modified version that works
{code}
         int lb = forwards ? nextPos : lowerBound; // inclusive
         int ub = forwards ? upperBound : nextPos; // inclusive
         int bound = 0;
         int c = -1;
         while (lb + bound < ub && (c = comparator.compare(keys[lb + bound], 
key)) < 0)
             bound = Math.max(bound * 2, 1);
         if (c == 0) { return lb + bound; } // note, by checking == 0, we do 
not need to include it in bsearch, so we do not need to increment ub
        return Arrays.binarySearch(keys, lb + (bound/2), Math.min(lb + bound, 
ub), key, comparator);
{code}

This implementation as a whole could be cleaned up a little, and there are a 
variety of ways to bit-twiddle to get {{bound == 0}} on first iteration, but 
this one is at least very clear.  A "faster" variant would be:

{code}
         while (lb + (bound / 2) < ub && (c = comparator.compare(keys[lb + 
(bound / 2)], key)) < 0)
             bound *= 2;
{code}

A faster-still variant would just offset lb by -1 at the start, so there are no 
recurring costs, and restore it after the loop.  But better than this would be 
something like:

{code}
         int lb = forwards ? nextPos : lowerBound; // inclusive
         int ub = forwards ? upperBound : nextPos; // inclusive
         int c = -1;
         int probe_ub = lb;
         int probe_incr = 1;
         while (probe_ub < ub && (c = comparator.compare(keys[probe_ub], key)) 
< 0) {
             lb = probe_ub;
             probe_ub += probe_incr;
             probe_incr *= 2;
         }
         if (c == 0) { return probe_ub; }
        return Arrays.binarySearch(keys, lb, probe_ub, key, comparator);
{code}

better still
I haven't looked at the codebase in a long time, and don't have it checked out, 
so haven't consulted the branch iterator version, and don't actually recall 
precisely what algorithm we used.  I don't recall it being a particularly 
elegant approach, though.  In fact, it's possible it only compares the first 
element and then resorts to binary search;  I do not recall for sure if we 
carried exponential search over from the ArrayBackedColumns (or whatever it 
used to be called).

I will note the exponential search as it stands here probably does not behave 
very well for reverse iteration.

Also, apologies for against-style-guide variable names.  But this comment box 
is too poor an editor to revise it any further.

> Introduce leaf-only iterator
> ----------------------------
>
>                 Key: CASSANDRA-9988
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Benedict
>            Assignee: Jay Zhuang
>            Priority: Minor
>              Labels: patch
>             Fix For: 4.0
>
>         Attachments: 9988-data.png, 9988-result2.png, 9988-result3.png, 
> 9988-result.png, 9988-test-result3.png, 9988-test-result-expsearch.xlsx, 
> 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-trunk-new.txt, 
> 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to