[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

Jay Zhuang (JIRA) Sun, 06 Aug 2017 22:06:23 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116055#comment-16116055
 ]


Jay Zhuang edited comment on CASSANDRA-9988 at 8/7/17 5:05 AM:
---------------------------------------------------------------

Here are the test results to search target on the different index with the 
following search algorithms:
1. binary search
2. exponential search
3. binary search optimized for next adjacent value
(Y axis is throughput, higher is better)
!9988-3tests.png|tests!

Here are my takes from the results:
h4. 1. binary search:
 Pros:
  * best on average and evenly distributed;
  * Simple code, use JAVA native API.

Cons:
  * not optimized to search for the adjacent value.

h4. 2. exponential search:
Pros:
  * optimized to search the first and second value for {{Dir.ASC}}.

Cons:
  * very bad average performance: compare to binarySearch it has only half the 
throughput. {{Dir.DESC}} will have the same performance degradation and we do 
use DESC for some places;
  * complicate code, hard to maintain: For example, there's a bug in above 
code, it should be {{while (prob_ub <= ub && (c = 
comparator.compare(keys\[probe_ub], key)) < 0)}} it only impacts the tree size 
{{2^n - 1}} and search for the last element, it's tricky to find out.

I think we could make the algorithm work for {{Dir.DESC}} too, but it will make 
the code even more complicated.

h4. 3. binary search optimized for next adjacent value
Pros:
  * Optimized to search next adjacent value, works for both {{Dir.ASC}} and 
{{Dir.DESC}}

Cons:
  * Average performance is a litter bit worse than binarySearch

I'm not familiar with the whole cassandra code base, so not sure how often we 
search for the adjacent value. Personally, I would still prefer binarySearch:
* consistency with other cassandra code, plus it's the current behave right now 
for btree search;
* smaller mean deviations, could be good for tp99;
* clean code.

But I'm fine with any solutions:
| 1. binary search | 
[9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk] |
| 2. exponential search | 
[9988-trunk-exp|https://github.com/cooldoger/cassandra/tree/9988-trunk-exp] |
| 3. binary search optimized| 
[9988-trunk-x|https://github.com/cooldoger/cassandra/tree/9988-trunk-x] |


was (Author: jay.zhuang):
Here are the test results to search target on the different index with the 
following search algorithms:
1. binary search
2. exponential search
3. binary search optimized for next adjacent value
!9988-3tests.png|tests!

Here are my takes from the results:
h4. 1. binary search:
 Pros:
  * best on average and evenly distributed;
  * Simple code, use JAVA native API.

Cons:
  * not optimized to search for the adjacent value.

h4. 2. exponential search:
Pros:
  * optimized to search the first and second value for {{Dir.ASC}}.

Cons:
  * very bad average performance: compare to binarySearch it has only half the 
throughput. {{Dir.DESC}} will have the same performance degradation and we do 
use DESC for some places;
  * complicate code, hard to maintain: For example, there's a bug in above 
code, it should be {{while (prob_ub <= ub && (c = 
comparator.compare(keys\[probe_ub], key)) < 0)}} it only impacts the tree size 
{{2^n - 1}} and search for the last element, it's tricky to find out.

I think we could make the algorithm work for {{Dir.DESC}} too, but it will make 
the code even more complicated.

h4. 3. binary search optimized for next adjacent value
Pros:
  * Optimized to search next adjacent value, works for both {{Dir.ASC}} and 
{{Dir.DESC}}

Cons:
  * Average performance is a litter bit worse than binarySearch

I'm not familiar with the whole cassandra code base, so not sure how often we 
search for the adjacent value. Personally, I would still prefer binarySearch:
* consistency with other cassandra code, plus it's the current behave right now 
for btree search;
* smaller mean deviations, could be good for tp99;
* clean code.
But I'm fine with any solutions:
| 1. binary search | 
[9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk] |
| 2. exponential search | 
[9988-trunk-exp|https://github.com/cooldoger/cassandra/tree/9988-trunk-exp] |
| 3. binary search optimized| 
[9988-trunk-x|https://github.com/cooldoger/cassandra/tree/9988-trunk-x] |

> Introduce leaf-only iterator
> ----------------------------
>
>                 Key: CASSANDRA-9988
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Benedict
>            Assignee: Jay Zhuang
>            Priority: Minor
>              Labels: patch
>             Fix For: 4.0
>
>         Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, 
> 9988-result3.png, 9988-result.png, 9988-test-result3.png, 
> 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, 
> 9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, 
> trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf 
> page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Comment Edited] (CASSANDRA-9988) Introduce leaf-only iterator

Reply via email to