[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116055#comment-16116055 ]
Jay Zhuang edited comment on CASSANDRA-9988 at 8/7/17 5:05 AM: --------------------------------------------------------------- Here are the test results to search target on the different index with the following search algorithms: 1. binary search 2. exponential search 3. binary search optimized for next adjacent value (Y axis is throughput, higher is better) !9988-3tests.png|tests! Here are my takes from the results: h4. 1. binary search: Pros: * best on average and evenly distributed; * Simple code, use JAVA native API. Cons: * not optimized to search for the adjacent value. h4. 2. exponential search: Pros: * optimized to search the first and second value for {{Dir.ASC}}. Cons: * very bad average performance: compare to binarySearch it has only half the throughput. {{Dir.DESC}} will have the same performance degradation and we do use DESC for some places; * complicate code, hard to maintain: For example, there's a bug in above code, it should be {{while (prob_ub <= ub && (c = comparator.compare(keys\[probe_ub], key)) < 0)}} it only impacts the tree size {{2^n - 1}} and search for the last element, it's tricky to find out. I think we could make the algorithm work for {{Dir.DESC}} too, but it will make the code even more complicated. h4. 3. binary search optimized for next adjacent value Pros: * Optimized to search next adjacent value, works for both {{Dir.ASC}} and {{Dir.DESC}} Cons: * Average performance is a litter bit worse than binarySearch I'm not familiar with the whole cassandra code base, so not sure how often we search for the adjacent value. Personally, I would still prefer binarySearch: * consistency with other cassandra code, plus it's the current behave right now for btree search; * smaller mean deviations, could be good for tp99; * clean code. But I'm fine with any solutions: | 1. binary search | [9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk] | | 2. exponential search | [9988-trunk-exp|https://github.com/cooldoger/cassandra/tree/9988-trunk-exp] | | 3. binary search optimized| [9988-trunk-x|https://github.com/cooldoger/cassandra/tree/9988-trunk-x] | was (Author: jay.zhuang): Here are the test results to search target on the different index with the following search algorithms: 1. binary search 2. exponential search 3. binary search optimized for next adjacent value !9988-3tests.png|tests! Here are my takes from the results: h4. 1. binary search: Pros: * best on average and evenly distributed; * Simple code, use JAVA native API. Cons: * not optimized to search for the adjacent value. h4. 2. exponential search: Pros: * optimized to search the first and second value for {{Dir.ASC}}. Cons: * very bad average performance: compare to binarySearch it has only half the throughput. {{Dir.DESC}} will have the same performance degradation and we do use DESC for some places; * complicate code, hard to maintain: For example, there's a bug in above code, it should be {{while (prob_ub <= ub && (c = comparator.compare(keys\[probe_ub], key)) < 0)}} it only impacts the tree size {{2^n - 1}} and search for the last element, it's tricky to find out. I think we could make the algorithm work for {{Dir.DESC}} too, but it will make the code even more complicated. h4. 3. binary search optimized for next adjacent value Pros: * Optimized to search next adjacent value, works for both {{Dir.ASC}} and {{Dir.DESC}} Cons: * Average performance is a litter bit worse than binarySearch I'm not familiar with the whole cassandra code base, so not sure how often we search for the adjacent value. Personally, I would still prefer binarySearch: * consistency with other cassandra code, plus it's the current behave right now for btree search; * smaller mean deviations, could be good for tp99; * clean code. But I'm fine with any solutions: | 1. binary search | [9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk] | | 2. exponential search | [9988-trunk-exp|https://github.com/cooldoger/cassandra/tree/9988-trunk-exp] | | 3. binary search optimized| [9988-trunk-x|https://github.com/cooldoger/cassandra/tree/9988-trunk-x] | > Introduce leaf-only iterator > ---------------------------- > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task > Reporter: Benedict > Assignee: Jay Zhuang > Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, > 9988-result3.png, 9988-result.png, 9988-test-result3.png, > 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, > 9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, > trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org