[ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828507#comment-13828507
 ] 

Chao Shi commented on HBASE-9969:
---------------------------------

bq.  Doing a quick comparison against the second minimal element is not 
possible, as the second minimal element in the loser tree is not always 
"tree\[1\]" (consider the case that the two top players meet in the 
quarterfinal in a tournament game).

To elaborate the problem more clearly, let's consider updateTop() method of 
LoserTree, which takes logN comparisons to decide the new minimal after the 
original one has gone. If we want to get the second minimal as well, we have to 
compare among the losers to the newly updated winner, which takes another logN 
comparisons (there are logN losers in the path from a leave up to the root). 
Thus it will cost 2logN in total. This is equivalent to a binary heap.

The benefit of keeping the second minimal, is where there are consecutive KVs 
from one HFile, we can compare the it against the second minimal first. If it 
is still less than the second minimal, then it is therefore the winner. This 
optimization naturally fit in the binary heap.

I wonder if we can find some data structure in the middle, or perhaps don't 
always keep the second minimal and determine it when every n calls of next.

> Improve KeyValueHeap using loser tree
> -------------------------------------
>
>                 Key: HBASE-9969
>                 URL: https://issues.apache.org/jira/browse/HBASE-9969
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance, regionserver
>            Reporter: Chao Shi
>            Assignee: Chao Shi
>             Fix For: 0.98.0, 0.96.1
>
>         Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, 
> KeyValueHeapBenchmark_v2.ods, hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, 
> hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, 
> kvheap-benchmark.png, kvheap-benchmark.txt
>
>
> LoserTree is the better data structure than binary heap. It saves half of the 
> comparisons on each next(), though the time complexity is on O(logN).
> Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
> read from multiple HFiles in a single store, the other is merging results 
> from multiple stores. This patch should improve the both cases whenever CPU 
> is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
> All of the optimization work is done in KeyValueHeap and does not change its 
> public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to