[ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943896#comment-13943896
 ] 

Lars Hofhansl commented on HBASE-9969:
--------------------------------------

A colleague of mine (Jamie Martin to give credit) implemented something similar 
for different project.

I think some of the performance issues we're seeing is that the scanners and 
the heap are not really integrated. In order to pop something of the topScanner 
we need to take the scanner out of the heap, pop the next KV, seek the scanner, 
and put it back on the heap, which now has to rejigger everything.

His solution is a simple sorted array of scanners the scanner with the next key 
is first. When we pop something of that first scanner we simply compare the 
next key with the top key of the 2nd scanner, when larger (i.e. it's the next) 
we're done, just one comparison, no movement in the array needed. If not the 
scanner is insert-sorted into the right spot.

The assumption is that that most keys come from the same scanner and in that 
case should safe a lot of overhead. It's somewhat similar to [~mcorgan] attempt 
to insert the scanner back at the beginning, but integrates the scanner with 
the "heap" and should safe overhead. It's also simple and should be easy to 
follow.

If I get time I'm going to implement this as an alternative into Matt's testing 
framework here and see how it fares.


> Improve KeyValueHeap using loser tree
> -------------------------------------
>
>                 Key: HBASE-9969
>                 URL: https://issues.apache.org/jira/browse/HBASE-9969
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance, regionserver
>            Reporter: Chao Shi
>            Assignee: Chao Shi
>             Fix For: 0.99.0
>
>         Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, 
> KeyValueHeapBenchmark_v2.ods, hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, 
> hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, 
> kvheap-benchmark.png, kvheap-benchmark.txt
>
>
> LoserTree is the better data structure than binary heap. It saves half of the 
> comparisons on each next(), though the time complexity is on O(logN).
> Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
> read from multiple HFiles in a single store, the other is merging results 
> from multiple stores. This patch should improve the both cases whenever CPU 
> is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
> All of the optimization work is done in KeyValueHeap and does not change its 
> public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to