[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt Corgan updated HBASE-9969: ------------------------------- Attachment: KeyValueHeapBenchmark_v1.ods Attaching KeyValueHeapBenchmark_v1.ods with the benchmark output for both 1 col/row and 16 cols/row. Some of it's hard to explain. Looks like LoserTree is often faster at next() when there is more heaping to do, but not when KVs are coming from the same scanner, like when numScanners=1 or colsPerRow=16. Maybe because it doesn't have an optimization for that case? Anyway, just putting it up there for people to poke holes in or continue to test. > Improve KeyValueHeap using loser tree > ------------------------------------- > > Key: HBASE-9969 > URL: https://issues.apache.org/jira/browse/HBASE-9969 > Project: HBase > Issue Type: Improvement > Components: Performance, regionserver > Reporter: Chao Shi > Assignee: Chao Shi > Fix For: 0.98.0, 0.96.1 > > Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, > hbase-9969-pq-v1.patch, hbase-9969-v2.patch, hbase-9969-v3.patch, > hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt > > > LoserTree is the better data structure than binary heap. It saves half of the > comparisons on each next(), though the time complexity is on O(logN). > Currently A scan or get will go through two KeyValueHeaps, one is merging KVs > read from multiple HFiles in a single store, the other is merging results > from multiple stores. This patch should improve the both cases whenever CPU > is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). > All of the optimization work is done in KeyValueHeap and does not change its > public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.1#6144)