[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9969: - Fix Version/s: (was: 2.0.0) > Improve KeyValueHeap using loser tree > - > > Key: HBASE-9969 > URL: https://issues.apache.org/jira/browse/HBASE-9969 > Project: HBase > Issue Type: Improvement > Components: Performance, regionserver >Reporter: Chao Shi >Assignee: Chao Shi > Attachments: 9969-0.94.txt, hbase-9969.patch, hbase-9969.patch, > hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, hbase-9969-v2.patch, > hbase-9969-v3.patch, KeyValueHeapBenchmark_v1.ods, > KeyValueHeapBenchmark_v2.ods, kvheap-benchmark.png, kvheap-benchmark.txt > > > LoserTree is the better data structure than binary heap. It saves half of the > comparisons on each next(), though the time complexity is on O(logN). > Currently A scan or get will go through two KeyValueHeaps, one is merging KVs > read from multiple HFiles in a single store, the other is merging results > from multiple stores. This patch should improve the both cases whenever CPU > is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). > All of the optimization work is done in KeyValueHeap and does not change its > public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-9969: - Fix Version/s: (was: 0.99.0) 2.0.0 Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Components: Performance, regionserver Reporter: Chao Shi Assignee: Chao Shi Fix For: 2.0.0 Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, KeyValueHeapBenchmark_v2.ods, hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt LoserTree is the better data structure than binary heap. It saves half of the comparisons on each next(), though the time complexity is on O(logN). Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from multiple HFiles in a single store, the other is merging results from multiple stores. This patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-9969: -- Fix Version/s: (was: 0.98.1) Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Components: Performance, regionserver Reporter: Chao Shi Assignee: Chao Shi Fix For: 0.99.0 Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, KeyValueHeapBenchmark_v2.ods, hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt LoserTree is the better data structure than binary heap. It saves half of the comparisons on each next(), though the time complexity is on O(logN). Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from multiple HFiles in a single store, the other is merging results from multiple stores. This patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-9969: -- Status: Open (was: Patch Available) Cancelling stale patch Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Components: Performance, regionserver Reporter: Chao Shi Assignee: Chao Shi Fix For: 0.98.1, 0.99.0 Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, KeyValueHeapBenchmark_v2.ods, hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt LoserTree is the better data structure than binary heap. It saves half of the comparisons on each next(), though the time complexity is on O(logN). Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from multiple HFiles in a single store, the other is merging results from multiple stores. This patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9969: - Fix Version/s: (was: 0.96.1) 0.99.0 Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Components: Performance, regionserver Reporter: Chao Shi Assignee: Chao Shi Fix For: 0.98.1, 0.99.0 Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, KeyValueHeapBenchmark_v2.ods, hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt LoserTree is the better data structure than binary heap. It saves half of the comparisons on each next(), though the time complexity is on O(logN). Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from multiple HFiles in a single store, the other is merging results from multiple stores. This patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-9969: -- Fix Version/s: (was: 0.98.0) 0.98.1 Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Components: Performance, regionserver Reporter: Chao Shi Assignee: Chao Shi Fix For: 0.96.1, 0.98.1 Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, KeyValueHeapBenchmark_v2.ods, hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt LoserTree is the better data structure than binary heap. It saves half of the comparisons on each next(), though the time complexity is on O(logN). Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from multiple HFiles in a single store, the other is merging results from multiple stores. This patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Corgan updated HBASE-9969: --- Attachment: hbase-9969-pq-v2.patch KeyValueHeapBenchmark_v2.ods attaching hbase-9969-pq-v2.patch and KeyValueHeapBenchmark_v2.ods * this patch will hopefully apply (previous one accidentally had BenchmarkableKeyValueHeap.java in the src/test directory) * KeyValueScannerPriorityQueue is no longer based on PriorityQueue. it's now a custom implementation with class comments explaining * KeyValueScannerHeap (based on KeyValueScannerPriorityQueue) looks to beat the existing KeyValueHeap in all cases (tested up to 32 scanners) * LoserTree is usually faster with more scanners, but still slower when many consecutive KVs come from the same scanner Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Components: Performance, regionserver Reporter: Chao Shi Assignee: Chao Shi Fix For: 0.98.0, 0.96.1 Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, KeyValueHeapBenchmark_v2.ods, hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt LoserTree is the better data structure than binary heap. It saves half of the comparisons on each next(), though the time complexity is on O(logN). Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from multiple HFiles in a single store, the other is merging results from multiple stores. This patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Shi updated HBASE-9969: Attachment: hbase-9969-v3.patch upload v3 - add an assertion as Ted mentioned - prepend a 16 bytes prefix to row keys in the benchmark (will update the benchmark result tomorrow when I have access to my devbox) Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Components: Performance, regionserver Reporter: Chao Shi Assignee: Chao Shi Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 9969-0.94.txt, hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt LoserTree is the better data structure than binary heap. It saves half of the comparisons on each next(), though the time complexity is on O(logN). Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from multiple HFiles in a single store, the other is merging results from multiple stores. This patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9969: - Fix Version/s: (was: 0.94.15) Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Components: Performance, regionserver Reporter: Chao Shi Assignee: Chao Shi Fix For: 0.98.0, 0.96.1 Attachments: 9969-0.94.txt, hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt LoserTree is the better data structure than binary heap. It saves half of the comparisons on each next(), though the time complexity is on O(logN). Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from multiple HFiles in a single store, the other is merging results from multiple stores. This patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Corgan updated HBASE-9969: --- Attachment: hbase-9969-pq-v1.patch attaching hbase-9969-pq-v1.patch * adds KeyValueScannerPriorityQueue, which is a stripped down copy of PriorityQueue that we can play with * adds KeyValueScannerHeap (almost identical to KeyValueHeap, but uses the above) * includes LoserTreeKeyValueHeap and LoserTree * each of the 3 heaps implments BenchmarkableKeyValueHeap (not complete) * enhances KeyValueHeapBenchmark to benchmark all 3 implementations * always tests 1mm KVs, no matter how many scanners * sorts the input KVs, though that doesn't seem to matter much * does a few warmup runs One problem is that this goes from 1 to 3 implementations behind the same interface which may not get inlined as well. Absolute performance will therefore be lower, but hopefully performance will be comparable across the 3 implementations. Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Components: Performance, regionserver Reporter: Chao Shi Assignee: Chao Shi Fix For: 0.98.0, 0.96.1 Attachments: 9969-0.94.txt, hbase-9969-pq-v1.patch, hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt LoserTree is the better data structure than binary heap. It saves half of the comparisons on each next(), though the time complexity is on O(logN). Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from multiple HFiles in a single store, the other is merging results from multiple stores. This patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Corgan updated HBASE-9969: --- Attachment: KeyValueHeapBenchmark_v1.ods Attaching KeyValueHeapBenchmark_v1.ods with the benchmark output for both 1 col/row and 16 cols/row. Some of it's hard to explain. Looks like LoserTree is often faster at next() when there is more heaping to do, but not when KVs are coming from the same scanner, like when numScanners=1 or colsPerRow=16. Maybe because it doesn't have an optimization for that case? Anyway, just putting it up there for people to poke holes in or continue to test. Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Components: Performance, regionserver Reporter: Chao Shi Assignee: Chao Shi Fix For: 0.98.0, 0.96.1 Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, hbase-9969-pq-v1.patch, hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt LoserTree is the better data structure than binary heap. It saves half of the comparisons on each next(), though the time complexity is on O(logN). Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from multiple HFiles in a single store, the other is merging results from multiple stores. This patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Shi updated HBASE-9969: Attachment: kvheap-benchmark.txt kvheap-benchmark.png hbase-9969.patch I ran two benchmarks: A) KeyValueHeapBenchmark class included in the patch It simply constructs a KeyValueHeap from several CollectionBackedKeyValueScanner and sees how many next/reseek calls per second. ||scanners|| lt-next || lt-reseek || pq-next || pq-reseek || |1|17543859.6|3058104|18181818.2|1798561.2| |2|11299435|5102040.8|11173184.4|3053435.1| |3|8547008.5|4854368.9|7915567.3|2859866.5| |4|7936507.9|4866180|5891016.2|2507837| |5|6711409.4|4739336.5|4748338.1|2296738.6| lt- denotes LoserTree based KeyValueHeap. pq- denotes PriorityQueue based KeyValueHeap. A complete result (with up to 19 scanners) is attached. B) ColumnPaginationFilter with offset=1M I ran a mini-cluster and put a huge number of columns on a single row. Thes columns are uniformly written to several HFiles. Then query using ColumnPaginationFilter with offset = 1M. Blocks are cached, so it is CPU intensive. Qualifiers and values are 4 byte integers. Row key is test_row. Blocks are not compressed. The below table shows how long the scan takes. || hfiles || lt || pq || | 1 | 749.8 ms | 986.69 ms | | 2 |1511.28 ms | 2190.97 ms | | 3 |2392.8 ms | 4029.8 ms | | 4 | 3318.8 ms | 5760.22 ms Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Reporter: Chao Shi Attachments: hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt LoserTree is the better data structure than binary heap. It saves half of the comparisons on each next(), though the time complexity is on O(logN). Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from multiple HFiles in a single store, the other is merging results from multiple stores. This patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Shi updated HBASE-9969: Status: Patch Available (was: Open) Submit to hadoop QA. Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Reporter: Chao Shi Attachments: hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt LoserTree is the better data structure than binary heap. It saves half of the comparisons on each next(), though the time complexity is on O(logN). Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from multiple HFiles in a single store, the other is merging results from multiple stores. This patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Shi updated HBASE-9969: Attachment: hbase-9969.patch Repost the patch, as hadoop QA thought the benchmark result is the patch Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Reporter: Chao Shi Attachments: hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt LoserTree is the better data structure than binary heap. It saves half of the comparisons on each next(), though the time complexity is on O(logN). Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from multiple HFiles in a single store, the other is merging results from multiple stores. This patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-9969: Assignee: Chao Shi Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Reporter: Chao Shi Assignee: Chao Shi Attachments: hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt LoserTree is the better data structure than binary heap. It saves half of the comparisons on each next(), though the time complexity is on O(logN). Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from multiple HFiles in a single store, the other is merging results from multiple stores. This patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-9969: Component/s: regionserver Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Chao Shi Assignee: Chao Shi Attachments: hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt LoserTree is the better data structure than binary heap. It saves half of the comparisons on each next(), though the time complexity is on O(logN). Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from multiple HFiles in a single store, the other is merging results from multiple stores. This patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9969: - Component/s: Performance Fix Version/s: 0.96.0 0.96.1 Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Components: Performance, regionserver Reporter: Chao Shi Assignee: Chao Shi Fix For: 0.96.0, 0.96.1 Attachments: hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt LoserTree is the better data structure than binary heap. It saves half of the comparisons on each next(), though the time complexity is on O(logN). Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from multiple HFiles in a single store, the other is merging results from multiple stores. This patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9969: - Fix Version/s: (was: 0.96.0) 0.98.0 Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Components: Performance, regionserver Reporter: Chao Shi Assignee: Chao Shi Fix For: 0.98.0, 0.96.1 Attachments: hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt LoserTree is the better data structure than binary heap. It saves half of the comparisons on each next(), though the time complexity is on O(logN). Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from multiple HFiles in a single store, the other is merging results from multiple stores. This patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Shi updated HBASE-9969: Attachment: hbase-9969-v2.patch v2 is attached, fixed Ted mentioned problems. https://reviews.apache.org/r/15563 Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Components: Performance, regionserver Reporter: Chao Shi Assignee: Chao Shi Fix For: 0.98.0, 0.96.1 Attachments: hbase-9969-v2.patch, hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt LoserTree is the better data structure than binary heap. It saves half of the comparisons on each next(), though the time complexity is on O(logN). Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from multiple HFiles in a single store, the other is merging results from multiple stores. This patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9969: - Fix Version/s: 0.94.15 Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Components: Performance, regionserver Reporter: Chao Shi Assignee: Chao Shi Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 9969-0.94.txt, hbase-9969-v2.patch, hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt LoserTree is the better data structure than binary heap. It saves half of the comparisons on each next(), though the time complexity is on O(logN). Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from multiple HFiles in a single store, the other is merging results from multiple stores. This patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9969: - Attachment: 9969-0.94.txt 0.94 patch for playing. I ran my old tight loop scan test (with a real client and real server). Time to scan through 20rows (1 col) went from 16.8s to 11.6s (using WildcardColumntracker - no columns specified in scan). When using ExplicitColumnTracker it went from 26s to 21s. That's pretty impressive. On top of that the code is easier to understand now. Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Components: Performance, regionserver Reporter: Chao Shi Assignee: Chao Shi Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 9969-0.94.txt, hbase-9969-v2.patch, hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt LoserTree is the better data structure than binary heap. It saves half of the comparisons on each next(), though the time complexity is on O(logN). Currently A scan or get will go through two KeyValueHeaps, one is merging KVs read from multiple HFiles in a single store, the other is merging results from multiple stores. This patch should improve the both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks more cleaner and simpler to understand. -- This message was sent by Atlassian JIRA (v6.1#6144)