[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555951#comment-13555951 ]
liang xie commented on HBASE-7495:
----------------------------------

I just did an apples-to-apples comparison this morning. It shows that parallel seek reduces latency in this particular scenario: average and percentile read latencies roughly halve and throughput nearly doubles (results below). Attached is a preliminary patch, just for reference.

My test env:
 10 DN/RS nodes, each with 12 x 2T SATA disks
 hfile.block.cache.size=0
 HBase 0.94.3, CDH 4.1.1

My test data:
 recordcount=1000000000
 fieldcount=3
 fieldlength=200

hbase(main):002:0> describe 'YCSBTest'
DESCRIPTION                                                                  ENABLED
 {NAME => 'YCSBTest', SPLIT_POLICY => 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy',
  FAMILIES => [{NAME => 'test', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
  REPLICATION_SCOPE => '1', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0',
  TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false',
  ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}                          true

$ ./hdfs dfs -du -s -h hdfs://lgxl-xieliang/
726.8g  hdfs://lgxl-xieliang/

100 regions in total; most of those regions have between 0 and 5 store files (numberOfStorefiles in [0,5]).

My test cmd:
 bin/ycsb run hbase -P ./workloads/kaka -threads 1 -p columnfamily=test -p table=YCSBTest -s > log/run.log 2>&1 &

I restarted the whole HBase/HDFS cluster and cleared the OS cache (echo 1 > /proc/sys/vm/drop_caches) before each run.
Serial seek result:
[OVERALL], RunTime(ms), 300027.0
[OVERALL], Throughput(ops/sec), 20.09819116279535
[READ], Operations, 6030
[READ], AverageLatency(us), 49739.97446102819
[READ], MinLatency(us), 2768
[READ], MaxLatency(us), 782892
[READ], 50thPercentileLatency(ms), 45
[READ], 95thPercentileLatency(ms), 90
[READ], 99thPercentileLatency(ms), 124
[READ], Return=0, 6030

Parallel seek result:
[OVERALL], RunTime(ms), 300016.0
[OVERALL], Throughput(ops/sec), 39.584555490373845
[READ], Operations, 11876
[READ], AverageLatency(us), 25249.878410239136
[READ], MinLatency(us), 3084
[READ], MaxLatency(us), 753547
[READ], 50thPercentileLatency(ms), 22
[READ], 95thPercentileLatency(ms), 43
[READ], 99thPercentileLatency(ms), 67
[READ], Return=0, 11876

> parallel scanner seek in StoreScanner's constructor
> ---------------------------------------------------
>
>                 Key: HBASE-7495
>                 URL: https://issues.apache.org/jira/browse/HBASE-7495
>             Project: HBase
>          Issue Type: Bug
>          Components: Scanners
>    Affects Versions: 0.94.3, 0.96.0
>            Reporter: liang xie
>            Assignee: liang xie
>         Attachments: HBASE-7495.txt
>
>
> There seems to be room for improvement before scanner.next is called:
> {code:title=StoreScanner.java|borderStyle=solid}
>     if (explicitColumnQuery && lazySeekEnabledGlobally) {
>       for (KeyValueScanner scanner : scanners) {
>         scanner.requestSeek(matcher.getStartKey(), false, true);
>       }
>     } else {
>       for (KeyValueScanner scanner : scanners) {
>         scanner.seek(matcher.getStartKey());
>       }
>     }
> {code}
> We could perform scanner.requestSeek or scanner.seek in parallel, instead of
> serially as the current code does, to reduce latency in some cases.
> Any ideas on this? I'll give it a try if the feedback is positive :)
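The idea in the issue description (issuing the per-scanner seeks in parallel instead of one after another) can be sketched in plain Java as below. This is a minimal, hypothetical sketch under stated assumptions, not the attached HBASE-7495.txt patch and not HBase's API: `SeekableScanner`, `SlowScanner`, `seekSerially` and `seekInParallel` are illustrative names, the disk seek is simulated with `Thread.sleep`, and the lazy `requestSeek` path is not modeled. It only illustrates the latency argument: serial seeks cost roughly the sum of the per-store-file seek times, while parallel seeks cost roughly the slowest one, provided the pool has a thread per scanner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSeekSketch {

    /** Hypothetical, simplified stand-in for HBase's KeyValueScanner: only seek() is modeled. */
    interface SeekableScanner {
        boolean seek(String startKey) throws Exception;
    }

    /** Hypothetical scanner whose seek costs a fixed delay (a stand-in for a cold-disk read). */
    static final class SlowScanner implements SeekableScanner {
        private final long delayMillis;
        volatile String position; // key this scanner was last positioned at

        SlowScanner(long delayMillis) {
            this.delayMillis = delayMillis;
        }

        @Override
        public boolean seek(String startKey) throws InterruptedException {
            Thread.sleep(delayMillis);
            position = startKey;
            return true;
        }
    }

    /** Serial version, like the existing loop: total cost is the sum of all seek latencies. */
    static void seekSerially(List<? extends SeekableScanner> scanners, String startKey) throws Exception {
        for (SeekableScanner scanner : scanners) {
            scanner.seek(startKey);
        }
    }

    /**
     * Parallel version: submit one seek per scanner, then wait for all of them, so the cost
     * is roughly that of the slowest seek (given at least one pool thread per scanner).
     */
    static void seekInParallel(List<? extends SeekableScanner> scanners, String startKey,
                               ExecutorService pool) throws Exception {
        List<Future<Boolean>> futures = new ArrayList<>();
        for (SeekableScanner scanner : scanners) {
            futures.add(pool.submit(() -> scanner.seek(startKey)));
        }
        for (Future<Boolean> future : futures) {
            try {
                future.get(); // every scanner must be positioned before next() is called
            } catch (ExecutionException e) {
                Throwable cause = e.getCause();
                throw (cause instanceof Exception) ? (Exception) cause : e;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Five scanners, matching the upper end of the 0 to 5 store files per region in the test above.
        List<SlowScanner> scanners = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            scanners.add(new SlowScanner(50));
        }
        ExecutorService pool = Executors.newFixedThreadPool(scanners.size());
        try {
            long t0 = System.nanoTime();
            seekSerially(scanners, "row-0001");
            long serialMs = (System.nanoTime() - t0) / 1_000_000;

            t0 = System.nanoTime();
            seekInParallel(scanners, "row-0002", pool);
            long parallelMs = (System.nanoTime() - t0) / 1_000_000;

            System.out.println("serial: " + serialMs + " ms, parallel: " + parallelMs + " ms");
        } finally {
            pool.shutdown();
        }
    }
}
```

With five simulated 50 ms seeks, the serial loop should take about 250 ms and the parallel one about 50 ms plus thread-scheduling overhead. Inside a real region server the executor would presumably need to be shared and bounded rather than created per scan; that choice, and how it interacts with lazy seek, is left open here.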