[ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Ranganathan updated HBASE-6066:
---------------------------------------

    Issue Type: Sub-task  (was: Improvement)
        Parent: HBASE-6922

> some low hanging read path improvement ideas
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Michal Gregorczyk
>            Priority: Critical
>              Labels: noob
>             Fix For: 0.96.0
>
>         Attachments: 0001-jira-HBASE-6066-89-fb-Some-read-performance-improvem.patch,
>                      metric-stringbuilder-fix.patch
>
>
> I was running some single-threaded scan performance tests on a fully cached
> table with small rows. Some observations...
> We seem to be doing several wasteful iterations over, and/or wasteful
> building of, temporary lists.
> 1) One such case is the following code in HRegionServer.next():
> {code}
> boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
> if (!values.isEmpty()) {
>   for (KeyValue kv : values) {   // <-- wasteful in most cases
>     currentScanResultSize += kv.heapSize();
>   }
>   results.add(new Result(values));
> }
> {code}
> By default, "maxScannerResultSize" is Long.MAX_VALUE. In that case we can
> skip the unnecessary iteration that computes currentScanResultSize.
> 2) An example of a wasteful temporary list is "results" in
> RegionScanner.next():
> {code}
> results.clear();
> boolean returnResult = nextInternal(limit, metric);
> outResults.addAll(results);
> {code}
> "results" is then copied into outResults via addAll(). Not sure why we
> cannot collect the results directly in outResults.
> 3) A nearly identical example of a wasteful list is "results" in
> StoreScanner.next(), which also ends up copying its contents into
> "outResults".
> 4) Reduce the overhead of the "size metric" maintained in StoreScanner.next():
> {code}
> if (metric != null) {
>   HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>       copyKv.getLength());
> }
> results.add(copyKv);
> {code}
> A single call to next() might fetch many KVs. We can first add up the size
> of those KVs in a local variable and then, in a finally clause, increment
> the metric in one shot, rather than updating an AtomicLong for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same
> object. Both are synchronized methods. Synchronized methods calling nested
> synchronized methods on the same object probably add some small overhead.
> The inner next() calls isFilterDone(), which is also a synchronized method.
> We should refactor the code to avoid these nested synchronized calls.
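Idea 1 can be sketched as follows. This is a minimal illustration, not the HBase patch: `Cell` is a hypothetical stand-in for KeyValue, and `addRow` is a hypothetical helper mirroring the quoted HRegionServer.next() fragment. It assumes nothing else reads the running size when no limit is configured.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for HBase's KeyValue: only heapSize() matters here.
final class Cell {
    private final long heapSize;
    Cell(long heapSize) { this.heapSize = heapSize; }
    long heapSize() { return heapSize; }
}

class ScanSizeGuard {
    /**
     * Adds one row's cells to results and returns the updated running size.
     * When maxResultSize is Long.MAX_VALUE the size check can never stop the
     * scan, so the per-cell heapSize() loop is skipped and the size is returned unchanged.
     */
    static long addRow(List<Cell> values, List<List<Cell>> results,
                       long currentSize, long maxResultSize) {
        if (values.isEmpty()) {
            return currentSize;
        }
        if (maxResultSize != Long.MAX_VALUE) {
            for (Cell kv : values) {
                currentSize += kv.heapSize();
            }
        }
        results.add(new ArrayList<>(values)); // snapshot, like new Result(values)
        return currentSize;
    }

    public static void main(String[] args) {
        List<Cell> row = List.of(new Cell(10), new Cell(20));
        List<List<Cell>> results = new ArrayList<>();
        System.out.println(addRow(row, results, 0, Long.MAX_VALUE)); // 0: loop skipped
        System.out.println(addRow(row, results, 0, 100));            // 30: loop runs
    }
}
```

The check costs one comparison per row instead of one heapSize() call per cell, which matters most for the small-row, fully cached case described in the report.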
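Ideas 2 and 3 (collecting directly into the caller's list instead of filling a temporary list and calling addAll()) can be illustrated with a toy scanner. `RowScanner`, `nextWithCopy`, `nextDirect` and `fill` are hypothetical names, and cells are plain strings for brevity; this is a sketch of the pattern, not HBase's RegionScanner or StoreScanner.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical scanner over already-sorted cells (plain strings for brevity).
class RowScanner {
    private final Iterator<String> source;

    RowScanner(List<String> cells) {
        this.source = cells.iterator();
    }

    // Before: fill a temporary list, then copy it into the caller's list.
    boolean nextWithCopy(List<String> outResults, int limit) {
        List<String> results = new ArrayList<>();
        boolean more = fill(results, limit);
        outResults.addAll(results); // extra allocation and copy on every call
        return more;
    }

    // After: append straight into the caller's list, no temporary list.
    boolean nextDirect(List<String> outResults, int limit) {
        return fill(outResults, limit);
    }

    // Appends up to `limit` cells to `sink`; returns true if more cells remain.
    private boolean fill(List<String> sink, int limit) {
        for (int n = 0; n < limit && source.hasNext(); n++) {
            sink.add(source.next());
        }
        return source.hasNext();
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>(List.of("earlier"));
        new RowScanner(List.of("a", "b", "c")).nextDirect(out, 2);
        System.out.println(out); // [earlier, a, b]
    }
}
```

Both versions append to whatever outResults already holds, so callers see the same contents. One caveat: if the real nextInternal() ever has to discard a partially built row, the direct version must remove only the entries it appended rather than clearing the caller's list, which may be why the temporary list exists.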
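Idea 4 (sum locally, publish once in a finally clause) might look like the sketch below. `Metrics` is a hypothetical registry of AtomicLong counters standing in for HRegion.incrNumericMetric(), and KVs are modeled as byte arrays whose length plays the role of copyKv.getLength().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical metric registry standing in for HRegion.incrNumericMetric().
class Metrics {
    private static final ConcurrentMap<String, AtomicLong> COUNTERS =
            new ConcurrentHashMap<>();

    static void incr(String name, long delta) {
        COUNTERS.computeIfAbsent(name, k -> new AtomicLong()).addAndGet(delta);
    }

    static long get(String name) {
        AtomicLong counter = COUNTERS.get(name);
        return counter == null ? 0 : counter.get();
    }
}

class BatchedMetricScanner {
    /**
     * Copies each cell into results, summing the copied lengths in a local
     * variable. The total is published with a single atomic update in the
     * finally block, so bytes already copied are counted even if the loop throws.
     */
    static void next(List<byte[]> cells, List<byte[]> results,
                     String metricNamePrefix, String metric) {
        long bytesRead = 0;
        try {
            for (byte[] kv : cells) {
                byte[] copyKv = kv.clone();
                results.add(copyKv);
                bytesRead += copyKv.length;
            }
        } finally {
            if (metric != null && bytesRead > 0) {
                Metrics.incr(metricNamePrefix + metric, bytesRead);
            }
        }
    }

    public static void main(String[] args) {
        List<byte[]> results = new ArrayList<>();
        next(List.of(new byte[3], new byte[5]), results, "store.", "nextsize");
        System.out.println(Metrics.get("store.nextsize")); // 8
    }
}
```

The metric name string is also built once per call here instead of once per KV, which is in the spirit of the attached metric-stringbuilder-fix.patch, although that patch's contents are not shown in this issue.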
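For idea 5, one common refactoring is to let each public entry point take the monitor once and move the shared logic into private helpers that assume the lock is already held. The class below is a hypothetical toy (`LockedScanner`, `nextUnlocked`, `isFilterDoneUnlocked` and the stop predicate are all illustrative), not HBase's RegionScanner. Java's intrinsic locks are reentrant, so the nested version is correct as written; the refactoring only removes redundant monitor enter/exit pairs, which the JIT may already optimize away in some cases.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical region scanner; only the locking structure is the point here.
class LockedScanner {
    private final Iterator<String> rows;
    private final Predicate<String> stopFilter; // hypothetical "filter is done" test
    private boolean filterDone = false;

    LockedScanner(List<String> rows, Predicate<String> stopFilter) {
        this.rows = rows.iterator();
        this.stopFilter = stopFilter;
    }

    // Public entry points acquire the monitor exactly once...
    synchronized boolean next(List<String> out) {
        return nextUnlocked(out);
    }

    synchronized boolean isFilterDone() {
        return isFilterDoneUnlocked();
    }

    // ...and the shared logic lives in private helpers that assume the lock is held.
    private boolean nextUnlocked(List<String> out) {
        if (isFilterDoneUnlocked() || !rows.hasNext()) {
            return false;
        }
        String row = rows.next();
        if (stopFilter.test(row)) {
            filterDone = true;
            return false;
        }
        out.add(row);
        return rows.hasNext();
    }

    private boolean isFilterDoneUnlocked() {
        return filterDone;
    }

    public static void main(String[] args) {
        LockedScanner s = new LockedScanner(List.of("a", "b", "STOP", "c"),
                r -> r.equals("STOP"));
        List<String> out = new ArrayList<>();
        while (s.next(out)) { /* keep scanning */ }
        System.out.println(out + " filterDone=" + s.isFilterDone()); // [a, b] filterDone=true
    }
}
```

Whether this yields a measurable gain in HBase would have to be confirmed by benchmarking; the issue itself only says the nesting "probably" adds small overhead.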