[ 
https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Ranganathan updated HBASE-6066:
---------------------------------------

    Issue Type: Sub-task  (was: Improvement)
        Parent: HBASE-6922
    
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Michal Gregorczyk
>            Priority: Critical
>              Labels: noob
>             Fix For: 0.96.0
>
>         Attachments: 
> 0001-jira-HBASE-6066-89-fb-Some-read-performance-improvem.patch, 
> metric-stringbuilder-fix.patch
>
>
> I was running some single-threaded scan performance tests on a fully cached 
> table with small rows. Some observations:
> We seem to be doing several wasteful iterations over temporary lists, and 
> are building some of those lists unnecessarily.
> 1) One such case is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {  // wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>      }
>      results.add(new Result(values));
>    }
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via addAll(). It is not clear 
> why we cannot collect the results directly in outResults.
> 3) Another very similar example of a wasteful array is "results" in 
> StoreScanner.next(), which eventually also copies its results into 
> "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the 
> sizes of those KVs in a local variable and then increment the metric once, 
> in a finally clause, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same 
> object. Both are synchronized methods. Synchronized methods calling nested 
> synchronized methods on the same object are probably adding some small 
> overhead. The inner next() calls isFilterDone(), which is also a 
> synchronized method. We should refactor the code to avoid these nested 
> synchronized methods.
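
Idea 1 in the quoted description can be illustrated with a minimal sketch. This is not the HBase code or the attached patch: the class ScanSizeAccounting, the method accumulate, and the use of plain Long values in place of KeyValue.heapSize() are all hypothetical, chosen only to show the guard.

```java
import java.util.List;

// Hypothetical helper, not part of HBase: shows how the per-cell size loop
// could be skipped when the result-size limit is effectively unbounded.
final class ScanSizeAccounting {

    // Returns the updated running size. When maxResultSize is Long.MAX_VALUE
    // the running total is never compared against a meaningful limit, so the
    // iteration over the cells is skipped entirely.
    static long accumulate(long currentSize, List<Long> cellHeapSizes, long maxResultSize) {
        if (maxResultSize == Long.MAX_VALUE) {
            return currentSize; // unbounded: no need to walk the list
        }
        for (long size : cellHeapSizes) {
            currentSize += size;
        }
        return currentSize;
    }
}
```

With a finite limit the method behaves like the original loop. With the default limit it returns immediately, which is the saving the description is after.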

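Idea 4 (one metric update per next() call instead of one per KV) might look roughly like the sketch below. It assumes the metric is backed by an AtomicLong, as the description implies. The class BatchedSizeMetric and its methods are hypothetical and do not correspond to HRegion.incrNumericMetric or to the attached patches. The updates counter exists only to make the number of shared writes visible.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a size metric; not HBase's actual metrics API.
final class BatchedSizeMetric {
    private final AtomicLong totalBytes = new AtomicLong();
    // Counts writes to totalBytes, for illustration only.
    private final AtomicLong updates = new AtomicLong();

    // Per-KV variant: one atomic update for every KV length.
    void recordEach(List<Integer> kvLengths) {
        for (int len : kvLengths) {
            totalBytes.addAndGet(len);
            updates.incrementAndGet();
        }
    }

    // Batched variant: sum into a local variable, then publish once in a
    // finally clause so a mid-batch exception still records what was read.
    void recordBatched(List<Integer> kvLengths) {
        long localBytes = 0;
        try {
            for (int len : kvLengths) {
                localBytes += len;
            }
        } finally {
            if (localBytes != 0) {
                totalBytes.addAndGet(localBytes);
                updates.incrementAndGet();
            }
        }
    }

    long totalBytes() { return totalBytes.get(); }
    long updates() { return updates.get(); }
}
```

Both variants end with the same total. The batched one touches the shared counter once per call regardless of how many KVs were fetched.
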
