[ 
https://issues.apache.org/jira/browse/HBASE-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-684:
------------------------

    Fix Version/s: 0.1.4

Committed to trunk.  Leaving open for now.  Should commit to branch.  Assigning 
0.1.4 for the moment.

> unnecessary iteration in HMemcache.internalGet? got much better read 
> performance after breaking out of it.
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-684
>                 URL: https://issues.apache.org/jira/browse/HBASE-684
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.1.2
>            Reporter: LN
>             Fix For: 0.1.4
>
>         Attachments: 684.patch
>
>
> hi stack:
> first, many thanks to the authors; it's a great system.
> not sure, but I think the tail map iteration should break once 
> 'itKey.matchesRowCol(key)' returns false, in HStore.HMemcache.internalGet: 
> because the tail map is a SortedMap too, keys matching the input 'key' 
> should be at the beginning of the map.
> I created a patched version of the class for testing, and found about a 5x 
> read performance improvement in my test case.
> comments here:
> 1. I came to review HStore.java because I was bothered by terrible read 
> performance with the 0.1.2 release: ONE record per second. Test env: 4G mem, 
> 2 dual-core 2GHz Xeons, 100k records in the test table, 100k bytes per record 
> column, 1 column only.
> 2. I have seen the PerformanceEvaluation pages in the wiki; 1k-byte record 
> read performance is also acceptable in my test env, but as the record size 
> increases, read performance drops quickly.
> 3. When profiling the hregionserver process, I found the first bottleneck was 
> data io in MapFile; this is the hbase.io.index.interval issue (HBASE-680) I 
> posted yesterday.
> 4. After setting hbase.io.index.interval to 1, read performance improved a 
> lot, but not enough (I think it should be Nx hadoop read performance, where 
> N<10). This time profiling showed HMemcache.internalGet using much cpu time: 
> each row get was calling HStoreKey#matchesRowCol about 200 times in my test 
> env.
> 5. Applying my patched version, I got much better read performance. 
> Test case description: first insert 100k records into a table, then read 
> 10000 of them at random.
> 6. This change has no effect if there is nothing in the cache, e.g. a freshly 
> started regionserver, so my test case inserts the rows first; but reading and 
> writing at the same time is a normal situation.
> here is my simple patch:
> Index: src/java/org/apache/hadoop/hbase/HStore.java
> ===================================================================
> --- src/java/org/apache/hadoop/hbase/HStore.java      Fri Jun 13 00:15:59 CST 
> 2008
> +++ src/java/org/apache/hadoop/hbase/HStore.java      Fri Jun 13 00:15:59 CST 
> 2008
> @@ -478,11 +478,14 @@
>            if (!HLogEdit.isDeleted(es.getValue())) { 
>              result.add(tailMap.get(itKey));
>            }
> -        }
> -        if (numVersions > 0 && result.size() >= numVersions) {
> -          break;
> -        }
> +          if (numVersions > 0 && result.size() >= numVersions) {
> +            break;
> +          }
> +        } else {
> +          // by L.N.: map is sorted, so no further key can match.
> +          break;
> +        }
>        }
>        return result;
>      }
> after all, I'd suggest a new hbase class to hold the memory cache instead of 
> a synchronized sorted map; this could lead to much better performance, 
> basically avoiding iteration (if my thoughts above are wrong) and removing 
> many unnecessary syncs/locks.
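
The early-break idea in the patch can be sketched standalone. Below is a minimal, self-contained Java sketch, not HBase code: a TreeMap of string keys stands in for the HStoreKey-ordered memcache, and `startsWith` stands in for `matchesRowCol`; class and method names are illustrative only. Because the map is sorted, all keys matching a row/column sit contiguously at the head of `tailMap(key)`, so the scan can stop at the first non-matching key instead of walking the whole tail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the patched internalGet loop (names are hypothetical, not HBase API).
public class EarlyBreakGet {
    static List<String> internalGet(TreeMap<String, String> memcache,
                                    String key, int numVersions) {
        List<String> result = new ArrayList<>();
        // tailMap(key) is a view of all entries with keys >= key, still sorted.
        for (Map.Entry<String, String> es : memcache.tailMap(key).entrySet()) {
            if (es.getKey().startsWith(key)) {   // analogue of matchesRowCol
                result.add(es.getValue());       // deleted-edit check elided
                if (numVersions > 0 && result.size() >= numVersions) {
                    break;                       // collected enough versions
                }
            } else {
                break;  // map is sorted: no later key can match this row/col
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<String, String> memcache = new TreeMap<>();
        memcache.put("row1/colA/t1", "v1");
        memcache.put("row1/colA/t2", "v2");
        memcache.put("row1/colB/t1", "x");   // other column: must not match
        memcache.put("row2/colA/t1", "y");   // other row: never even visited
        System.out.println(internalGet(memcache, "row1/colA", 0));
    }
}
```

With the break in place the loop visits three entries (two matches plus the first non-match) instead of all four; on a memcache holding many rows, that difference is what produces the large read speedup the reporter measured.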

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
