[
https://issues.apache.org/jira/browse/HBASE-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jim Kellerman closed HBASE-684.
-------------------------------
> unnecessary iteration in HMemcache.internalGet? got much better reading
> performance after breaking it.
> ---------------------------------------------------------------------------------------------------
>
> Key: HBASE-684
> URL: https://issues.apache.org/jira/browse/HBASE-684
> Project: Hadoop HBase
> Issue Type: Improvement
> Affects Versions: 0.1.2
> Reporter: Luo Ning
> Fix For: 0.1.3
>
> Attachments: 684.patch
>
>
> hi stack:
> First, thanks to you and the other authors; it's a great system.
> I am not sure, but I think the tail map iteration in HStore.HMemcache.internalGet
> should break as soon as 'itKey.matchesRowCol(key)' returns false: the tail map is
> itself a SortedMap, so any keys matching the input 'key' must sit at the beginning
> of the map.
> I created a patched version of the class for testing and measured about a 5x read
> performance improvement in my test case.
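> For illustration, a minimal sketch of the sorted-map property this relies on
> (plain java.util.TreeMap and string keys, class name made up for the example,
> not the HBase types): every key that can match the probe sits at the head of
> tailMap(probe), so the scan can stop at the first non-matching key.
>
>     import java.util.SortedMap;
>     import java.util.TreeMap;
>
>     public class TailMapBreak {
>       public static void main(String[] args) {
>         // keys sort as row/column/timestamp, loosely mimicking HStoreKey ordering
>         SortedMap<String, Integer> map = new TreeMap<String, Integer>();
>         map.put("row1/colA/2", 1);
>         map.put("row1/colA/3", 2);
>         map.put("row1/colB/9", 3);
>         map.put("row2/colA/7", 4);
>
>         String probe = "row1/colA/";
>         int visited = 0;
>         for (String k : map.tailMap(probe).keySet()) {
>           visited++;
>           if (!k.startsWith(probe)) {
>             break;   // sorted map: no later key can match this row/column
>           }
>           System.out.println("match: " + k);
>         }
>         // visits 3 keys instead of all 4; without the break it would walk
>         // every remaining entry of the tail map
>         System.out.println("keys visited: " + visited);
>       }
>     }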
> comments here:
> 1. I got around to reviewing HStore.java because I was bothered by terrible read
> performance with the 0.1.2 release: ONE record per second. Test environment: 4 GB
> of memory, two dual-core 2 GHz Xeons, 100k records in the test table, a single
> column per record, 100k bytes per column value.
> 2. I have seen the PerformanceEvaluation pages in the wiki; read performance with
> 1k-byte records is also acceptable in my test environment, but as the record size
> increases, read performance drops quickly.
> 3. When profiling the regionserver process, I found the first bottleneck was data
> IO in MapFile; this is the hbase.io.index.interval issue (HBASE-680) I posted
> yesterday.
> 4. After setting hbase.io.index.interval to 1 (a configuration sketch follows
> these comments), read performance improved a lot, but still not enough (I think it
> should be within Nx of hadoop's read performance, with N < 10). This time
> profiling showed HMemcache.internalGet consuming much of the CPU time; in my test
> environment each row get made about 200 calls to HStoreKey#matchesRowCol.
> 5. Applying my patched version, I got much better read performance.
> Test case description: first insert 100k records into a table, then read 10000 of
> them at random.
> 6. This change has no effect when the cache is empty, for example right after a
> regionserver has freshly started, which is why my test case inserts the rows
> first; but reading and writing at the same time is a normal situation.
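> Referenced from comment 4, a minimal sketch of setting that property
> programmatically. The property name comes from comment 3 (HBASE-680); whether
> setting it on a client-side HBaseConfiguration takes effect, rather than via
> hbase-site.xml on the region servers, is an assumption here, and the class name
> is made up for the example.
>
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>
>     public class IndexIntervalSketch {
>       public static void main(String[] args) {
>         HBaseConfiguration conf = new HBaseConfiguration();
>         // index every key in the store MapFiles rather than every Nth key;
>         // trades region server memory for fewer seeks on random reads
>         conf.setInt("hbase.io.index.interval", 1);
>         System.out.println(conf.get("hbase.io.index.interval"));
>       }
>     }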
> Here is my simple patch:
> Index: src/java/org/apache/hadoop/hbase/HStore.java
> ===================================================================
> --- src/java/org/apache/hadoop/hbase/HStore.java Fri Jun 13 00:15:59 CST 2008
> +++ src/java/org/apache/hadoop/hbase/HStore.java Fri Jun 13 00:15:59 CST 2008
> @@ -478,11 +478,14 @@
>          if (itKey.matchesRowCol(key)) {
>            if (!HLogEdit.isDeleted(es.getValue())) {
>              result.add(tailMap.get(itKey));
>            }
> -        }
> -        if (numVersions > 0 && result.size() >= numVersions) {
> -          break;
> -        }
> +          if (numVersions > 0 && result.size() >= numVersions) {
> +            break;
> +          }
> +        } else {
> +          // by L.N., map is sorted, so we can't find a match any more.
> +          break;
> +        }
>        }
>        return result;
>      }
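> Read as a whole, the patched loop looks roughly like this. This is a
> reconstruction from the diff; the method signature, enclosing class, and imports
> are assumed rather than copied from HStore.java.
>
>     import java.util.ArrayList;
>     import java.util.Map;
>     import java.util.SortedMap;
>     import org.apache.hadoop.hbase.HLogEdit;
>     import org.apache.hadoop.hbase.HStoreKey;
>
>     class InternalGetSketch {
>       static ArrayList<byte[]> internalGet(SortedMap<HStoreKey, byte[]> map,
>           HStoreKey key, int numVersions) {
>         ArrayList<byte[]> result = new ArrayList<byte[]>();
>         // the tail map starts at the probe key; matching keys, if any, are at its head
>         SortedMap<HStoreKey, byte[]> tailMap = map.tailMap(key);
>         for (Map.Entry<HStoreKey, byte[]> es : tailMap.entrySet()) {
>           HStoreKey itKey = es.getKey();
>           if (itKey.matchesRowCol(key)) {
>             if (!HLogEdit.isDeleted(es.getValue())) {
>               result.add(tailMap.get(itKey));
>             }
>             if (numVersions > 0 && result.size() >= numVersions) {
>               break;   // collected the requested number of versions
>             }
>           } else {
>             // map is sorted, so no later key can match the row/column; stop here
>             break;
>           }
>         }
>         return result;
>       }
>     }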
> After all this, I'd suggest a new hbase class to hold the memory cache instead of
> the synchronized sorted map. That could give much better performance, basically by
> avoiding the iteration entirely (in case my reasoning above is wrong) and by
> removing many unnecessary syncs/locks.
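> One possible shape for such a cache holder (a sketch only, with made-up class
> names; not the reporter's design and not HBase code): a ConcurrentSkipListMap
> keeps keys sorted without locking the whole map, and lookups can still stop at
> the first non-matching key.
>
>     import java.util.ArrayList;
>     import java.util.List;
>     import java.util.Map;
>     import java.util.concurrent.ConcurrentSkipListMap;
>
>     final class MemcacheSketch {
>       /** Simplified key: sorts by (row, column, timestamp descending), like HStoreKey. */
>       static final class Key implements Comparable<Key> {
>         final String row;
>         final String column;
>         final long timestamp;
>         Key(String row, String column, long timestamp) {
>           this.row = row;
>           this.column = column;
>           this.timestamp = timestamp;
>         }
>         boolean matchesRowCol(Key other) {
>           return row.equals(other.row) && column.equals(other.column);
>         }
>         public int compareTo(Key o) {
>           int c = row.compareTo(o.row);
>           if (c == 0) { c = column.compareTo(o.column); }
>           if (c == 0) {
>             // newer timestamps sort first
>             c = (timestamp > o.timestamp) ? -1 : (timestamp < o.timestamp ? 1 : 0);
>           }
>           return c;
>         }
>       }
>
>       // concurrent readers and writers without synchronizing the whole map
>       private final ConcurrentSkipListMap<Key, byte[]> cache =
>           new ConcurrentSkipListMap<Key, byte[]>();
>
>       void put(Key key, byte[] value) {
>         cache.put(key, value);
>       }
>
>       List<byte[]> get(Key probe, int numVersions) {
>         List<byte[]> result = new ArrayList<byte[]>();
>         for (Map.Entry<Key, byte[]> e : cache.tailMap(probe).entrySet()) {
>           if (!e.getKey().matchesRowCol(probe)) {
>             break;   // sorted: nothing beyond this point can match
>           }
>           result.add(e.getValue());
>           if (numVersions > 0 && result.size() >= numVersions) {
>             break;
>           }
>         }
>         return result;
>       }
>     }
>
> Building the probe key with Long.MAX_VALUE as its timestamp positions the tail
> map at the newest version of the cell, since timestamps sort descending.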
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.