Unnecessary iteration in HMemcache.internalGet? Got much better read 
performance after breaking out of it.
---------------------------------------------------------------------------------------------------

                 Key: HBASE-684
                 URL: https://issues.apache.org/jira/browse/HBASE-684
             Project: Hadoop HBase
          Issue Type: Improvement
    Affects Versions: 0.1.2
            Reporter: LN


hi stack:
First, thanks very much to you and the other authors; it's a great system.

I'm not sure, but I think the tail map iteration in 
HStore.HMemcache.internalGet should break as soon as 
'itKey.matchesRowCol(key)' returns false: the tail map is a SortedMap too, so 
any keys matching the input 'key' must sit at the beginning of the map.
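
To illustrate the point in plain Java (a toy sketch, not HBase code; String 
keys stand in for HStoreKey, and the class and key names are made up):

import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// In a SortedMap, every key matching the sought row/column prefix is
// contiguous at the head of tailMap(key), so the scan can stop at the
// first non-matching key.
public class SortedMapEarlyExit {
  public static void main(String[] args) {
    SortedMap<String, String> map = new TreeMap<String, String>();
    map.put("row1/colA/1", "v1");
    map.put("row1/colA/2", "v2");
    map.put("row1/colB/1", "x"); // different column, sorts after all colA keys
    map.put("row2/colA/1", "y"); // different row, sorts later still

    String key = "row1/colA/"; // stand-in for an HStoreKey row+column
    for (Map.Entry<String, String> e : map.tailMap(key).entrySet()) {
      if (!e.getKey().startsWith(key)) {
        break; // sorted order guarantees no later key can match
      }
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
  }
}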

I created a patched version of the class for testing and saw about a 5x read 
performance improvement in my test case.
Some comments:
1. I ended up reviewing HStore.java because I was bothered by terrible read 
performance on the 0.1.2 release: ONE record per second. Test environment: 
4 GB memory, 2 dual-core 2 GHz Xeons, 100k records in the test table, 100k 
bytes per record (1 column only).
2. I have seen the PerformanceEvaluation pages in the wiki; reading 1k-byte 
records performs acceptably in my environment too, but as record size 
increases, read performance drops off quickly.
3. When profiling the HRegionServer process, I found the first bottleneck was 
data I/O in MapFile; this is the hbase.io.index.interval issue (HBASE-680) I 
posted yesterday.
4. After setting hbase.io.index.interval to 1, read performance improved a 
lot, but not enough (I think it should be within Nx of raw Hadoop read 
performance, where N < 10). This time profiling showed HMemcache.internalGet 
consuming a lot of CPU time: in my test environment, each row get was calling 
HStoreKey#matchesRowCol about 200 times.
5. With my patched version applied, I got much better read performance. Test 
case description: first insert 100k records into a table, then randomly read 
10000 of them back (see the sketch after this list).
6. This change has no effect when there is nothing in the cache, e.g. on a 
freshly restarted region server, which is why my test case inserts the rows 
first; but reading and writing at the same time is a normal situation.
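
For reference, here is roughly what the test case does (a sketch against the 
0.1-era HTable client API; the table name "test", the column "data:", and the 
row-key scheme are assumptions, not the exact test code):

import java.util.Random;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTable;
import org.apache.hadoop.io.Text;

public class ReadBench {
  static final int RECORDS = 100000; // rows to load
  static final int READS = 10000;    // random reads to time

  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), new Text("test"));
    byte[] value = new byte[100 * 1000]; // ~100k bytes per record

    // 1) load 100k records
    for (int i = 0; i < RECORDS; i++) {
      long lockid = table.startUpdate(new Text("row" + i));
      table.put(lockid, new Text("data:"), value);
      table.commit(lockid);
    }

    // 2) random-read 10000 of them and report the rate
    Random rnd = new Random();
    long start = System.currentTimeMillis();
    for (int i = 0; i < READS; i++) {
      table.get(new Text("row" + rnd.nextInt(RECORDS)), new Text("data:"));
    }
    long elapsed = System.currentTimeMillis() - start;
    System.out.println((READS * 1000.0 / elapsed) + " reads/sec");
  }
}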

Here is my simple patch:
Index: src/java/org/apache/hadoop/hbase/HStore.java
===================================================================
--- src/java/org/apache/hadoop/hbase/HStore.java        Fri Jun 13 00:15:59 CST 
2008
+++ src/java/org/apache/hadoop/hbase/HStore.java        Fri Jun 13 00:15:59 CST 
2008
@@ -478,11 +478,14 @@
         if (itKey.matchesRowCol(key)) {
           if (!HLogEdit.isDeleted(es.getValue())) { 
             result.add(tailMap.get(itKey));
           }
-        }
-        if (numVersions > 0 && result.size() >= numVersions) {
-          break;
-        }
+          if (numVersions > 0 && result.size() >= numVersions) {
+            break;
+          }
+        } else {
+          // by L.N., map is sorted, so we can't find a match any more.
+          break;
+        }
       }
       return result;
     }
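
For readability, here is the whole scan loop as it reads after the patch 
(reconstructed from the diff above; the surrounding names come from the 
0.1.x HStore.HMemcache.internalGet source):

      for (Map.Entry<HStoreKey, byte[]> es : tailMap.entrySet()) {
        HStoreKey itKey = es.getKey();
        if (itKey.matchesRowCol(key)) {
          if (!HLogEdit.isDeleted(es.getValue())) {
            result.add(tailMap.get(itKey));
          }
          if (numVersions > 0 && result.size() >= numVersions) {
            break; // collected enough versions
          }
        } else {
          // map is sorted, so no later key can match; stop scanning
          break;
        }
      }
      return result;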

After all this, I'd suggest a new HBase class to hold the memory cache 
instead of the synchronized sorted map. That could give much better 
performance, basically by avoiding the iteration (in case my thoughts above 
are wrong) and by removing a lot of unnecessary sync/lock overhead; a rough 
sketch of the idea follows.
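
A minimal sketch of what I mean (the class name MemcacheHolder is made up, 
and ConcurrentSkipListMap is just one possible lock-free backing structure, 
not an existing HBase API; String keys again stand in for HStoreKey):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// A concurrent skip list keeps keys sorted without a global synchronized
// wrapper, so readers block neither each other nor writers.
public class MemcacheHolder {
  private final ConcurrentNavigableMap<String, byte[]> cache =
      new ConcurrentSkipListMap<String, byte[]>();

  public void put(String key, byte[] value) {
    cache.put(key, value); // lock-free insert
  }

  // Collect up to numVersions values whose key starts with the given
  // prefix, stopping at the first non-match (keys are sorted).
  public List<byte[]> get(String prefix, int numVersions) {
    List<byte[]> result = new ArrayList<byte[]>();
    for (Map.Entry<String, byte[]> e : cache.tailMap(prefix).entrySet()) {
      if (!e.getKey().startsWith(prefix)) {
        break; // sorted order: no later key can match
      }
      result.add(e.getValue());
      if (numVersions > 0 && result.size() >= numVersions) {
        break;
      }
    }
    return result;
  }
}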


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
