Ok St.Ack, I've re-profiled with your recent updates. Overall the benchmark seems 4.1% faster averaged over several runs. I can't say I noticed a significant improvement in the readAndProcess()/ObjectWritable.write() methods. I'll definitely add a vote to HADOOP-414.
I definitely notice that UTF8.write() takes up 78% of the overall write time and accounts for 50% of the time that the 100061 calls to HRegionServer.next() take.
UTF8.readString() takes up only 15%, with sun.nio.ch.SocketChannelImpl.read(ByteBuffer) (41%) being the biggest piece of the pie.
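For background on why UTF8.write() shows up so prominently: the deprecated UTF8 class re-encodes its backing String on every write, while Text carries the already-encoded UTF-8 bytes. The following is a plain-JDK sketch of that difference only (no Hadoop classes): DataOutputStream.writeUTF stands in for the encode-on-every-write pattern, and a cached byte array stands in for the Text-style path. All names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Illustrative micro-comparison: re-encoding a String on each write versus
// writing out bytes that were encoded once up front.
public class WriteCostSketch {
    public static void main(String[] args) throws IOException {
        String key = "row-0001/column:family";
        // Text-style: encode once, reuse the bytes on every write.
        byte[] encoded = key.getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream a = new ByteArrayOutputStream();
        ByteArrayOutputStream b = new ByteArrayOutputStream();
        DataOutputStream outA = new DataOutputStream(a);
        DataOutputStream outB = new DataOutputStream(b);

        int n = 100_000;
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            outA.writeUTF(key);              // UTF8-style: encode on every call
            a.reset();
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            outB.writeShort(encoded.length); // length prefix + cached bytes
            outB.write(encoded);
            b.reset();
        }
        long t2 = System.nanoTime();
        System.out.println("re-encode each write: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("cached bytes:         " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

(Hadoop's actual Text.write uses a vint length prefix rather than a short; the point here is only the repeated String-to-bytes conversion.)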
Michael Stack wrote:
I recently spent some time profiling hbase writing. I haven't yet spent time on reading. The patch in HADOOP-1498 has fixes that came out of the write profiling sessions. After applying the patch and running the http://wiki.apache.org/lucene-hadoop/Hbase/PerformanceEvaluation write tests, we seem to be spending > 90% of allocations and CPU time in RPC reading and writing instances of the deprecated UTF8 class. (I believe HADOOP-414 is the issue that covers removing UTF8 from the RPC. If you are seeing the same phenomenon, you might add a vote so the issue gets pegged at a higher priority.) Does the HADOOP-1498 patch help with the performance profile you are seeing in hbase?

More below...

James Kennedy wrote:
... Observations: Within the combined RPC$Server.call() of all Handler threads, HRegionServer.next() took about 90% in both the 10K and 100K cases. This is as expected since it's called 10K/100K times. The biggest bottleneck below that is HStoreKey.extractFamily(), which is 33%/39% of HRegionServer.next() and more expensive than all the IO operations in HStoreScanner.getNext() (31%/31%). Couldn't there be a cheaper way to extract the family from a column key?

The HADOOP-1498 patch redoes extractFamily so it uses Text#find and then a byte-array copy to produce the column family. It used to do a toString followed by an indexOf, a substring, and then a new Text out of the produced substring.

Outside of RPC$Server.call(), Server$Connection.readAndProcess() takes about 17%/22% of the time that HRegionServer.next() takes, suggesting heavy overhead in RPC IO and unmarshalling. Overhead in the other direction: ObjectWritable.write() + DataOutputStream.flush() take 63%/57% of the time that HRegionServer.next() takes. The ratio of input to output makes sense since more data is returned than sent. But this seems like a lot of RPC overhead.

Yes. See my comment above on what I'm seeing writing.
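The extractFamily() rework described above (Text#find plus a byte-array copy, instead of toString/indexOf/substring) boils down to scanning the key's bytes for the ':' delimiter and copying the prefix. A plain-Java sketch of that idea, with byte[] standing in for Hadoop's Text (class and method names are illustrative, not the actual HBase code):

```java
import java.util.Arrays;

// Illustrative sketch: extract the column-family prefix (including the
// trailing ':') from a column key at the byte level -- one scan, one copy,
// and no String round trip as in the pre-HADOOP-1498 code.
public class ExtractFamilySketch {
    // Returns the family prefix including ':', or null if no delimiter found.
    static byte[] extractFamily(byte[] columnKey) {
        for (int i = 0; i < columnKey.length; i++) {
            if (columnKey[i] == ':') {
                return Arrays.copyOfRange(columnKey, 0, i + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] family = extractFamily("info:regioninfo".getBytes());
        System.out.println(new String(family)); // prints "info:"
    }
}
```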
...If you have suggestions for how to improve on the version in HADOOP-1498, I would love to hear them.

Right now, if I assume that YourKit is counting block time and I ignore those numbers, I come to the following conclusions: It looks like HStoreKey.extractFamily() is a bottleneck. For 100K rows and 10 cols, it will be called over 1,000,000 times during a full table scan (even if just retrieving one col), and each call requires multiple byte[] <-> String conversions. RPC overhead can be quite high during a scan. Could batched/read-ahead scanning be useful to minimize RPC calls?

Moving from UTF8 to Text would help with time spent in RPC, going by some primitive tests I ran locally. Batching, as per your suggestion in HADOOP-1468, would also help.

St.Ack

------------------------------------------------------------------------

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.master</name>
    <value>172.30.20.15:60000</value>
    <description>The host and port that the HBase master runs at.
    TODO: Support 'local' (all running in a single context).
    </description>
  </property>
  <property>
    <name>hbase.regionserver</name>
    <value>172.30.20.15:60010</value>
    <description>The host and port a HBase region server runs at.
    </description>
  </property>
  <property>
    <name>hbase.regiondir</name>
    <value>hbase</value>
    <description>The directory shared by region servers.
    </description>
  </property>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>134217728</value>
    <description>Maximum desired file size for an HRegion. If filesize
    exceeds value + (value / 2), the HRegion is split in two.
    Default: 128M (134217728).</description>
  </property>
  <property>
    <name>hbase.hregion.compactionThreshold</name>
    <value>10</value>
    <description>By default, we compact the region if an HStore has more
    than 10 map files.</description>
  </property>
  <property>
    <name>hbase.regionserver.thread.splitcompactcheckfrequency</name>
    <value>40000</value>
    <description>Default: 60 * 1000.</description>
  </property>
  <property>
    <name>hbase.client.pause</name>
    <value>2000</value>
  </property>
  <property>
    <name>hbase.client.timeout.number</name>
    <value>30000</value>
    <description>Try this many timeouts before giving up.</description>
  </property>
  <property>
    <name>hbase.client.retries.number</name>
    <value>15</value>
    <description>Count of maximum retries fetching the root region from the
    root region server.</description>
  </property>
  <property>
    <name>hbase.master.meta.thread.rescanfrequency</name>
    <value>60000</value>
    <description>How long the HMaster sleeps (in milliseconds) between scans
    of the root and meta tables.</description>
  </property>
  <property>
    <name>hbase.master.lease.period</name>
    <value>30000</value>
    <description>HMaster server lease period in milliseconds. Default is
    30 seconds.</description>
  </property>
  <property>
    <name>hbase.server.thread.wakefrequency</name>
    <value>5000</value>
    <description>Time to sleep in between searches for work (in milliseconds).
    Used as sleep interval by service threads such as the META scanner and
    log roller.</description>
  </property>
  <property>
    <name>hbase.regionserver.lease.period</name>
    <value>30000</value>
    <description>HRegion server lease period in milliseconds. Default is
    30 seconds.</description>
  </property>
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>10</value>
    <description>Count of RPC Server instances spun up on RegionServers.
    The same property is used by the HMaster for the count of master handlers.
    Default is 10.
    </description>
  </property>
  <property>
    <name>hbase.regionserver.msginterval</name>
    <value>500</value>
    <description>Interval between messages from the RegionServer to HMaster
    in milliseconds. Default is 15 sec. Set this value low if you want unit
    tests to be responsive.
    </description>
  </property>
</configuration>
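On the batched/read-ahead scanning idea raised above (and in HADOOP-1468): the client could fetch N rows per round trip and serve next() calls from a local buffer, so the per-call RPC cost is paid once per batch instead of once per row. A hypothetical plain-Java sketch of that shape (RowFetcher stands in for the RPC; none of these names are actual HBase API):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Stands in for one RPC round trip: returns up to max rows, empty when done.
interface RowFetcher {
    List<String> fetchBatch(int max);
}

public class BatchedScannerSketch {
    private final RowFetcher fetcher;
    private final int batchSize;
    private final Deque<String> buffer = new ArrayDeque<>();
    private boolean exhausted = false;
    int rpcCalls = 0; // exposed so the demo can report round trips

    BatchedScannerSketch(RowFetcher fetcher, int batchSize) {
        this.fetcher = fetcher;
        this.batchSize = batchSize;
    }

    // One next() per row, but only one fetchBatch() "RPC" per batchSize rows.
    String next() {
        if (buffer.isEmpty() && !exhausted) {
            rpcCalls++;
            List<String> batch = fetcher.fetchBatch(batchSize);
            if (batch.isEmpty()) exhausted = true;
            buffer.addAll(batch);
        }
        return buffer.poll(); // null once the scan is done
    }

    public static void main(String[] args) {
        // Fake server holding 100 rows.
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 100; i++) rows.add("row-" + i);
        Iterator<String> it = rows.iterator();
        RowFetcher fake = max -> {
            List<String> out = new ArrayList<>();
            while (out.size() < max && it.hasNext()) out.add(it.next());
            return out;
        };

        BatchedScannerSketch scanner = new BatchedScannerSketch(fake, 30);
        int count = 0;
        while (scanner.next() != null) count++;
        System.out.println(count + " rows in " + scanner.rpcCalls + " round trips");
    }
}
```

The trade-off is staleness and memory on the client for fewer round trips; whether that fits HBase's scanner consistency semantics is a separate question.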
