I'm working on a web application with primarily read-oriented performance requirements, and I've been running some benchmarks that include our application layer to get a sense of what is possible with HBase. In a variation on the Bigtable test reproduced by org.apache.hadoop.hbase.PerformanceEvaluation, I'm randomly reading one column from a table with 1 million rows. In our case the contents of that column need to be deserialized by our application (which adds some overhead I'm also trying to measure); the deserialized contents represent a little over 1K of data.

Although a single thread can only achieve 125 reads per second, with 12 client threads (spread across 3 machines) I'm able to read as many as 500 objects per second. I've since replicated the test against a basic MySQL table and get a throughput of 2,300 reads/sec — roughly 5 times what I'm seeing with HBase.

Besides the obvious code-maturity factor, could the discrepancy be that random reads aren't actually being served from memcache by HBase, but rather from disk? The HBase performance page (http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation) shows random reads (mem) as "Not implemented."
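In case it helps, here's a stripped-down sketch of the kind of multi-threaded read loop I'm timing. readOneColumn is a placeholder for the actual per-request work (the HBase get plus our deserialization, or the equivalent MySQL SELECT); the class and method names are just illustrative, not our real code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

public class ReadBenchmark {

    // Stand-in for the real store lookup: fetch one column for a random row
    // and deserialize it (the real value is a little over 1K of data).
    static byte[] readOneColumn(long rowKey) {
        return new byte[1024];
    }

    // Run `threads` concurrent readers, each doing `readsPerThread` random
    // reads against a 1M-row keyspace, and return aggregate reads/sec.
    public static double measure(int threads, int readsPerThread) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong completed = new AtomicLong();
        CountDownLatch done = new CountDownLatch(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (int i = 0; i < readsPerThread; i++) {
                    readOneColumn(rnd.nextLong(1_000_000)); // random row key
                    completed.incrementAndGet();
                }
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        double seconds = (System.nanoTime() - start) / 1e9;
        return completed.get() / seconds;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("reads/sec: " + measure(12, 1000));
    }
}
```

The single-thread vs. 12-thread numbers above come from varying the `threads` argument (with the threads actually split across 3 client machines, which this single-process sketch obviously doesn't capture).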

Can anyone shed some light on the state of HBase's memcaching?
Cheers,
Joost.