I'm working on a web application with primarily read-oriented performance requirements, and I've been running some benchmarks that include our application layer to get a sense of what is possible with HBase. In a variation on the Bigtable test reproduced by org.apache.hadoop.hbase.PerformanceEvaluation, I'm randomly reading one column from a table with 1 million rows. In our case the contents of that column need to be deserialized by our application (which adds some overhead I'm also trying to measure); the deserialized contents represent a little over 1K of data.

Although a single thread can only achieve 125 reads per second, with 12 client threads (spread across 3 machines) I'm able to read as many as 500 objects per second. I've since replicated the test against a basic MySQL table and get a throughput of 2,300 reads/sec — roughly 5 times what I'm seeing with HBase.

Besides the obvious code-maturity factor, could the discrepancy be that random reads aren't actually being served from memcache by HBase, but rather from disk? The HBase performance page (http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation) shows random reads (mem) as "Not implemented."
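In case it helps, here's a stripped-down sketch of the kind of multi-threaded read loop I'm timing. readOneColumn is a placeholder for the actual per-request work (the HBase get plus our deserialization, or the equivalent MySQL SELECT); the class and method names are just illustrative, not our real code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

public class ReadBenchmark {

    // Stand-in for the real store lookup: fetch one column for a random row
    // and deserialize it (the real value is a little over 1K of data).
    static byte[] readOneColumn(long rowKey) {
        return new byte[1024];
    }

    // Run `threads` concurrent readers, each doing `readsPerThread` random
    // reads against a 1M-row keyspace, and return aggregate reads/sec.
    public static double measure(int threads, int readsPerThread) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong completed = new AtomicLong();
        CountDownLatch done = new CountDownLatch(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (int i = 0; i < readsPerThread; i++) {
                    readOneColumn(rnd.nextLong(1_000_000)); // random row key
                    completed.incrementAndGet();
                }
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        double seconds = (System.nanoTime() - start) / 1e9;
        return completed.get() / seconds;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("reads/sec: " + measure(12, 1000));
    }
}
```

The single-thread vs. 12-thread numbers above come from varying the `threads` argument (with the threads actually split across 3 client machines, which this single-process sketch obviously doesn't capture).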

Can anyone shed some light on the state of HBase's memcaching?
Cheers,
Joost.