The test described can only favor MySQL (single column, just a million
rows). Do you need HBase?
Could you also tell us more about your HBase setup? Is it using localfs
or HDFS? Is it a distributed HDFS, or is everything on a single server?
Thanks,
St.Ack
Joost Ouwerkerk wrote:
I'm working on a web application with primarily read-oriented
performance requirements. I've been running some benchmarking tests
that include our application layer, to get a sense of what is possible
with HBase. As a variation on the Bigtable test that is reproduced by
org.apache.hadoop.hbase.PerformanceEvaluation, I'm randomly reading 1
column from a table with 1 million rows. In our case, the contents of
that column need to be deserialized by our application (which adds
some overhead that I'm also trying to measure). The deserialized
contents represent a little over 1K of data.
Although a single thread can only achieve 125 reads per second, with
12 client threads (from 3 different machines) I'm able to read as many
as 500 objects per second. I've now replicated my test on a basic
MySQL table and get a throughput of 2,300 reads/sec, roughly 5 times
what I'm seeing with HBase. Besides the obvious difference in code
maturity, is the discrepancy because HBase serves random reads from
disk rather than from memcache? The HBase performance page
(http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation) shows
random reads(mem) as "Not implemented."
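For reference, the kind of multi-threaded random-read measurement
described above could be sketched roughly as follows. This is only an
illustration, not the actual benchmark code: measure_throughput and
read_row are hypothetical names, and read_row stands in for whatever
fetches and deserializes one row (an HBase or MySQL client call in the
real test, a plain in-memory dict here).

```python
import random
import threading
import time


def measure_throughput(read_row, num_rows, num_threads, reads_per_thread, seed=0):
    """Return aggregate reads/sec for num_threads threads doing random reads.

    read_row is a hypothetical callable taking a row index; in a real
    benchmark it would wrap the store's client call plus deserialization.
    """
    def worker(thread_id):
        # Per-thread RNG so the key sequence is reproducible across runs.
        rng = random.Random(seed + thread_id)
        for _ in range(reads_per_thread):
            read_row(rng.randrange(num_rows))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return (num_threads * reads_per_thread) / elapsed


if __name__ == "__main__":
    # Stand-in for the real table: 10,000 rows of about 1K each, in memory.
    table = {i: b"x" * 1024 for i in range(10_000)}
    rate = measure_throughput(table.__getitem__, len(table), 4, 1000)
    print(f"{rate:.0f} reads/sec")
```

Running the same harness with the same thread count and key
distribution against both stores keeps the comparison like-for-like.
Python threads only overlap while a read is waiting on the network or
disk, so this shape suits I/O-bound clients; the in-memory stand-in
above says nothing about either real system.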
Can anyone shed some light on the state of HBase's memcaching?
Cheers,
Joost.