Interesting test, Joost. The clients, I presume, are doing the same
random reads against both hbase and mysql? I wonder how the two would
compare if you had 100 million rows? Either way, it looks like we need
to do some mysql studying.
We do not have any near-term plans for putting a table into memory. Our
current focus is reliability and scaling. Just today, an optional LRU
cache of the data read from hdfs -- HBASE-344 -- was added, and because
of it, caching of hot cells shouldn't be too far behind (HBASE-80). But
these additions probably won't do anything for the hbase numbers if your
reads are truly random.
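For the curious, the general shape of such a cache can be sketched with a
size-bounded LinkedHashMap in access order. This is just a minimal
illustration of the LRU idea, not the actual HBASE-344 code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch (illustrative only, not the HBASE-344
// implementation): a LinkedHashMap in access order that evicts its
// least-recently-used entry once capacity is exceeded.
public class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCacheSketch(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, byte[]> cache = new LruCacheSketch<>(2);
        cache.put("row1", new byte[1024]);
        cache.put("row2", new byte[1024]);
        cache.get("row1");                 // touch row1 so row2 is eldest
        cache.put("row3", new byte[1024]); // evicts row2
        System.out.println(cache.keySet()); // prints [row1, row3]
    }
}
```

The point is that a cache like this only helps when some rows are hotter
than others; under uniformly random reads over a large table, the hit
rate stays near zero.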
Thanks,
St.Ack
Joost Ouwerkerk wrote:
I'm working on a web application with primarily read-oriented
performance requirements. I've been running some benchmarking tests
that include our application layer, to get a sense of what is possible
with Hbase. In a variation on the Bigtable test reproduced by
org.apache.hadoop.hbase.PerformanceEvaluation, I'm randomly reading 1
column from a table with 1 million rows. In our case, the contents of
that column need to be deserialized by our application (which adds
some overhead that I'm also trying to measure); the deserialized
contents represent a little over 1K of data.
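[The benchmark loop Joost describes might look roughly like this. The
fetchColumn and deserialize methods here are stand-ins for the real
HBase client call and the application's deserialization, and just
fabricate ~1K of data so the sketch runs anywhere:]

```java
import java.util.Random;

// Sketch of a random single-column read benchmark over a 1M-row table.
// fetchColumn and deserialize are placeholders for the real table read
// and the application-layer deserialization being measured.
public class RandomReadSketch {
    static final int TOTAL_ROWS = 1_000_000;

    static byte[] fetchColumn(String rowKey) {
        return new byte[1024]; // stand-in for a real HBase read
    }

    static Object deserialize(byte[] raw) {
        return new String(raw); // stand-in for application deserialization
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        int reads = 10_000;
        long start = System.nanoTime();
        for (int i = 0; i < reads; i++) {
            String rowKey = String.format("%010d", random.nextInt(TOTAL_ROWS));
            deserialize(fetchColumn(rowKey));
        }
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d random reads in %.2fs (%.0f reads/sec)%n",
                reads, secs, reads / secs);
    }
}
```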
Although a single thread can only achieve 125 reads per second, with
12 client threads (from 3 different machines) I'm able to read as many
as 500 objects per second. Now, I've replicated my test on a basic
MySQL table and am able to get a throughput of 2,300 reads/sec,
roughly 5 times what I'm seeing with Hbase. Besides the obvious
code-maturity gap, is the discrepancy related to random reads not
actually being served from the memcache, but rather from disk, by
Hbase? The HBase performance page
(http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation) shows
random reads(mem) as "Not implemented."
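[Measuring aggregate throughput with multiple client threads, as in the
12-thread run above, can be sketched with a simple harness. The stubRead
method and the thread/read counts here are illustrative stand-ins, not
Joost's actual setup:]

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a multi-threaded client harness: N threads each issue a
// fixed number of reads against a stubbed fetch, and throughput is
// total reads divided by wall-clock seconds.
public class ThroughputSketch {

    static byte[] stubRead() {
        return new byte[1024]; // stand-in for a real table read
    }

    static long runReads(int threads, int readsPerThread)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong completed = new AtomicLong();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < readsPerThread; i++) {
                    stubRead();
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        long reads = runReads(12, 5_000);
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d reads, %.0f reads/sec%n", reads, reads / secs);
    }
}
```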
Can anyone shed some light on the state of HBase's memcaching?
Cheers,
Joost.