Interesting test, Joost. The clients, I presume, are doing the same
random reads against both hbase and mysql? I wonder how the two would
compare if you had 100 million rows? Either way, it looks like we need
to do some mysql studying.
We do not have any near-term plans for putting a table into memory. Our
current focus is reliability and scaling. Just today, an optional LRU
cache of the data read from hdfs -- HBASE-344 -- was added, and because
of it, caching of hot cells shouldn't be too far behind (HBASE-80). But
these additions probably won't do anything for the hbase numbers if your
reads are truly random.
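For the curious, the general shape of such a cache can be sketched with a
size-bounded LinkedHashMap in access order. This is just a minimal
illustration of the LRU idea, not the actual HBASE-344 code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch (illustrative only, not the HBASE-344
// implementation): a LinkedHashMap in access order that evicts its
// least-recently-used entry once capacity is exceeded.
public class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCacheSketch(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, byte[]> cache = new LruCacheSketch<>(2);
        cache.put("row1", new byte[1024]);
        cache.put("row2", new byte[1024]);
        cache.get("row1");                 // touch row1 so row2 is eldest
        cache.put("row3", new byte[1024]); // evicts row2
        System.out.println(cache.keySet()); // prints [row1, row3]
    }
}
```

The point is that a cache like this only helps when some rows are hotter
than others; under uniformly random reads over a large table, the hit
rate stays near zero.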
Thanks,
St.Ack
Joost Ouwerkerk wrote:
I'm working on a web application with primarily read-oriented
performance requirements. I've been running some benchmarking tests
that include our application layer, to get a sense of what is possible
with Hbase. In a variation on the Bigtable test reproduced by
org.apache.hadoop.hbase.PerformanceEvaluation, I'm randomly reading 1
column from a table with 1 million rows. In our case, the contents of
that column need to be deserialized by our application (which adds
some overhead that I'm also trying to measure); the deserialized
contents represent a little over 1K of data.
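[The benchmark loop Joost describes might look roughly like this. The
fetchColumn and deserialize methods here are stand-ins for the real
HBase client call and the application's deserialization, and just
fabricate ~1K of data so the sketch runs anywhere:]

```java
import java.util.Random;

// Sketch of a random single-column read benchmark over a 1M-row table.
// fetchColumn and deserialize are placeholders for the real table read
// and the application-layer deserialization being measured.
public class RandomReadSketch {
    static final int TOTAL_ROWS = 1_000_000;

    static byte[] fetchColumn(String rowKey) {
        return new byte[1024]; // stand-in for a real HBase read
    }

    static Object deserialize(byte[] raw) {
        return new String(raw); // stand-in for application deserialization
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        int reads = 10_000;
        long start = System.nanoTime();
        for (int i = 0; i < reads; i++) {
            String rowKey = String.format("%010d", random.nextInt(TOTAL_ROWS));
            deserialize(fetchColumn(rowKey));
        }
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d random reads in %.2fs (%.0f reads/sec)%n",
                reads, secs, reads / secs);
    }
}
```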
Although a single thread can only achieve 125 reads per second, with
12 client threads (from 3 different machines) I'm able to read as many
as 500 objects per second. Now, I've replicated my test on a basic
MySQL table and am able to get a throughput of 2,300 reads/sec,
roughly 5 times what I'm seeing with Hbase. Besides the obvious
code-maturity gap, is the discrepancy related to random reads not
actually being served from the memcache, but rather from disk, by
Hbase? The HBase performance page
(http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation) shows
random reads(mem) as "Not implemented."
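[Measuring aggregate throughput with multiple client threads, as in the
12-thread run above, can be sketched with a simple harness. The stubRead
method and the thread/read counts here are illustrative stand-ins, not
Joost's actual setup:]

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a multi-threaded client harness: N threads each issue a
// fixed number of reads against a stubbed fetch, and throughput is
// total reads divided by wall-clock seconds.
public class ThroughputSketch {

    static byte[] stubRead() {
        return new byte[1024]; // stand-in for a real table read
    }

    static long runReads(int threads, int readsPerThread)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong completed = new AtomicLong();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < readsPerThread; i++) {
                    stubRead();
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        long reads = runReads(12, 5_000);
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d reads, %.0f reads/sec%n", reads, reads / secs);
    }
}
```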
Can anyone shed some light on the state of HBase's memcaching?
Cheers,
Joost.