Hi Adam, thanks for writing in.

I suggest using Thrift or the native Java API instead of REST for benchmarking 
performance. If you must use REST, then for bulk throughput benchmarking consider 
Stargate (contrib/stargate/) and bulk operations -- scanners with the 'batch' 
parameter set >= 100, or multi-puts carrying hundreds or thousands of mutations. 
A fellow up on the list some time ago did some localhost-only benchmarking of 
the three API options: the Java API got 22K ops/sec, the Thrift connector got 
20K ops/sec, and the REST connector got 8K ops/sec. The operations were not 
batched. The absolute numbers are not important; note the scale of the differences.
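To make the batching idea concrete: the win comes from amortizing one round trip 
over many mutations. Below is a minimal, HBase-free Java sketch of the 
client-side chunking only -- the `batch` helper and the plain strings standing 
in for real Puts are illustrative, not part of any HBase API; the actual client 
would hand each chunk to a single multi-put call.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchingSketch {
    // Split a stream of row keys (stand-ins for Puts) into batches of
    // `batchSize`, mirroring how you would hand chunks to the client in
    // one call instead of issuing one round trip per mutation.
    static List<List<String>> batch(List<String> mutations, int batchSize) {
        List<List<String>> batches = new ArrayList<List<String>>();
        for (int i = 0; i < mutations.size(); i += batchSize) {
            batches.add(new ArrayList<String>(
                mutations.subList(i, Math.min(i + batchSize, mutations.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> mutations = new ArrayList<String>();
        for (int i = 0; i < 2500; i++) {
            mutations.add("row-" + i);
        }
        // 2500 mutations in batches of 1000 -> three round trips, not 2500.
        List<List<String>> batches = batch(mutations, 1000);
        System.out.println(batches.size() + " batches");
    }
}
```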

> Note that although we want to see where throughput maxes out, the workload is 
> random, rather than scan-oriented.

That's currently an impedance mismatch. HBase throughput with 0.20.0 is best 
with scanners. MultiGet/Put/Delete is on deck but not ready yet: 
https://issues.apache.org/jira/browse/HBASE-1845

   - Andy




________________________________
From: Adam Silberstein <silbe...@yahoo-inc.com>
To: hbase-user@hadoop.apache.org
Sent: Tuesday, October 6, 2009 8:59:30 AM
Subject: random read/write performance

Hi,

Just wanted to give a quick update on our HBase benchmarking efforts at
Yahoo.  The basic use case we're looking at is:

1K records

20GB of records per node (and 6GB of memory per node, so data is not
memory resident)

Workloads that do random reads/writes (e.g. 95% reads, 5% writes).

Multiple clients doing the reads/writes (i.e. 50-200)

Measure throughput vs. latency, and see how high we can push the
throughput.  

Note that although we want to see where throughput maxes out, the
workload is random, rather than scan-oriented.
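A harness for this kind of throughput-vs-latency measurement can be sketched in 
a few dozen lines. The version below is self-contained; `doOp` is a hypothetical 
placeholder for the real REST or Java-API call (here a no-op), so the numbers it 
prints are meaningless except as a shape for the real tool.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class LoadSketch {
    // Placeholder for one random read or write; in the real tool this
    // would be a REST request or a client get/put.
    static void doOp() { }

    // Run `clients` concurrent workers for `millis` ms; return
    // { aggregate ops/sec, mean latency per op in ms }.
    static double[] run(int clients, long millis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        AtomicLong ops = new AtomicLong();
        AtomicLong latNanos = new AtomicLong();
        final long deadline = System.nanoTime() + millis * 1_000_000L;
        for (int c = 0; c < clients; c++) {
            pool.submit(() -> {
                while (System.nanoTime() < deadline) {
                    long t0 = System.nanoTime();
                    doOp();
                    latNanos.addAndGet(System.nanoTime() - t0);
                    ops.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(millis + 1000, TimeUnit.MILLISECONDS);
        double secs = millis / 1000.0;
        double meanMs = latNanos.get() / 1e6 / Math.max(1, ops.get());
        return new double[] { ops.get() / secs, meanMs };
    }

    public static void main(String[] args) throws Exception {
        double[] r = run(50, 200);
        System.out.printf("%.0f ops/sec, %.3f ms mean latency%n", r[0], r[1]);
    }
}
```

Sweeping the client count (e.g. 50 to 200) while recording both outputs gives the 
throughput-vs-latency curve described above.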



I've been tweaking our HBase installation based on advice I've
read/gotten from a few people.  Currently, I'm running 0.20.0, have the heap
size set to 6GB per server, and have iCMS off.  I'm still using the REST
server instead of the Java client.  We're about to port our benchmarking
tool to Java, at which point we can use the Java API and also turn off the
WAL.  If anyone has more suggestions for this workload (either things to
try while still using REST, or things to try once I have a Java client),
please let me know.



Given all that, I'm currently seeing a maximum throughput of about 300
ops/sec/server.  Has anyone with a similar disk-resident, random
workload seen drastically different numbers, or have guesses for what I can
expect with the Java client?



Thanks!

Adam


      
