Remember that random reads are the worst case for any database's
performance numbers. You are limited by spindle count and seek time
(typically 7-9 ms). Truly random reads on a data set much larger than RAM
will never be fast, no matter which database system you use.

But fortunately the real world is more forgiving: reads are rarely truly
random, disk buffer caches help, and you can frequently exploit data
locality.
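
As a rough back-of-the-envelope check (taking ~8 ms as the average seek
time from the range above; the 10-spindle box is just an example):

    1000 ms/s / 8 ms per seek  ~= 125 random reads/sec per spindle
    10 spindles                ~= 1,250 random reads/sec, before caching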

On Oct 6, 2009 11:13 AM, "Andrew Purtell" <apurt...@apache.org> wrote:

Hi Adam, thanks for writing in.

I suggest using Thrift or the native Java API instead of REST for
benchmarking performance. If you must use REST, then for bulk throughput
benchmarking consider using Stargate (contrib/stargate/) and bulk
transactions -- scanners with the 'batch' parameter set >= 100, or
multi-puts with 100s or 1000s of mutations. A fellow on the list some time
ago did some localhost-only benchmarking of the three API options: the
Java API got 22K ops/sec, the Thrift connector got 20K ops/sec, and the
REST connector got 8K ops/sec, with no batching in any case. The absolute
numbers are not important; note the scale of the differences.
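
To make the batching concrete, here is a minimal sketch against the
0.20-era Java client (org.apache.hadoop.hbase.client). The table name
'testtable', the 'cf'/'col' column names, and the batch sizes are only
placeholders for illustration:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchSketch {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "testtable");

        // Batched writes: buffer Puts client-side and submit them as a
        // group instead of paying one round trip per row.
        table.setAutoFlush(false);
        List<Put> puts = new ArrayList<Put>();
        for (int i = 0; i < 1000; i++) {
          Put put = new Put(Bytes.toBytes("row-" + i));
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"),
                  Bytes.toBytes("value-" + i));
          puts.add(put);
        }
        table.put(puts);
        table.flushCommits();   // push any buffered mutations out

        // Batched reads: the scanner fetches ~100 rows per RPC instead
        // of one row at a time.
        Scan scan = new Scan();
        scan.setCaching(100);
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result row : scanner) {
            // process row
          }
        } finally {
          scanner.close();
        }
      }
    }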

> Note that although we want to see where throughput maxes out, the workload
> is random, rather than...
That's currently an impedance mismatch. HBase throughput with 0.20.0 is best
with scanners. MultiGet/Put/Delete is on deck but not ready yet:
https://issues.apache.org/jira/browse/HBASE-1845
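
For contrast, the unbatched path today is one Get per row -- one RPC and,
on a cold cache, one or more seeks per read. A sketch, reusing the table
and placeholder names from above (add org.apache.hadoop.hbase.client.Get
to the imports):

    Get get = new Get(Bytes.toBytes("row-42"));
    get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"));
    Result result = table.get(get);  // one round trip per random read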

  - Andy

________________________________
From: Adam Silberstein <silbe...@yahoo-inc.com>

To: hbase-user@hadoop.apache.org
Sent: Tuesday, October 6, 2009 8:59:30 AM
Subject: random read/write performance

Hi, just wanted to give a quick update on our HBase benchmarking efforts at
Yahoo. The basic use ...
