Remember that random reads are the worst case for any database's performance numbers. You are limited by spindle count and seek time (typically 7-9 ms), which works out to roughly 110-140 random reads per second per disk. Truly random reads on a data set much larger than RAM will never be fast, no matter which db system you use.
But fortunately the real world is more forgiving: truly random reads are rare, the OS disk buffer cache helps, and you can frequently exploit data locality.

On Oct 6, 2009 11:13 AM, "Andrew Purtell" <apurt...@apache.org> wrote:

Hi Adam, thanks for writing in.

I suggest using Thrift or the native Java API instead of REST for benchmarking performance. If you must use REST, then for bulk throughput benchmarking consider using Stargate (contrib/stargate/) and bulk transactions -- scanners with the 'batch' parameter set >= 100, or multi-puts carrying 100s or 1000s of mutations.

We had a fellow on the list some time ago who did localhost-only benchmarking of the three API options: the Java API got 22K ops/sec, the Thrift connector got 20K ops/sec, and the REST connector got 8K ops/sec. The transactions were not batched. The absolute numbers are not important; note the scale of the differences.

> Note that although we want to see where throughput maxes out, the workload
> is random, rather than...

That's currently an impedance mismatch. HBase throughput with 0.20.0 is best with scanners. MultiGet/Put/Delete is on deck but not ready yet:

  https://issues.apache.org/jira/browse/HBASE-1845

   - Andy

________________________________
From: Adam Silberstein <silbe...@yahoo-inc.com>
To: hbase-user@hadoop.apache.org
Sent: Tuesday, October 6, 2009 8:59:30 AM
Subject: random read/write performance

Hi,
Just wanted to give a quick update on our HBase benchmarking efforts at Yahoo. The basic use ...
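For reference, here is a rough sketch of the batched access Andrew suggests above -- a multi-put carrying on the order of 1000 mutations, and a scanner that pulls back 100 rows per round trip -- written against the 0.20-era Java client API rather than Stargate. The table name "usertable", column family "f", and the counts are illustrative only, and exact method names may differ in other HBase versions.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchedAccess {
  public static void main(String[] args) throws IOException {
    // 0.20-era configuration/table setup; table and family names are made up.
    HTable table = new HTable(new HBaseConfiguration(), "usertable");

    // Bulk write: buffer ~1000 Puts client-side and submit them as one batch
    // instead of one RPC per row.
    List<Put> puts = new ArrayList<Put>(1000);
    for (int i = 0; i < 1000; i++) {
      Put p = new Put(Bytes.toBytes(String.format("row%08d", i)));
      p.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value" + i));
      puts.add(p);
    }
    table.put(puts);

    // Bulk read: set scanner caching to 100 so each next() call to the region
    // server returns 100 rows rather than one.
    Scan scan = new Scan();
    scan.setCaching(100);
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // process r ...
      }
    } finally {
      scanner.close();
    }
  }
}

The scanner caching setting is roughly the Java-client analogue of the Stargate scanner 'batch' parameter: it controls how many rows come back per RPC, which is what amortizes the per-call overhead behind the throughput differences quoted above.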