Genady Gillin wrote:
Hi,
We use HBase 0.19 RC2. Our data (~800 GB) resides in one table (is that bad?).
The table schema is pretty simple: two column families, one for keys and
the other for values. Each key can have one or more values (~100).
Keys in one column family and values in another? Why not both in
one column family?
Do you use the keys in the first column family to do lookups into the second?
To query
values we use a file of keys (for instance, about 10M keys); the goal
is to read all values for each of the keys, with an expected running time of
about 1 hour. By the way, the output data is not too big, ~2 GB.
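To put the target above in numbers, here is a rough back-of-the-envelope calculation. It only uses figures quoted in this thread (10M keys, 1 hour, ~2 GB of output, and the ~3300 reads/sec reported in the earlier message) and assumes one read per key and decimal gigabytes, neither of which the thread confirms.

```python
# Back-of-the-envelope rates from the figures quoted in this thread.
# Assumptions (not confirmed in the thread): one read per key, 1 GB = 10**9 bytes.
KEYS_PER_QUERY = 10_000_000      # "about 10M keys" in the key file
TARGET_SECONDS = 60 * 60         # expected running time of about 1 hour
OUTPUT_BYTES = 2 * 10**9         # "~2 GB" of output
OBSERVED_READS_PER_SEC = 3300    # best rate reported in the earlier message

# Read rate needed to finish the key file within the target time.
required_keys_per_sec = KEYS_PER_QUERY / TARGET_SECONDS

# Average amount of output produced per key.
avg_output_bytes_per_key = OUTPUT_BYTES / KEYS_PER_QUERY

# Time the same key file would take at the observed rate.
minutes_at_observed_rate = KEYS_PER_QUERY / OBSERVED_READS_PER_SEC / 60

print(f"required rate: {required_keys_per_sec:.0f} keys/sec")
print(f"average output per key: {avg_output_bytes_per_key:.0f} bytes")
print(f"time at observed rate: {minutes_at_observed_rate:.1f} minutes")
```

If a lookup really costs one read per key, the required rate is in the same range as the rate already observed; if each of the ~100 values per key costs its own read, the gap is much larger.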
Can you sort the keys and then start a scanner, perhaps with the start and
stop keys being the first and last keys from the file? Does that run faster?
But it sounds like you need to run an MR job. You tried that and it
failed. Did you try on the same hardware? My guess is you were running into
the issue we're discussing in the other email ('.... slept too long...').
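The sort-then-scan idea suggested above can be sketched without any HBase calls. In this minimal sketch a sorted Python list stands in for the table, and `range_scan` and `lookup_by_scan` are hypothetical helpers, not HBase API. The stand-in treats both the start and stop keys as inclusive, which is an assumption of the sketch and not a statement about HBase scanner semantics.

```python
import bisect

def range_scan(sorted_rows, start_key, stop_key):
    """Yield (key, values) rows with start_key <= key <= stop_key.

    sorted_rows is a list of (key, values) tuples sorted by key; it is a
    stand-in for a table whose rows are stored in key order.
    """
    keys = [k for k, _ in sorted_rows]
    lo = bisect.bisect_left(keys, start_key)
    hi = bisect.bisect_right(keys, stop_key)
    for i in range(lo, hi):
        yield sorted_rows[i]

def lookup_by_scan(sorted_rows, wanted_keys):
    """Replace many random reads with one sequential scan.

    Sort the wanted keys, scan once from the first to the last of them,
    and keep only the rows whose key was actually requested.
    """
    wanted = sorted(set(wanted_keys))
    if not wanted:
        return {}
    wanted_set = set(wanted)
    return {
        key: values
        for key, values in range_scan(sorted_rows, wanted[0], wanted[-1])
        if key in wanted_set
    }

# Toy data: keys with one or more values each.
table = [
    ("k01", ["a"]),
    ("k02", ["b", "c"]),
    ("k05", ["d"]),
    ("k07", ["e"]),
    ("k09", ["f"]),
]
result = lookup_by_scan(table, ["k07", "k02", "k08"])
print(result)
```

A single scan reads every row between the first and last wanted key, so it is likely to help only when the wanted keys are dense relative to the rows in that range; for sparse keys, several narrower scans over sorted sub-ranges may be the better trade-off.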
St.Ack
Thanks,
Gennady
On Thu, Jan 22, 2009 at 7:46 PM, stack wrote:
Genady wrote:
Hi,
Just wondering if somebody could recommend a random-read strategy for
looking up a big group of keys (100M) in a Hadoop/HBase cluster. Using one
client is very slow. Splitting the input into smaller groups and running
each one with a different client certainly improves performance, but the
maximum speed I'm getting is ~3300 reads/sec. I've tried to use map-reduce,
running the search as a map-reduce task and doing the HBase reads from map
or reduce, but HBase starts to fail. So are a hardware upgrade and creating
HBase in-memory tables the only direction here?
Tell us more about your table schema, data sizes, and the types of query.
What performance do you need from HBase? For example, do your rows have many
columns, and are you trying to get all columns when you query? Are you
on 0.19.0, Genady (sorry if you've answered this question in the recent past)?
St.Ack