Hi,

We use HBase 0.19 RC2. Our data (~800 GB) resides in one table (is that bad?). The table schema is pretty simple: two column families, one for keys and one for values, and each key can have one or more values (~100). To query the values we use a file of keys (for instance, about 10M keys), so the goal is to read all values for each of the keys, with an expected running time of about 1 hour. By the way, the data output is not too big, ~2 GB.
Thanks,
Gennady

On Thu, Jan 22, 2009 at 7:46 PM, stack <[email protected]> wrote:
> Genady wrote:
>> Hi,
>>
>> Just wondering if somebody could recommend a random-read strategy for
>> searching a big group of keys (100M) in a Hadoop/HBase cluster. Using one
>> client is very slow; separating the input into smaller groups and running
>> each one with a different client certainly improves performance, but the
>> maximum speed I'm getting is ~3300 reads/sec. I've tried to use MapReduce
>> and run the search as a map-reduce task, issuing HBase reads from map or
>> reduce, but HBase starts to fail. So is a hardware upgrade and creating
>> HBase in-memory tables the only direction here?
>>
> Tell us more about your table schema, data sizes, and the types of query.
> What performance do you need from HBase? Do your rows have many columns,
> and are you trying to get all columns when you query, for example? Are you
> on 0.19.0, Genady (sorry if you've answered this question in the near past)?
> St.Ack
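The multi-client approach described above (splitting the key file into smaller groups and running each group with its own client) can be sketched roughly as follows. This is only a minimal illustration with a thread pool, not the actual HBase 0.19 client code; the `lookup` method is a hypothetical stand-in for a per-key HBase get:

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelReader {
    // Hypothetical per-key read; in a real setup this would be an HBase
    // client get() against the table's value column family.
    static List<String> lookup(String key) {
        return Collections.singletonList("value-for-" + key);
    }

    // Partition the key list into nThreads slices and read each slice
    // concurrently, merging the per-slice results at the end.
    static Map<String, List<String>> readAll(List<String> keys, int nThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        List<Future<Map<String, List<String>>>> futures = new ArrayList<>();
        int slice = (keys.size() + nThreads - 1) / nThreads;
        for (int i = 0; i < keys.size(); i += slice) {
            final List<String> part =
                keys.subList(i, Math.min(i + slice, keys.size()));
            futures.add(pool.submit(() -> {
                Map<String, List<String>> out = new HashMap<>();
                for (String k : part) {
                    out.put(k, lookup(k));
                }
                return out;
            }));
        }
        Map<String, List<String>> result = new HashMap<>();
        for (Future<Map<String, List<String>>> f : futures) {
            result.putAll(f.get());
        }
        pool.shutdown();
        return result;
    }

    public static void main(String[] args) throws Exception {
        List<String> keys = Arrays.asList("k1", "k2", "k3", "k4", "k5");
        Map<String, List<String>> r = readAll(keys, 2);
        System.out.println(r.size());            // 5
        System.out.println(r.get("k3").get(0));  // value-for-k3
    }
}
```

In practice each "client" in the thread would hold its own table handle, and the slice count would be tuned against the ~3300 reads/sec ceiling observed above; the sketch only shows the partition-and-merge shape of the strategy.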
