On Tue, Feb 17, 2009 at 11:29 AM, shourabh rawat <[email protected]> wrote:
> Thanx for replying....
> Well the problem is this.
> I have a distributed setup of hbase over hadoop(a cluster of 3).
> I have loaded around 4 millions entries into my hbase.
> Now i want to read on it.(read a set of entries)
> Reading sequentially adds on the performance.

What do you mean by "read sequentially" above? Are you scanning (getting a scanner and then nexting through your hbase table)?

>
> I want really good performance (i mean retrieval should be well within
> 10 ms per entry on an average)

You will have to wait for hbase 0.20.0, or do as Erik suggests and put a cache in front of hbase. What are you trying to do with hbase? Serve a website?

>
> So i thought of trying out the bulk read (but no such function on the hbase
> api) so i resorted to threads...created one htable instance per thread and
> did a get on the same table in parallel.
> But still the performance doesn't seem to get effected.
> Are u sure that the hbase treats them parallely or does it handle them
> sequentially even when thr are parallel request.
> Nyways wat is a good performance on a hbase...Any other way to improve
> on this performance...
> Can multiple instances of hbase be created (and not HTable as All the
> HTables are seem to be using the same connection
> i mean HConnection object).

Yeah, the RPC keeps a single connection per remote server, but the channel is shared by requests and responses. In past testing, the more remote servers the better, but even with only a few, concurrent HTables got better throughput than a single one running requests in series (the single connection is not fully occupied by requests and responses).

St.Ack

>
>
> Would be great if you could help me on this...and clear my concepts
>
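
For the "put a cache in front of hbase" suggestion, something along these lines might be a starting point. It is only a rough sketch, not anything that ships with hbase: the class name ReadThroughCache and the loader function are made up for illustration, and the loader is assumed to wrap whatever table read you already do. It keeps the most recently used entries in memory and only goes to the backing store on a miss.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Minimal read-through LRU cache (illustrative only).
 * The loader is a hypothetical stand-in for the real hbase lookup.
 */
class ReadThroughCache<K, V> {
    private final Function<K, V> loader; // assumption: wraps the actual table read
    private final Map<K, V> entries;
    private long hits;
    private long misses;

    ReadThroughCache(final int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder=true keeps the least recently used entry first,
        // so removeEldestEntry evicts in LRU order.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, loading it from the backing store on a miss. */
    synchronized V get(K key) {
        V value = entries.get(key);
        if (value != null) {
            hits++;
            return value;
        }
        misses++;
        // The lock is held during the load, so concurrent misses are serialized.
        // That keeps the sketch simple; it is not tuned for throughput.
        value = loader.apply(key);
        if (value != null) {
            entries.put(key, value);
        }
        return value;
    }

    synchronized long hits() { return hits; }

    synchronized long misses() { return misses; }

    synchronized int size() { return entries.size(); }
}
```

Whether this gets you under 10 ms on average depends on how often the same rows are requested; with mostly unique keys nearly every read is still a miss and goes to the table.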
