Thanks a lot St.Ack for the time you spend answering user questions and
for developing this nice piece of software (HBase).
stack wrote:
The amount of replication should have no effect on either access mode.
Whether scanning or random-accessing, only one of the N replicas is
accessed. We'll only go to the other versions if there is trouble accessing
the first.
So, more replicas will not change the performance profile.
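OK, so a given row is served by exactly one region server at a time, and the
read RPC always goes to that server no matter how many HDFS replicas sit
underneath. Just to illustrate how I understand it, something like this shows
which server owns a row (a minimal sketch assuming a 0.90-era Java client; the
table name and row key are made up, and getRegionLocation may differ in other
versions):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionLocation;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WhoServesThisRow {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");   // hypothetical table name
        // A row lives in exactly one region, hosted by one region server;
        // the client sends its read there regardless of dfs.replication.
        HRegionLocation where = table.getRegionLocation(Bytes.toBytes("some-row-key"));
        System.out.println(where);
        table.close();
      }
    }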
I am not sure whether HBase or Hadoop is responsible for choosing the
location of the replicas. Having more replicas may not get around the
random-read disk access limitations, but shouldn't it at least reduce network latency?
If I have a web application with N clients accessing HBase, and one of
those clients has to get the value for a key, shouldn't it be faster to
read that value if a copy of it is stored on that same node (since we avoid
a network call)? But you are right, it does not seem I can get around the
disk random-read performance limitations.
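To check whether a node really does have a local copy, I suppose one can ask
the namenode where the blocks of the underlying store files live, roughly like
this (a minimal sketch against the plain Hadoop FileSystem API; the path
argument is just a placeholder for a store file under the HBase root directory):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WhereAreMyBlocks {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        FileStatus status = fs.getFileStatus(new Path(args[0]));
        // Ask the namenode which datanodes hold each block of the file.
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
          System.out.println("offset " + block.getOffset() + " -> "
              + java.util.Arrays.toString(block.getHosts()));
        }
      }
    }

If the region server (or the client) runs on one of the listed hosts, reading
that block is local; otherwise it crosses the network.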
What do you need to improve? Are both scans and random-reads slow for you?
You've seen the performance page up on the wiki (I'm sure you have).
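Just so we mean the same thing by the two access patterns, here is roughly how
I picture a random read versus a scan with the Java client (again a minimal
sketch assuming a 0.90-era API; table name and row key are made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class GetVersusScan {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");   // hypothetical table name

        // Random read: one round trip to the region server holding this row.
        Result single = table.get(new Get(Bytes.toBytes("some-row-key")));
        System.out.println("random read: " + single);

        // Sequential read: the scanner streams rows back in batches.
        Scan scan = new Scan();
        scan.setCaching(100);                         // rows fetched per RPC
        ResultScanner scanner = table.getScanner(scan);
        for (Result row : scanner) {
          // process each row here
        }
        scanner.close();
        table.close();
      }
    }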
Unfortunately I am not in a position to really benchmark my application,
as I currently can't run it on a true cluster (using a cluster of
virtual machines would lead to obviously misleading results ;). At this stage
I am just trying to understand how HBase and Hadoop work so that I avoid big
mistakes in the design of the architecture. My application currently
runs in production on a PostgreSQL database: I replicate it over several
nodes, and read access performs better when I have more replicas because
each node connects to a local database.
Thanks
TuX