Hello HBase users,

I am not sure whether this is an HBase or a Hadoop question, but answering it probably requires understanding both, so I am trying this list ;)

I have a system with a table that stores user information, for example:

key(username1) info:firstname:John, info:lastname:Doe, info:datejoined:2010-01-01

and then I have a second table that stores indexes into the first table; those index rows have keys based on, e.g., the date:

joinedasc_20100101 user:id:username1

joinedesc_20100101 user:id:username1

In that table I store keys ordered by both ascending and descending join date.
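One detail worth noting: HBase scans row keys in ascending byte order only, so a descending index usually encodes a complemented date into the key (otherwise the joinedesc_ rows would sort the same way as the joinedasc_ rows). A minimal Python sketch of that kind of key construction, assuming dates in YYYYMMDD form (the prefixes match the example rows above; the complement trick is a common pattern, not necessarily exactly what my tables do):

```python
def index_keys(date_yyyymmdd: str) -> tuple[str, str]:
    """Build the ascending and descending index-row keys for a join date."""
    asc_key = "joinedasc_" + date_yyyymmdd
    # Nines' complement of the date, so that later dates sort FIRST
    # when the index is scanned in HBase's normal ascending key order.
    desc_key = "joinedesc_" + str(99999999 - int(date_yyyymmdd)).zfill(8)
    return asc_key, desc_key

print(index_keys("20100101"))
# prints: ('joinedasc_20100101', 'joinedesc_79899898')
```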

The typical access patterns for the two tables are different:
a) the second table is typically accessed with a scanner
b) the first table is accessed randomly
Each time I run a scan over the index, I fetch the remaining fields (other than the username) from the first table with random gets.
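To make the two access patterns concrete, here is a toy Python sketch of what I mean, with plain dicts standing in for the two tables (no real HBase client involved; all names are illustrative):

```python
# First table: user info, accessed by random gets on the username key.
user_table = {
    "username1": {"info:firstname": "John",
                  "info:lastname": "Doe",
                  "info:datejoined": "2010-01-01"},
}

# Second table: the index, scanned in sorted key order like HBase rows.
index_table = {
    "joinedasc_20100101": {"user:id": "username1"},
}

def scan_joined_ascending():
    """Sequential scan over the index, then one random get per index row."""
    for key in sorted(index_table):          # scanner: sequential access
        if not key.startswith("joinedasc_"):
            continue
        username = index_table[key]["user:id"]
        yield user_table[username]           # random-access get

for row in scan_joined_ascending():
    print(row["info:firstname"], row["info:lastname"])
# prints: John Doe
```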

I am looking for information on how to improve the performance of these two access modes.

I would expect that the performance of mode a) does not really depend on the number of replicas in HDFS, but that the speed of mode b) does. It has been said previously that random reads are limited by the performance of the disks. Can I artificially boost standard disks by adding more replicas, to improve random reads?
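For reference, the replica count I am talking about is the dfs.replication setting in hdfs-site.xml; as far as I understand, it applies to files at write time, so existing HBase store files would keep their old replica count until rewritten or changed with `hadoop fs -setrep`. The value below is just illustrative (the HDFS default is 3):

```xml
<!-- hdfs-site.xml: illustrative snippet only -->
<property>
  <name>dfs.replication</name>
  <value>5</value>
</property>
```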


Thanks
TuX

PS: I read http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation, but it does not mention dfs.replication as a parameter affecting random reads. The page http://wiki.apache.org/hadoop/PerformanceTuning mentions a different parameter, block.replication.factor, which is more MapReduce oriented.
