Hello HBase users,
I am not sure whether this is an HBase or a Hadoop question, but to answer
it you probably need to understand both, so I am trying this list ;)
I have a system with a table that stores user information, for example:
key(username1) info:firstname:John, info:lastname:Doe,
info:datejoined:2010-01-01
and then I have a second table that stores indexes for the first one; those
index rows have a key based on, e.g., the join date:
joinedasc_20100101 user:id:username1
joinedesc_20100101 user:id:username1
In that table I store the keys sorted by both ascending and descending join date.
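(Side note on constructing such keys: since HBase sorts row keys lexicographically, one common trick for the descending variant is to store the digit-wise 9's complement of the date, so newer dates sort first. A minimal plain-Python sketch — no HBase API, names are illustrative only:)

```python
def index_keys(datejoined):
    """Build ascending and descending index row keys for a join date.

    HBase sorts rows lexicographically, so the ascending key embeds the
    date digits as-is, while the descending key embeds their digit-wise
    9's complement so that newer dates sort first under the same order.
    """
    digits = datejoined.replace("-", "")                 # "2010-01-01" -> "20100101"
    inverted = "".join(str(9 - int(d)) for d in digits)  # 9's complement of each digit
    return (
        f"joinedasc_{digits}",    # scans oldest-first
        f"joinedesc_{inverted}",  # scans newest-first
    )

asc, desc = index_keys("2010-01-01")
print(asc)   # joinedasc_20100101
print(desc)  # joinedesc_79899898
```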
The typical access patterns for the two tables are different:
a) the second table is typically accessed using a scanner
b) the first table is randomly accessed
Each time I run a scanner over the index table, I then fetch the remaining
fields (everything other than the username) from the first table.
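(To make the two access modes concrete, here is a toy Python simulation of the scan-then-random-get pattern — plain dicts stand in for the two HBase tables; no HBase client is involved:)

```python
# Toy stand-ins for the two tables; sorting the keys mimics HBase's
# lexicographic row-key ordering.
user_table = {
    "username1": {"info:firstname": "John",
                  "info:lastname": "Doe",
                  "info:datejoined": "2010-01-01"},
}
index_table = {
    "joinedasc_20100101": {"user:id": "username1"},
}

def scan_then_get(prefix):
    """Mode a): sequential scan over the index table,
    followed by mode b): one random get per match on the user table."""
    results = []
    for rowkey in sorted(index_table):           # sequential scan
        if not rowkey.startswith(prefix):
            continue
        username = index_table[rowkey]["user:id"]
        results.append(user_table[username])     # random read
    return results

print(scan_then_get("joinedasc_"))
```

The point of the simulation is just that every index row scanned costs one extra random read on the user table, which is why the two tables stress the cluster differently.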
I am trying to gather information on how to improve the performance of both
access modes.
I would expect that the performance of mode a) does not really depend on the
number of replicas in HDFS, but that the speed of mode b) does. It has been
said previously on this list that random-read performance is limited by the
performance of the disks.
Can I artificially boost standard disks by adding more replicas to
improve random reads?
Thanks
TuX
PS: I read http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation, but it
does not mention dfs.replication as a parameter affecting random reads.
The page http://wiki.apache.org/hadoop/PerformanceTuning mentions a
different parameter, block.replication.factor, which seems more
MapReduce-oriented.
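(For readers of the archives: the cluster-wide default replication factor I am asking about is the dfs.replication property in hdfs-site.xml; a typical fragment looks like this:)

```xml
<!-- hdfs-site.xml: cluster-wide default block replication factor -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```

The replication of already-written files can also be changed per path with `hadoop fs -setrep <n> <path>`.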