Hello HBase users,
I am not sure whether this is an HBase or a Hadoop question, but to answer
it you probably need to understand both, so I am trying this list ;)
I have a system with a table that stores user information, for example:
key(username1) info:firstname:John, info:lastname:Doe,
info:datejoined:2010-01-01
and then I have a second table that stores indexes for the first one; those
index rows have a key based on, e.g., the join date:
joinedasc_20100101 user:id:username1
joinedesc_20100101 user:id:username1
In that table I store the keys sorted by both ascending and descending join date.
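(Side note on constructing such keys: since HBase sorts row keys lexicographically, one common trick for the descending variant is to store the digit-wise 9's complement of the date, so newer dates sort first. A minimal plain-Python sketch — no HBase API, names are illustrative only:)

```python
def index_keys(datejoined):
    """Build ascending and descending index row keys for a join date.

    HBase sorts rows lexicographically, so the ascending key embeds the
    date digits as-is, while the descending key embeds their digit-wise
    9's complement so that newer dates sort first under the same order.
    """
    digits = datejoined.replace("-", "")                 # "2010-01-01" -> "20100101"
    inverted = "".join(str(9 - int(d)) for d in digits)  # 9's complement of each digit
    return (
        f"joinedasc_{digits}",    # scans oldest-first
        f"joinedesc_{inverted}",  # scans newest-first
    )

asc, desc = index_keys("2010-01-01")
print(asc)   # joinedasc_20100101
print(desc)  # joinedesc_79899898
```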
The typical access patterns for the two tables are different:
a) the second table is typically accessed using a scanner
b) the first table is randomly accessed
Each time I run a scanner over the index table, I then fetch the remaining
fields (everything other than the username) from the first table.
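(To make the two access modes concrete, here is a toy Python simulation of the scan-then-random-get pattern — plain dicts stand in for the two HBase tables; no HBase client is involved:)

```python
# Toy stand-ins for the two tables; sorting the keys mimics HBase's
# lexicographic row-key ordering.
user_table = {
    "username1": {"info:firstname": "John",
                  "info:lastname": "Doe",
                  "info:datejoined": "2010-01-01"},
}
index_table = {
    "joinedasc_20100101": {"user:id": "username1"},
}

def scan_then_get(prefix):
    """Mode a): sequential scan over the index table,
    followed by mode b): one random get per match on the user table."""
    results = []
    for rowkey in sorted(index_table):           # sequential scan
        if not rowkey.startswith(prefix):
            continue
        username = index_table[rowkey]["user:id"]
        results.append(user_table[username])     # random read
    return results

print(scan_then_get("joinedasc_"))
```

The point of the simulation is just that every index row scanned costs one extra random read on the user table, which is why the two tables stress the cluster differently.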
I am trying to gather information on how to improve the performance of both
access modes.
I would expect that the performance of mode a) does not really depend on the
number of replicas in HDFS, but that the speed of mode b) does. It has been
said previously on this list that random-read performance is limited by the
performance of the disks.
Can I artificially boost standard disks by adding more replicas to
improve random reads?
Thanks
TuX
PS: I read http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation, but it
does not mention dfs.replication as a parameter affecting random reads.
The page http://wiki.apache.org/hadoop/PerformanceTuning mentions a
different parameter, block.replication.factor, which seems more
MapReduce-oriented.
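(For readers of the archives: the cluster-wide default replication factor I am asking about is the dfs.replication property in hdfs-site.xml; a typical fragment looks like this:)

```xml
<!-- hdfs-site.xml: cluster-wide default block replication factor -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```

The replication of already-written files can also be changed per path with `hadoop fs -setrep <n> <path>`.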