Jean-Adrien wrote:
By the way, is there an obvious link between dfs DataNodes cluster size and
hbase HRegionServers cluster ? I'm not sure what is the meaning of the fact
that the hadoop slaves file is a synonym of hbase regionServers file (as
seen in the documentation  http://hadoop.apache.org/hbase/docs/current/ API
), and how the hbase deals with hadoop-site.xml config file ; I mean what is
the purpose to have ${HADOOP_CONF} dir in the hbase classpath ?

There is no 'obvious' heuristic that we're aware of.

Optimally, each regionserver would run on the datanode hosting that 
regionserver's data (we have a bit of work to do to make this happen).  If a 
running regionserver were light as a feather, we'd suggest just putting up a 
regionserver on every datanode, but unfortunately they aren't free to run, so 
the sets of regionservers and datanodes tend to diverge.  Access patterns, the 
amount of hbase data, the proportion of your hdfs data that is up in your hbase 
instance, and the strength of your hosting servers are some of the inputs to 
consider when sizing your hbase cluster.  Because the two sets don't often 
match, we have a regionservers file, apart from slaves, for listing the hosts 
carrying hbase cluster members.

The documentation's description of the regionservers file is 
misleading/incorrect.  I'll fix it so that instead of 'synonym', it says 'is 
like the'.

Are you seeing the HADOOP_CONF_DIR in your CLASSPATH?  It's not there by 
default, not since we became a subproject at least.

Regarding configuration in hadoop-site.xml, we don't read it unless you 
explicitly add it to the hbase CLASSPATH (you can do so by adding it to the 
HBASE_CLASSPATH variable in hbase-env.sh).  Most of the time, hbase doesn't 
need to know the site-specific configurations in hadoop-site.xml, but if the 
configurations affect hdfs clients, then you'll want hbase to pick them up.  
One example would be use of a non-default replication count.  I'm sure there 
are others.
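
As a rough sketch of the HBASE_CLASSPATH change described above, something 
like the following could go in hbase-env.sh.  The /path/to/hadoop/conf 
location is only a placeholder, and the sketch assumes the classpath entry 
should be the directory holding hadoop-site.xml rather than the file itself.

```shell
# Sketch for hbase-env.sh: let hbase pick up hadoop-site.xml.
# /path/to/hadoop/conf is a placeholder, not a real default; substitute the
# directory that actually holds your hadoop-site.xml.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/path/to/hadoop/conf}"

# Append the directory to HBASE_CLASSPATH, keeping any entries already set.
# The ${VAR:+...} form avoids a stray leading colon when the variable is empty.
export HBASE_CLASSPATH="${HBASE_CLASSPATH:+$HBASE_CLASSPATH:}$HADOOP_CONF_DIR"
```

If HADOOP_CONF_DIR is already set in your environment, that value is used and 
the placeholder is ignored.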

St.Ack
