[ https://issues.apache.org/jira/browse/HBASE-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798412#action_12798412 ]

Eli Collins commented on HBASE-2098:
------------------------------------

bq. Failing over to another node where a healthy replica of NN store exists and starting an NN instance will cause the NN to collect block information from every "new" and "unknown" DataNode for the first time.

Check out HDFS-839 (NN forwards block reports to the BNN). Enabling high availability via fast automatic fail over to the backup name node is something HDFS developers are working on. You also might find [Dhruba's recent post on HA|http://hadoopblog.blogspot.com/2009/11/hdfs-high-availability.html] of interest.

> [EC2] Build a HA cluster
> ------------------------
>
> Key: HBASE-2098
> URL: https://issues.apache.org/jira/browse/HBASE-2098
> Project: Hadoop HBase
> Issue Type: Sub-task
> Reporter: Andrew Purtell
> Assignee: Andrew Purtell
> Priority: Minor
> Fix For: 0.21.0
>
>
> The Hadoop NameNode is a single point of failure. If the master instance
> fails, HDFS is down; therefore, HBase as well. So we do not try to deploy
> HBase in a multimaster configuration for that reason. Instead we colocate the
> HDFS NameNode and HBase HMaster on the same instance and run with its failure
> as a known risk. As these EC2 scripts are starter scripts which can (and
> should) be customized, this is ok, but we can do better. We should deploy a
> fully fault tolerant Hadoop+HBase cluster as a worked example of how to do
> it.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.