[ https://issues.apache.org/jira/browse/HDFS-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Damien Hardy updated HDFS-4101:
-------------------------------
    Summary: ZKFC should implement zookeeper.recovery.retry like HBase to connect to ZooKeeper  (was: ZKFC should implement zookeeper.recovery.retry like HBase to connect to zookeeper)

> ZKFC should implement zookeeper.recovery.retry like HBase to connect to ZooKeeper
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-4101
>                 URL: https://issues.apache.org/jira/browse/HDFS-4101
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: auto-failover, ha
>    Affects Versions: 2.0.0-alpha, 3.0.0
>         Environment: running CDH4.1.1
>            Reporter: Damien Hardy
>            Priority: Minor
>
> When ZKFC starts and ZooKeeper is not yet started, ZKFC fails and stops immediately.
> Maybe ZKFC should allow some retries on the ZooKeeper service, as HBase does with zookeeper.recovery.retry.
> This happens in particular when I start my whole cluster on VirtualBox, for example (every component at nearly the same time): ZKFC is the only one that fails and stops.
> All the others (NameNode/DataNode/JournalNode/ZooKeeper/HBaseMaster/HBaseRS) can wait for each other for some time, independently of the start order, so that the system can be set up and stable in a few seconds.
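
For illustration only, the retry behaviour requested in the description could look roughly like the sketch below. It is a language-neutral sketch written in Python, not ZKFC or HBase code: the function name connect_with_retry, its parameters, and the exponential backoff schedule are all assumptions made for this example, and the issue itself does not specify retry counts or delays.

```python
import time


def connect_with_retry(connect, max_retries=3, base_delay_s=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or the retry budget is used up.

    Hypothetical helper for illustration: `connect` stands in for whatever
    opens the ZooKeeper session, and is expected to raise ConnectionError
    while the ensemble is not reachable yet.
    """
    attempt = 0
    while True:
        try:
            return connect()
        except ConnectionError:
            if attempt >= max_retries:
                # Retry budget exhausted: give up and let the caller fail.
                raise
            # Assumed backoff schedule: base, 2*base, 4*base, ...
            sleep(base_delay_s * (2 ** attempt))
            attempt += 1
```

With max_retries=3 and base_delay_s=1.0, a caller would wait 1 s, 2 s and 4 s between attempts before giving up, which would be enough to ride out a ZooKeeper that comes up a few seconds after the other daemons. The sleep parameter is injectable only so the delays can be checked without actually waiting.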