[ https://issues.apache.org/jira/browse/HDFS-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162019#comment-13162019 ]
Todd Lipcon commented on HDFS-1972:
-----------------------------------

I broke out the basic implementation of determining who is active based on heartbeat responses to HDFS-2627.

I've thought a bit about the proposed solution above and haven't come up with any holes yet... my next steps for early next week will be these (hopefully separate patches for each for easy review):
- finish HDFS-2627 to a committable state (first draft patch is up but needs a functional test and some code/style cleanup)
- implement the DN-side "promise" functionality: when a DN detects a new active, it needs to ACK the active transition, and the NN needs to keep track of whether each DN has ACKed.
- implement code in the NN which prevents issuance of block invalidation until DNs have ACKed and sent block reports
- implement a stress test: create many blocks, and toggle them back and forth between replication level 1 and replication level 2. Fail back and forth between two NNs continuously. Ensure that after several hours of runtime we haven't lost any blocks.

> HA: Datanode fencing mechanism
> ------------------------------
>
> Key: HDFS-1972
> URL: https://issues.apache.org/jira/browse/HDFS-1972
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: data-node, name-node
> Reporter: Suresh Srinivas
> Assignee: Todd Lipcon
> Attachments: hdfs-1972-v1.txt
>
>
> In high availability setup, with an active and standby namenode, there is a
> possibility of two namenodes sending commands to the datanode. The datanode
> must honor commands from only the active namenode and reject the commands
> from standby, to prevent corruption. This invariant must be complied with
> during fail over and other states such as split brain. This jira addresses
> issues related to this, design of the solution and implementation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
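
For illustration only, the NN-side bookkeeping described in the comment (track which DNs have ACKed the active transition, and hold back block invalidation until they have ACKed and sent block reports) might look roughly like the sketch below. This is not code from any HDFS-1972 or HDFS-2627 patch. The class name InvalidationGate, its method names, and the use of plain string DN identifiers are all hypothetical, and it assumes the check is made against the DNs holding replicas of the block in question.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of NN-side invalidation gating; not the actual patch. */
final class InvalidationGate {
    /** What the newly active NN has heard from one DN since the transition. */
    private static final class DnState {
        boolean ackedTransition;
        boolean sentBlockReport;
    }

    private final Map<String, DnState> states = new HashMap<>();

    /** Called when this NN becomes active: earlier ACKs and reports no longer count. */
    synchronized void becomeActive() {
        states.clear();
    }

    /** Record that a DN has ACKed the active transition. */
    synchronized void onTransitionAck(String dnId) {
        states.computeIfAbsent(dnId, k -> new DnState()).ackedTransition = true;
    }

    /** Record that a DN has sent a block report to this NN. */
    synchronized void onBlockReport(String dnId) {
        states.computeIfAbsent(dnId, k -> new DnState()).sentBlockReport = true;
    }

    /**
     * Assumption: invalidation is allowed only once every DN in the given
     * collection has both ACKed the transition and sent a block report.
     */
    synchronized boolean mayInvalidate(Collection<String> replicaHolders) {
        for (String dnId : replicaHolders) {
            DnState s = states.get(dnId);
            if (s == null || !s.ackedTransition || !s.sentBlockReport) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch a failover simply resets all state, so invalidations stay postponed until each relevant DN has checked in again with the new active.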