[ https://issues.apache.org/jira/browse/HDFS-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160618#comment-13160618 ]
Andrew Purtell commented on HDFS-1972:
--------------------------------------

I will go back to lurking on this issue right away but kindly allow me to +1 this notion:

bq. Persisting the txid in the DN disks actually has another nice property for non-HA clusters -- if you accidentally restart the NN from an old snapshot of the filesystem state, the DNs can refuse to connect, or refuse to process deletions. Currently, in this situation, the DNs would connect and then delete all of the newer blocks.

Encountering this scenario through a series of accidents has been a concern. Disallowing block deletion as proposed would be enough to give the operators a chance to recover from their mistake before permanent damage.

> HA: Datanode fencing mechanism
> ------------------------------
>
>                 Key: HDFS-1972
>                 URL: https://issues.apache.org/jira/browse/HDFS-1972
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node, name-node
>            Reporter: Suresh Srinivas
>            Assignee: Todd Lipcon
>
> In high availability setup, with an active and standby namenode, there is a
> possibility of two namenodes sending commands to the datanode. The datanode
> must honor commands from only the active namenode and reject the commands
> from standby, to prevent corruption. This invariant must be complied with
> during fail over and other states such as split brain. This jira addresses
> issues related to this, design of the solution and implementation.
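
For illustration only, the safeguard endorsed in the comment above (a DN persisting the txid and refusing deletions from an NN that was restarted from older state) might look roughly like the sketch below. Nothing here is taken from the HDFS code base: the class name TxidDeletionGuard, the allowDeletion method, the single-file on-disk format, and the assumption that the NN reports its latest txid to the DN are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch of a DN-side guard; not the actual HDFS implementation. */
final class TxidDeletionGuard {
    private final Path txidFile;   // assumed on-disk location of the persisted txid
    private long highestSeenTxid;  // highest txid any NN has reported so far

    TxidDeletionGuard(Path txidFile) {
        this.txidFile = txidFile;
        this.highestSeenTxid = readTxid(txidFile);
    }

    /**
     * Decides whether block deletions from an NN reporting nnTxid may be processed.
     * An NN that is behind the persisted txid is assumed to have been started
     * from an old snapshot, so its deletions are refused.
     */
    synchronized boolean allowDeletion(long nnTxid) {
        if (nnTxid < highestSeenTxid) {
            return false;
        }
        if (nnTxid > highestSeenTxid) {
            highestSeenTxid = nnTxid;
            writeTxid(txidFile, nnTxid); // persist so the check survives a DN restart
        }
        return true;
    }

    private static long readTxid(Path file) {
        if (!Files.exists(file)) {
            return 0L; // nothing persisted yet
        }
        try {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            return Long.parseLong(text.trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void writeTxid(Path file, long txid) {
        try {
            Files.write(file, Long.toString(txid).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = Paths.get(System.getProperty("java.io.tmpdir"),
                "dn-txid-" + System.nanoTime());
        TxidDeletionGuard guard = new TxidDeletionGuard(file);
        // An NN at txid 500 is at or ahead of what this DN has seen.
        System.out.println(guard.allowDeletion(500L));
        // An NN at txid 120 is behind the persisted value, as after an old-snapshot restart.
        System.out.println(guard.allowDeletion(120L));
        file.toFile().delete();
    }
}
```

Refusing only deletions, rather than the whole connection, matches the milder of the two options in the quoted proposal: the newer blocks stay on disk, which gives operators the chance to recover that the comment asks for.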