[jira] [Commented] (HDFS-1972) HA: Datanode fencing mechanism

Todd Lipcon (Commented) (JIRA) Fri, 02 Dec 2011 18:08:05 -0800

    [ https://issues.apache.org/jira/browse/HDFS-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162019#comment-13162019 ]


Todd Lipcon commented on HDFS-1972:
-----------------------------------

I broke out the basic implementation of determining who is active based on 
heartbeat responses to HDFS-2627.

I've thought a bit about the proposed solution above and haven't come up with 
any holes yet... my next steps for early next week will be these (hopefully 
separate patches for each for easy review):
- finish HDFS-2627 to a committable state (first draft patch is up but needs a 
functional test and some code/style cleanup)
- implement the DN-side "promise" functionality: when a DN detects a new active, 
it needs to ACK the active transition, and the NN needs to keep track of 
whether each DN has ACKed.
- implement code in the NN which prevents issuance of block invalidations until 
DNs have ACKed and sent block reports
- implement a stress test: create many blocks, and toggle them back and forth 
between replication level 1 and replication level 2, while failing over back 
and forth between two NNs continuously. Ensure that after several hours of 
runtime we haven't lost any blocks.
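
To make the second and third items concrete, here is a rough sketch of the NN-side bookkeeping. This is not code from any patch: the class name InvalidationGate and all of its methods are hypothetical, and it assumes the simplest reading of the plan, namely a single global gate that stays closed until every known DN has both ACKed the current transition and sent a block report since then. A real implementation could just as well track this per DN or per block.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not HDFS code: every name here is made up for illustration.
final class InvalidationGate {
    // What a single DN has done since the most recent active transition.
    private static final class DnProgress {
        boolean acked;
        boolean blockReportReceived;
    }

    private final Map<String, DnProgress> progress = new HashMap<>();

    // A newly registered DN starts with neither an ACK nor a block report.
    synchronized void registerDatanode(String dnId) {
        progress.putIfAbsent(dnId, new DnProgress());
    }

    // Called when this NN becomes active: every DN has to qualify again.
    synchronized void onBecomeActive() {
        for (DnProgress p : progress.values()) {
            p.acked = false;
            p.blockReportReceived = false;
        }
    }

    // The DN has ACKed the active transition (the "promise").
    synchronized void onAck(String dnId) {
        lookup(dnId).acked = true;
    }

    // The DN has sent a block report since the transition.
    synchronized void onBlockReport(String dnId) {
        lookup(dnId).blockReportReceived = true;
    }

    // Assumption: invalidations are held back until ALL known DNs have
    // both ACKed and reported. With no DNs registered this returns true.
    synchronized boolean mayIssueInvalidations() {
        for (DnProgress p : progress.values()) {
            if (!p.acked || !p.blockReportReceived) {
                return false;
            }
        }
        return true;
    }

    private DnProgress lookup(String dnId) {
        DnProgress p = progress.get(dnId);
        if (p == null) {
            throw new IllegalArgumentException("unknown datanode: " + dnId);
        }
        return p;
    }
}
```

In this sketch the NN would call onBecomeActive() on failover, feed in ACKs and block reports as they arrive, and check mayIssueInvalidations() before sending any block invalidation command.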
                
> HA: Datanode fencing mechanism
> ------------------------------
>
>                 Key: HDFS-1972
>                 URL: https://issues.apache.org/jira/browse/HDFS-1972
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node, name-node
>            Reporter: Suresh Srinivas
>            Assignee: Todd Lipcon
>         Attachments: hdfs-1972-v1.txt
>
>
> In a high availability setup, with an active and a standby namenode, there is a 
> possibility of two namenodes sending commands to the datanode. The datanode 
> must honor commands from only the active namenode and reject commands 
> from the standby, to prevent corruption. This invariant must hold 
> during failover and in other states such as split brain. This jira addresses 
> the related issues, the design of the solution, and its implementation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
