[ 
https://issues.apache.org/jira/browse/HDFS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246865#comment-13246865
 ] 

Todd Lipcon commented on HDFS-3192:
-----------------------------------

bq. Are you suggesting that ZKFC1 does transitionToStandby() when it loses 
znode? 

It currently does, but I don't think it has to -- since ZKFC2 will call 
NN1.transitionToStandby (see below).

bq. On an active NN, there is a high probability that it might abort

There are two possible scenarios:
1) *the local node finds out about the session expiration before the standby.* 
In this case, it will call transitionToStandby, the local node will flush and 
close its edit logs, and gracefully transition.
2) *the local node finds out after the other node.* In this case, the other 
node will have already initiated the fencing process.
2a) If the local node is still accessible, then the other node will have 
already called transitionToStandby(), in which case our own call will be a 
no-op (since we're already in standby state). Everything is correct, because 
the transitionToStandby() call flushes everything and gracefully closes its 
edit log writer.
2b) If the local node is inaccessible (e.g., network down) then the other node 
initiates non-graceful fencing. If it does STONITH, then our node will go down, 
and the discussion is moot. If it does storage fencing, then our node no longer 
has access to write to storage. This will prevent transitionToStandby() from 
succeeding, since it will try to finalize its current edit log segment (which 
involves mutating the fenced-off storage). So, it will correctly abort.
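
To make the case analysis above easier to check, here is a minimal sketch of it as a decision function. It is an illustration only, not ZKFC code: the class, the enum, and the three boolean parameters are hypothetical names invented for this example.

```java
public class ExpirationScenarios {
    /** Hypothetical labels for the outcomes described in cases 1, 2a and 2b. */
    public enum Outcome {
        GRACEFUL_STANDBY,          // case 1: local node flushes and closes its edit logs
        NO_OP_ALREADY_STANDBY,     // case 2a: the other node already moved us to standby
        NODE_POWERED_OFF,          // case 2b with STONITH
        ABORT_ON_FENCED_STORAGE    // case 2b with storage fencing
    }

    /**
     * @param localNoticedFirst true if the local node learns of the session
     *                          expiration before the other node does
     * @param localReachable    true if the other node can still reach the local node
     * @param stonith           true if non-graceful fencing is STONITH,
     *                          false if it is storage fencing
     */
    public static Outcome onSessionExpired(boolean localNoticedFirst,
                                           boolean localReachable,
                                           boolean stonith) {
        if (localNoticedFirst) {
            return Outcome.GRACEFUL_STANDBY;
        }
        if (localReachable) {
            return Outcome.NO_OP_ALREADY_STANDBY;
        }
        return stonith ? Outcome.NODE_POWERED_OFF : Outcome.ABORT_ON_FENCED_STORAGE;
    }

    public static void main(String[] args) {
        System.out.println(onSessionExpired(true, true, false));
        System.out.println(onSessionExpired(false, true, false));
        System.out.println(onSessionExpired(false, false, true));
        System.out.println(onSessionExpired(false, false, false));
    }
}
```

Every branch ends in a state where the old active can no longer write edits, which is the point of the enumeration.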

bq. I don't think that doing tryGraceFulFence() from NN2 to NN1 is safe. First 
of all, this is opening up one more channel of communication between NN1 and 
NN2 and this is subject to various races sequences, split-brain etc.

Doing RPC to your own NN is subject to way more race conditions because we have 
no way of enforcing an ordering between NN1 going standby and NN2 becoming 
active. NN2 *has* to verify that NN1 is either standby or effectively dead 
before becoming active. The only way to do that is to first (a) ask it to be 
standby, or (b) fence.
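
A rough sketch of that ordering, from NN2's side, might look like the following. The `Peer` interface and its methods are hypothetical stand-ins for "an RPC to the other NameNode" and "a fencing method"; they are not the real HA interfaces.

```java
import java.util.ArrayList;
import java.util.List;

public class FailoverOrdering {
    /** Hypothetical handle on the old active node (NN1), as seen from NN2's side. */
    public interface Peer {
        /** Returns true only if NN1 confirmed that it is now in standby. */
        boolean transitionToStandby();
        /** Returns true only if non-graceful fencing (STONITH or storage) succeeded. */
        boolean fence();
    }

    /** Builds a Peer with fixed answers, for illustration. */
    public static Peer peer(final boolean gracefulOk, final boolean fenceOk) {
        return new Peer() {
            @Override public boolean transitionToStandby() { return gracefulOk; }
            @Override public boolean fence() { return fenceOk; }
        };
    }

    /** Returns the steps taken; "become-active" appears only after NN1 is neutralized. */
    public static List<String> failover(Peer oldActive) {
        List<String> steps = new ArrayList<>();
        if (oldActive.transitionToStandby()) {      // (a) ask it to be standby
            steps.add("graceful-standby");
        } else if (oldActive.fence()) {             // (b) fence
            steps.add("fenced");
        } else {
            steps.add("failover-aborted");          // cannot prove NN1 is harmless
            return steps;
        }
        steps.add("become-active");
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(failover(peer(true, false)));
        System.out.println(failover(peer(false, true)));
        System.out.println(failover(peer(false, false)));
    }
}
```

The property that matters is that no path reaches "become-active" without one of (a) or (b) having succeeded first.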


The lack of correctness in relying on self-resign is the example I gave above:

{quote}
1) NN1 writing to edits log
2) ZKFC1 loses lease, but doesn't know about it yet
3) ZKFC2 gets lease
4) NN2 becomes active, starts writing logs
5) NN1 writes some edits. World explodes.
6) ZKFC1 gets asynchronous notification from ZK that it lost its session. 
Anything you do at this point is too late.
{quote}

The "self-resign" in step 6 is insufficient. We have to fence between step 3 
and step 4. Whatever NN1 happens to do _after_ that point doesn't help anything 
because it's too late.
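
The six-step timeline can be replayed as a toy simulation. This is only a sketch under simplifying assumptions: `SharedLog` is a hypothetical in-memory stand-in for the shared edits storage, and "fencing" is modelled as the storage rejecting a writer by name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SelfResignRace {
    /** Hypothetical shared edit log that can refuse writes from fenced writers. */
    static class SharedLog {
        final List<String> edits = new ArrayList<>();
        final Set<String> fenced = new HashSet<>();

        boolean write(String writer) {
            if (fenced.contains(writer)) {
                return false;               // write rejected by fenced storage
            }
            edits.add(writer);
            return true;
        }
    }

    /**
     * Replays steps 1-6. Returns true if NN1 managed to write an edit after
     * NN2 had already become active (the "world explodes" case).
     */
    public static boolean run(boolean fenceBetweenStep3And4) {
        SharedLog log = new SharedLog();
        log.write("NN1");                        // 1) NN1 writing to edits log
        // 2) ZKFC1 loses lease, but doesn't know about it yet
        // 3) ZKFC2 gets lease
        if (fenceBetweenStep3And4) {
            log.fenced.add("NN1");               // fence before NN2 activates
        }
        log.write("NN2");                        // 4) NN2 becomes active, writes logs
        boolean staleWrite = log.write("NN1");   // 5) NN1 writes some edits
        log.fenced.add("NN1");                   // 6) self-resign: after the damage
        return staleWrite;
    }

    public static void main(String[] args) {
        System.out.println("self-resign only, stale write accepted: " + run(false));
        System.out.println("fence before activation, stale write accepted: " + run(true));
    }
}
```

With self-resign alone, the write in step 5 is accepted; with fencing placed between steps 3 and 4, it is rejected regardless of when ZKFC1 hears about the expiration.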

                
> Active NN should exit when it has not received a getServiceStatus() rpc from 
> ZKFC for timeout secs
> --------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3192
>                 URL: https://issues.apache.org/jira/browse/HDFS-3192
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>            Reporter: Hari Mankude
>            Assignee: Hari Mankude
>



        
