[ 
https://issues.apache.org/jira/browse/HADOOP-8217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242944#comment-13242944
 ] 

Todd Lipcon commented on HADOOP-8217:
-------------------------------------

bq. I would like to question the value of FC2 calling NN1.transitionToStandby() 
in general. FC1 on NN1 is supposed to call NN1.transitionToStandby() because 
that is FC1's responsibility upon losing the leader lock.

This doesn't work, since FC1 can take arbitrarily long to notice that it has 
lost its lock.

bq. Secondly, based on the recent work done to add breadcrumbs to the 
ActiveStandbyElector, FC2 is going to fence NN1 if NN1 has not gracefully given 
up the lock, which is clearly the case here. So the problem is already solved 
unless I am mistaken.

But the first stage of "fencing" is to gracefully ask the NN to go to standby. 
This is exactly the problem here. If, instead, we required that an aggressive 
fencing mechanism (STONITH/NAS fencing) always be used, you're right 
that there would not be a problem. But we can avoid that in many cases -- for 
example, imagine that the active node loses its connection to the ZK quorum, 
but still has a connection to the other NN (e.g. by a crossover cable). In this 
case it will leave its breadcrumb znode there, but the new active can easily 
transition it to standby.

Here's another way of looking at this JIRA:
- the "aggressive" fencing mechanisms have the property of being "persistent", 
i.e. after fencing, the node cannot become active, even if asked to.
- the "graceful" fencing mechanism (transitionToStandby() RPC) does not 
currently have the property of being "persistent". If another older node asks 
it to become active after it's been "gracefully fenced", it will do so 
incorrectly.
- This JIRA makes "graceful fencing" persistent, so it can be used correctly.
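
To illustrate the last point, here is a minimal sketch of one way graceful 
fencing could be made persistent: every request carries an epoch number, and 
the node refuses any request older than the newest epoch it has accepted. 
This is an assumption for illustration only. The class FencedNode, the epoch 
parameter, and the boolean return values are hypothetical; they are not 
Hadoop's actual RPC interface and not necessarily the approach the patch takes.

```java
// Hypothetical sketch: none of these names come from Hadoop.
final class FencedNode {
    enum State { STANDBY, ACTIVE }

    private State state = State.STANDBY;
    // Highest epoch of any accepted request; anything lower is stale.
    private long highestEpoch = 0;

    synchronized State state() {
        return state;
    }

    // Graceful fence: remembers the caller's epoch so that requests
    // from older callers are refused afterwards.
    synchronized boolean transitionToStandby(long epoch) {
        if (epoch < highestEpoch) {
            return false; // stale caller, ignore
        }
        highestEpoch = epoch;
        state = State.STANDBY;
        return true;
    }

    // Only a caller at least as new as the last accepted one may activate.
    synchronized boolean transitionToActive(long epoch) {
        if (epoch < highestEpoch) {
            return false; // stale caller, stay in the current state
        }
        highestEpoch = epoch;
        state = State.ACTIVE;
        return true;
    }
}
```

With this sketch, if the newer controller calls transitionToStandby(2) and a 
stale controller later calls transitionToActive(1), the second call is refused 
and the node stays in standby. Where the epoch would come from (for example, 
something derived from the ZK election) is left open here.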


Regarding the ActiveStandbyElector callback for {{becomeStandby}}, I actually 
think it's redundant. There are two cases in which it could be called:
- If already standby, it's a no-op
- If active, then this indicates that the elector lost its znode. Since it lost 
its znode (rather than quitting the election gracefully), it will leave its 
breadcrumb behind. Thus, the other node will fence it. So, calling 
transitionToStandby is redundant with fencing which the other node will have to 
perform anyway.
                
> Edge case split-brain race in ZK-based auto-failover
> ----------------------------------------------------
>
>                 Key: HADOOP-8217
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8217
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: auto-failover, ha
>    Affects Versions: 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hadoop-8217-testcase.txt
>
>
> As discussed in HADOOP-8206, the current design for automatic failover has 
> the following race:
> - ZKFC1 gets active lock
> - ZKFC1 is about to send transitionToActive() and machine freezes (e.g. GC 
> pause + swapping)
> - ZKFC1 loses its ZK lock, ZKFC2 gets ZK lock
> - ZKFC2 calls transitionToStandby on NN1, and transitions NN2 to active
> - ZKFC1 wakes up from pause, calls transitionToActive(), and now both NNs are 
> active (split brain)
> This is rare, since it requires ZKFC1 to freeze longer than its ZK session 
> timeout, but worth fixing, since the results can be disastrous.
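
The interleaving in the description can be replayed deterministically. The 
following is a minimal sketch, assuming a node that obeys every transition 
request unconditionally; NaiveNode and RaceReplay are hypothetical names used 
only for illustration and do not model the real NameNode or ZKFC code.

```java
// Hypothetical sketch: a node that obeys whichever request arrives last.
final class NaiveNode {
    private boolean active = false;

    void transitionToActive() { active = true; }
    void transitionToStandby() { active = false; }
    boolean isActive() { return active; }
}

final class RaceReplay {
    // Replays the steps from the description in order and reports
    // whether both nodes end up active at the same time.
    static boolean replay() {
        NaiveNode nn1 = new NaiveNode();
        NaiveNode nn2 = new NaiveNode();

        // ZKFC1 holds the lock but freezes before sending its RPC.
        // ZKFC1's session expires and ZKFC2 takes the lock.
        nn1.transitionToStandby(); // ZKFC2 gracefully fences NN1
        nn2.transitionToActive();  // ZKFC2 activates NN2

        // ZKFC1 wakes up and sends the RPC it had queued before the pause.
        nn1.transitionToActive();

        return nn1.isActive() && nn2.isActive();
    }
}
```

Calling RaceReplay.replay() returns true: the graceful fence on NN1 is undone 
by the late request from ZKFC1, so both nodes are active.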


        
