[jira] [Commented] (HADOOP-8217) Edge case split-brain race in ZK-based auto-failover

Todd Lipcon (Commented) (JIRA) Fri, 30 Mar 2012 16:47:52 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-8217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242855#comment-13242855 ]


Todd Lipcon commented on HADOOP-8217:
-------------------------------------

bq. 3. ZKFC2 tries to do transitionToStandby() on NN1. RPC times out.
bq. 4. Don't know what happens now in your design

As has been the case in all of the HA work up to this point, ZKFC2 initiates
the fence method here. The fence method has to persistently fence the shared
resource (e.g. disable access to the SAN, or STONITH the node). Please refer to
the code; I think this is fairly clear there.

The proposed solution improves our ability to fail over when "graceful
fencing" suffices. In many failover cases it's preferable _not_ to have to
invoke STONITH or storage fencing, since those mechanisms often require
administrative intervention to un-fence.
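The escalation described above (try a graceful transitionToStandby() on the old active, and fall back to the persistent fence method if that RPC fails or times out) can be sketched as follows. This is a minimal sketch under stated assumptions: `NameNodeRpc`, `FenceMethod` and `fenceOldActive` are hypothetical names for illustration, not Hadoop's actual HA classes, and the RPC is simulated with a lambda.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class GracefulThenHardFencing {

    /** Hypothetical stand-in for the RPC proxy to the previously active NameNode. */
    interface NameNodeRpc {
        void transitionToStandby() throws Exception;
    }

    /** Hypothetical stand-in for a persistent fence method (e.g. SAN disable, STONITH). */
    interface FenceMethod {
        boolean fence();
    }

    /**
     * Try graceful fencing first. If the RPC fails or does not finish within
     * timeoutMs, fall back to the persistent fence method. Returns true only
     * if the old active is known to be either standby or fenced.
     */
    static boolean fenceOldActive(NameNodeRpc oldActive, FenceMethod fencer,
                                  long timeoutMs, ExecutorService rpcPool) {
        Future<Object> call = rpcPool.submit(() -> {
            oldActive.transitionToStandby();
            return null;
        });
        try {
            call.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true; // graceful: no administrator needed to un-fence later
        } catch (TimeoutException | ExecutionException e) {
            call.cancel(true); // stop waiting on a hung or failed RPC
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            call.cancel(true); // outcome unknown, so treat it as a failure
        }
        // Graceful fencing could not be confirmed: persistent fencing is mandatory.
        return fencer.fence();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool();
        FenceMethod fencer = () -> {
            System.out.println("fence method invoked");
            return true;
        };

        // Old active answers promptly: the fence method is not used.
        boolean graceful = fenceOldActive(() -> { }, fencer, 500, pool);
        // Old active is frozen (e.g. GC pause): the RPC hangs, so the fence method runs.
        boolean fallback = fenceOldActive(() -> Thread.sleep(10_000), fencer, 100, pool);

        System.out.println(graceful + " " + fallback);
        pool.shutdownNow();
    }
}
```

The method returns false only when the fence method itself reports failure; in that case a caller following this design would not go on to make the other NameNode active.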

bq. Given the above, how will NN1 receive the zxid from ZKFC2? If it does not,
then the solution is invalid. Hari's scenario exemplifies this.

All transitionToActive/transitionToStandby calls would include the zxid. So, 
the sequence becomes:


1. ZKFC1 gets active lock (zxid=1)
2. ZKFC1 is about to send transitionToActive(1) when its machine freezes (e.g. a
GC pause + swapping)
3. ZKFC1 loses its ZK lock, ZKFC2 gets ZK lock (zxid=2)
4. ZKFC2 calls NN1.transitionToStandby(2) and NN2.transitionToActive(2).
5. ZKFC1 wakes up from pause, calls NN1.transitionToActive(1). NN1 rejects the 
request because it previously accepted zxid=2 in step 4 above. 

Or, in the failure case:
4 (failure case): if NN1.transitionToStandby() times out or fails,
non-graceful fencing is initiated (the same as in the existing HA code for the
last several months).
5 (failure case with storage fencing): ZKFC1 wakes up from the pause and calls
NN1.transitionToActive(1). NN1 tries to access the shared edits storage and
fails, because it has been fenced. So there is no split-brain.
5 (failure case with STONITH): ZKFC1 never wakes up from the pause, because its
power plug has been pulled. So there is no split-brain.
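The zxid check in the graceful sequence can be sketched in Java as below. This is a minimal sketch, assuming each ZKFC tags every transition request with the zxid of its ZK lock and that successive lock holders get increasing zxids (ZooKeeper zxids are monotonically increasing). `NameNode`, `HAState` and the method signatures are hypothetical simplifications, not the real HA RPC interface, and the accepted zxid is kept only in memory, whereas a real NameNode would presumably need to persist it.

```java
public class ZxidGuardedFailover {

    enum HAState { STANDBY, ACTIVE }

    /** Hypothetical NameNode that remembers the highest zxid it has accepted. */
    static class NameNode {
        final String name;
        private long highestZxid = 0; // no lock holder seen yet (in memory only)
        private HAState state = HAState.STANDBY;

        NameNode(String name) {
            this.name = name;
        }

        synchronized boolean transitionToActive(long zxid) {
            return apply(zxid, HAState.ACTIVE);
        }

        synchronized boolean transitionToStandby(long zxid) {
            return apply(zxid, HAState.STANDBY);
        }

        synchronized HAState state() {
            return state;
        }

        // A request carrying a zxid older than one already accepted can only come
        // from a ZKFC that has since lost the lock, so it is rejected. An equal
        // zxid is accepted so that the current lock holder may safely retry.
        private boolean apply(long zxid, HAState target) {
            if (zxid < highestZxid) {
                return false;
            }
            highestZxid = zxid;
            state = target;
            return true;
        }
    }

    public static void main(String[] args) {
        NameNode nn1 = new NameNode("NN1");
        NameNode nn2 = new NameNode("NN2");

        long zkfc1Zxid = 1; // steps 1-2: ZKFC1 takes the lock, then freezes
        long zkfc2Zxid = 2; // step 3: ZKFC2 takes the lock after ZKFC1's session expires

        // Step 4: ZKFC2 fails over, tagging both calls with its zxid.
        nn1.transitionToStandby(zkfc2Zxid);
        nn2.transitionToActive(zkfc2Zxid);

        // Step 5: ZKFC1 wakes up and sends its stale request.
        boolean accepted = nn1.transitionToActive(zkfc1Zxid);

        System.out.println("stale request accepted: " + accepted);
        System.out.println(nn1.name + "=" + nn1.state() + ", " + nn2.name + "=" + nn2.state());
    }
}
```

The check only helps if NN1 actually received zxid=2. If ZKFC2's transitionToStandby(2) never reached NN1, the stale transitionToActive(1) would still be accepted, which is why the failure case falls back to storage fencing or STONITH.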


> Edge case split-brain race in ZK-based auto-failover
> ----------------------------------------------------
>
>                 Key: HADOOP-8217
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8217
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: auto-failover, ha
>    Affects Versions: 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hadoop-8217-testcase.txt
>
>
> As discussed in HADOOP-8206, the current design for automatic failover has 
> the following race:
> - ZKFC1 gets active lock
> - ZKFC1 is about to send transitionToActive() and machine freezes (eg GC 
> pause + swapping)
> - ZKFC1 loses its ZK lock, ZKFC2 gets ZK lock
> - ZKFC2 calls transitionToStandby on NN1, and transitions NN2 to active
> - ZKFC1 wakes up from pause, calls transitionToActive(), now we have a bad 
> situation
> This is rare, since it requires ZKFC1 to freeze longer than its ZK session 
> timeout, but worth fixing, since the results can be disastrous.
