Thawan Kooburat created ZOOKEEPER-1484:
------------------------------------------

             Summary: Missing znode found in the follower
                 Key: ZOOKEEPER-1484
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1484
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.4.3
            Reporter: Thawan Kooburat
            Assignee: Thawan Kooburat
            Priority: Critical


We noticed that one of the followers failed to restart because a parent node 
was missing:

{noformat}
2012-05-29 15:44:41,037 [myid:9] - INFO [main:FileSnap@83] - Reading snapshot /var/facebook/zeus-server/data/global-ropt.0/version-2/snapshot.3d001f19c9
2012-05-29 15:44:43,300 [myid:9] - ERROR [main:FileTxnSnapLog@220] - Parent /phpunittest/1862297546 missing for /phpunittest/1862297546/dir1
2012-05-29 15:44:43,302 [myid:9] - ERROR [main:QuorumPeer@488] - Unable to load database on disk
java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /phpunittest/1862297546
{noformat}

We believe the root cause is a bug in the follower sync-up logic. Because of a 
race condition, the follower may miss some proposals. The log below shows that 
the follower saw the commit message for a proposal it had not seen before:
{noformat}
2012-05-15 15:11:27,449 [myid:13] - WARN [QuorumPeer[myid=13]/0.0.0.0:2182:Learner@378] - Got zxid 0x3c00282dc9 expected 0x3c00282dca
{noformat}
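
To make the warning above easier to read, here is a minimal sketch of a zxid 
comparison that produces the same message. The class and method names are 
hypothetical and this is not the code in ZooKeeper's Learner class. The only 
ZooKeeper fact it relies on is that a zxid is a 64-bit value with the epoch in 
the high 32 bits and a counter in the low 32 bits.

```java
// Hypothetical helper for illustration only; not part of ZooKeeper.
class ZxidGapCheck {
    // High 32 bits of a zxid: the leader epoch.
    static long epoch(long zxid) {
        return zxid >>> 32;
    }

    // Low 32 bits of a zxid: the per-epoch transaction counter.
    static long counter(long zxid) {
        return zxid & 0xffffffffL;
    }

    // Returns a warning string when the received zxid is not the expected one,
    // or null when the two match.
    static String check(long received, long expected) {
        if (received == expected) {
            return null;
        }
        return "Got zxid 0x" + Long.toHexString(received)
                + " expected 0x" + Long.toHexString(expected);
    }

    public static void main(String[] args) {
        long received = 0x3c00282dc9L; // value from the log above
        long expected = 0x3c00282dcaL; // value from the log above
        System.out.println(check(received, expected));
        // Same epoch, counters differ by exactly one.
        System.out.println(epoch(received) == epoch(expected));
        System.out.println(counter(expected) - counter(received));
    }
}
```

Both zxids in the log belong to epoch 0x3c and their counters differ by one, 
so the follower's view of the proposal stream is off by a single transaction.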

I can reproduce this by running FollowerResyncConcurrencyTest repeatedly until 
a failure occurs. I suspect the root cause is how we handle toBeApplied and 
outstandingProposals in the leader.

1. An in-flight proposal is removed from outstandingProposals before it is 
added to toBeApplied. Most of the problems I have seen so far seem to be caused 
by this gap.
2. startForwarding() iterates through outstandingProposals without locking 
PrepRequestProcessor properly, so it can miss an in-flight proposal.
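
The gap described in item 1 can be sketched with a small model. This is a 
simplified illustration under stated assumptions, not ZooKeeper's Leader 
class: only the names toBeApplied and outstandingProposals come from this 
report, while the class, the methods, and the transaction strings are 
hypothetical. The interleaving is replayed step by step in a single thread so 
the result is deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical, simplified model of the leader's bookkeeping.
// It is NOT ZooKeeper's Leader class.
class LeaderGapSketch {
    final Map<Long, String> outstandingProposals = new ConcurrentHashMap<>();
    final Queue<Long> toBeApplied = new ConcurrentLinkedQueue<>();

    void propose(long zxid, String txn) {
        outstandingProposals.put(zxid, txn);
    }

    // First half of a commit: the proposal leaves outstandingProposals.
    void removeFromOutstanding(long zxid) {
        outstandingProposals.remove(zxid);
    }

    // Second half of a commit: the proposal becomes visible in toBeApplied.
    void addToBeApplied(long zxid) {
        toBeApplied.add(zxid);
    }

    // What a syncing follower would be sent in this model:
    // every zxid visible in either structure at this instant.
    List<Long> visibleToSyncingFollower() {
        List<Long> seen = new ArrayList<>(toBeApplied);
        seen.addAll(outstandingProposals.keySet());
        Collections.sort(seen);
        return seen;
    }

    // Replays the problematic interleaving deterministically.
    static List<Long> snapshotDuringGap() {
        LeaderGapSketch leader = new LeaderGapSketch();
        leader.propose(1L, "create /a");
        leader.propose(2L, "create /a/b");
        leader.propose(3L, "create /a/b/c");

        leader.removeFromOutstanding(2L);                    // commit of zxid 2 starts
        List<Long> seen = leader.visibleToSyncingFollower(); // follower syncs here
        leader.addToBeApplied(2L);                           // commit of zxid 2 finishes
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(snapshotDuringGap()); // prints [1, 3]
    }
}
```

In this model the follower that syncs between the two halves of the commit 
never learns about zxid 2, because at that instant the proposal is in neither 
structure.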


