[
https://issues.apache.org/jira/browse/ZOOKEEPER-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293116#comment-13293116
]
Thawan Kooburat commented on ZOOKEEPER-1484:
--------------------------------------------
I just noticed that the logs are from different machines, so the actual root
cause has not been found yet. I still think the issue I pointed out is a
legitimate problem.
> Missing znode found in the follower
> -----------------------------------
>
> Key: ZOOKEEPER-1484
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1484
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.3
> Reporter: Thawan Kooburat
> Assignee: Thawan Kooburat
> Priority: Critical
>
> We noticed that one of the followers failed to restart due to a missing parent node:
> {noformat}
> 2012-05-29 15:44:41,037 [myid:9] - INFO [main:FileSnap@83] - Reading snapshot
> /var/facebook/zeus-server/data/global-ropt.0/version-2/snapshot.3d001f19c9
> 2012-05-29 15:44:43,300 [myid:9] - ERROR [main:FileTxnSnapLog@220] - Parent
> /phpunittest/1862297546 missing for /phpunittest/1862297546/dir1
> 2012-05-29 15:44:43,302 [myid:9] - ERROR [main:QuorumPeer@488] - Unable to
> load database on disk
> java.io.IOException: Failed to process transaction type: 1 error:
> KeeperErrorCode = NoNode for /phpunittest/1862297546
> {noformat}
> We believe that the root cause is a bug in the follower sync-up logic. Due
> to a race condition, the follower may miss some proposals. The log below shows
> that the follower saw a commit message for a proposal it had not seen
> before:
> {noformat}
> 2012-05-15 15:11:27,449 [myid:13] - WARN
> [QuorumPeer[myid=13]/0.0.0.0:2182:Learner@378] - Got zxid 0x3c00282dc9
> expected 0x3c00282dca
> {noformat}
> I can reproduce this by running FollowerResyncConcurrencyTest repeatedly until
> a failure occurs. I suspect that the root cause is how we handle
> toBeApplied and outstandingProposals in the leader.
> 1. An in-flight proposal is removed from outstandingProposals before it is added
> to toBeApplied. Most of the problems I have seen so far seem to be caused by this gap.
> 2. startForwarding() iterates through outstandingProposals without locking
> PrepRequestProcessor properly, so it may miss an in-flight
> proposal.
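
The gap described in point 1 can be illustrated with a small, single-threaded sketch. This is not ZooKeeper's Leader code: only the names outstandingProposals and toBeApplied come from the report, while the class ProposalHandoffSketch and all of its methods are hypothetical stand-ins. It assumes that a syncing follower is sent whatever is present in either collection at the moment it is inspected.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of the leader's hand-off between the two collections.
public class ProposalHandoffSketch {
    // Simplified stand-ins; the real ZooKeeper types differ.
    private final Map<Long, String> outstandingProposals = new TreeMap<>();
    private final Queue<Long> toBeApplied = new ArrayDeque<>();

    // Record a proposal that has been sent but not yet committed.
    public synchronized void propose(long zxid, String payload) {
        outstandingProposals.put(zxid, payload);
    }

    // Two-step commit, step 1: the proposal leaves outstandingProposals.
    public synchronized void removeOutstanding(long zxid) {
        outstandingProposals.remove(zxid);
    }

    // Two-step commit, step 2: the proposal enters toBeApplied.
    public synchronized void addToBeApplied(long zxid) {
        toBeApplied.add(zxid);
    }

    // One-step commit: both updates happen under the same lock, so no
    // observer can see the proposal in neither collection.
    public synchronized void commitAtomically(long zxid) {
        if (outstandingProposals.remove(zxid) != null) {
            toBeApplied.add(zxid);
        }
    }

    // What a syncing follower would be sent under this sketch's assumption:
    // every zxid currently in either collection.
    public synchronized Set<Long> snapshotForFollower() {
        Set<Long> seen = new TreeSet<>(toBeApplied);
        seen.addAll(outstandingProposals.keySet());
        return seen;
    }
}
```

If snapshotForFollower() runs between removeOutstanding() and addToBeApplied(), the zxid is in neither collection and the follower never receives it. With commitAtomically(), the snapshot takes the same lock as the hand-off, so the proposal is always visible in one of the two places.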