[ https://issues.apache.org/jira/browse/ZOOKEEPER-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823388#comment-13823388 ]

Flavio Junqueira commented on ZOOKEEPER-1653:
---------------------------------------------

If we can't perform any of the operations related to the updating file, then we
shouldn't keep going, right? Say we fail to create the file and the server
keeps executing. In that case we can fall into the same problem we are
discussing here. I think we should either throw an exception or exit the
server.
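Flavio's point can be made concrete with a minimal sketch. It assumes a hypothetical marker file (named `updatingEpoch` here purely for illustration) that is created before the snapshot and deleted once the new epoch has been persisted. The class, interface, and method names are likewise hypothetical and are not ZooKeeper's API. Every failed operation on the marker throws `IOException` instead of letting the server carry on.

```java
import java.io.File;
import java.io.IOException;

/**
 * Hypothetical sketch (not ZooKeeper's actual code): bracket the
 * snapshot and the epoch update with a marker file, and throw on any
 * failed marker-file operation instead of letting the server continue.
 */
public class EpochUpdateGuard {

    /** An action that may fail with an IOException. */
    @FunctionalInterface
    public interface IOAction {
        void run() throws IOException;
    }

    private final File marker;

    public EpochUpdateGuard(File dataDir) {
        // "updatingEpoch" is an illustrative name chosen for this sketch.
        this.marker = new File(dataDir, "updatingEpoch");
    }

    /** Create the marker; an I/O error or a leftover marker is fatal. */
    public void begin() throws IOException {
        if (!marker.createNewFile()) {
            throw new IOException("Marker already exists: " + marker);
        }
    }

    /** Remove the marker once the new epoch is durable. */
    public void end() throws IOException {
        if (!marker.delete()) {
            throw new IOException("Failed to delete " + marker);
        }
    }

    /** True if an earlier run stopped between begin() and end(). */
    public boolean interrupted() {
        return marker.exists();
    }

    /**
     * Run snapshot + epoch update inside the marker. If either action
     * throws, the marker is intentionally left behind for startup to find.
     */
    public void guarded(IOAction takeSnapshot, IOAction persistEpoch) throws IOException {
        begin();
        takeSnapshot.run();
        persistEpoch.run();
        end();
    }
}
```

Leaving the marker in place when either action throws is deliberate in this design: on the next start, `interrupted()` returning true tells the server that the snapshot may be newer than `currentEpoch`, so it can repair the epoch or refuse to start rather than guess.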

> zookeeper fails to start because of inconsistent epoch
> ------------------------------------------------------
>
>                 Key: ZOOKEEPER-1653
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1653
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.5
>            Reporter: Michi Mutsuzaki
>            Assignee: Michi Mutsuzaki
>             Fix For: 3.4.6
>
>         Attachments: ZOOKEEPER-1653.3.4.patch, ZOOKEEPER-1653.patch, ZOOKEEPER-1653.patch
>
>
> It looks like QuorumPeer.loadDataBase() could fail if the server was 
> restarted after zk.takeSnapshot() but before finishing 
> self.setCurrentEpoch(newEpoch) in Learner.java.
> {code:java}
> case Leader.NEWLEADER: // it will be NEWLEADER in v1.0
>     zk.takeSnapshot();
>     self.setCurrentEpoch(newEpoch); // <<< got restarted here
>     snapshotTaken = true;
>     writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), true);
>     break;
> {code}
> The server fails to start because currentEpoch is still 1, but the last
> processed zxid from the snapshot has already advanced to epoch 2.
> {noformat}
> 2013-02-20 13:45:02,733 5543 [pool-1-thread-1] ERROR org.apache.zookeeper.server.quorum.QuorumPeer  - Unable to load database on disk
> java.io.IOException: The current epoch, 1, is older than the last zxid, 8589934592
>         at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:439)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:413)
>         ...
> {noformat}
> {noformat}
> $ find datadir
> datadir
> datadir/version-2
> datadir/version-2/currentEpoch.tmp
> datadir/version-2/acceptedEpoch
> datadir/version-2/snapshot.0
> datadir/version-2/currentEpoch
> datadir/version-2/snapshot.200000000
> $ cat datadir/version-2/currentEpoch.tmp
> 2%
> $ cat datadir/version-2/acceptedEpoch
> 2%
> $ cat datadir/version-2/currentEpoch
> 1%
> {noformat}
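The numbers in the log and the directory listing line up once the zxid is split into its two halves: a ZooKeeper zxid carries the epoch in its high 32 bits and a per-epoch counter in its low 32 bits. The sketch below decodes 8589934592, shows that it is `0x200000000` (the same hex suffix as `snapshot.200000000`), and reproduces a comparison that rejects `currentEpoch` = 1. The class and method names are hypothetical, and the check only approximates what `QuorumPeer.loadDataBase()` does.

```java
import java.io.IOException;

/**
 * Hypothetical sketch (not ZooKeeper's actual code) of the startup
 * consistency check. A zxid is 64 bits: epoch in the high 32 bits,
 * per-epoch counter in the low 32 bits.
 */
public class EpochCheck {

    /** High 32 bits: the epoch in which the transaction was proposed. */
    public static long epochOf(long zxid) {
        return zxid >> 32;
    }

    /** Low 32 bits: the transaction counter within that epoch. */
    public static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    /** Refuse to start if the snapshot comes from a newer epoch than currentEpoch. */
    public static void check(long currentEpoch, long lastZxid) throws IOException {
        if (epochOf(lastZxid) > currentEpoch) {
            throw new IOException("The current epoch, " + currentEpoch
                    + ", is older than the last zxid, " + lastZxid);
        }
    }

    public static void main(String[] args) {
        long lastZxid = 8589934592L; // last zxid reported in the log above
        System.out.println(Long.toHexString(lastZxid)); // prints 200000000
        System.out.println(epochOf(lastZxid));          // prints 2
        System.out.println(counterOf(lastZxid));        // prints 0
        try {
            check(1L, lastZxid); // the currentEpoch file still says 1
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Read this way, `currentEpoch.tmp` and `acceptedEpoch` already hold 2, which appears consistent with the restart landing after the new epoch was written to the temporary file but before it replaced `currentEpoch`.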



--
This message was sent by Atlassian JIRA
(v6.1#6144)
