[ https://issues.apache.org/jira/browse/ZOOKEEPER-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728269#comment-14728269 ]
Flavio Junqueira commented on ZOOKEEPER-2033:
---------------------------------------------

I believe the real issue is that proposals should have been empty in this case, and it isn't. It seems that the leader is populating committedLog and generating a snapshot by the end of the synchronization phase, but it is not updating the committedLog list accordingly.

I don't know if [~asadpanda] is still interested in finishing this up after such a long time, but I can give it a shot otherwise.

> zookeeper follower fails to start after a restart immediately following a new epoch
> -----------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2033
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2033
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.6
>            Reporter: Asad Saeed
>            Assignee: Asad Saeed
>             Fix For: 3.4.7
>
>         Attachments: ZOOKEEPER-2033-3.4.patch, ZOOKEEPER-2033.patch
>
>
> The following issue was seen when adding a new node to a zookeeper cluster.
> Reproduction steps
> 1. Create a 2 node ensemble. Write some keys.
> 2. Add another node to the ensemble by modifying the config. Restart the
> entire cluster.
> 3. Restart the new node before writing any new keys.
> What occurs is that the new node gets a SNAP from the newly elected leader,
> since it is too far behind. The zxid for this snapshot is from the new epoch,
> but that zxid is not in the committed log cache.
> On restart of this new node, the follower sends the new epoch zxid. The
> leader looks at its maxCommitted logs, sees that they are not from the newest
> epoch, and therefore sends a TRUNC.
> The follower sees the TRUNC, but it only has a snapshot, so it cannot truncate!
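
The TRUNC decision described in the report can be illustrated with a small Python model. This is only a sketch built from the description above, not ZooKeeper's actual synchronization code: `SyncMode`, `make_zxid`, and `choose_sync` are hypothetical names, and the order of the branches is an assumption. The one ZooKeeper fact it relies on is that a zxid carries the epoch in its high 32 bits and a counter in its low 32 bits.

```python
from enum import Enum


class SyncMode(Enum):
    DIFF = "DIFF"
    TRUNC = "TRUNC"
    SNAP = "SNAP"


def make_zxid(epoch, counter):
    """Pack an epoch (high 32 bits) and a counter (low 32 bits) into a zxid."""
    return (epoch << 32) | counter


def choose_sync(peer_last_zxid, committed_log, leader_last_zxid):
    """Hypothetical, simplified model of how a leader picks a sync mode.

    committed_log is the leader's cached list of committed zxids, in
    ascending order. It may be stale, which is the situation in the report.
    """
    if committed_log:
        min_committed = committed_log[0]
        max_committed = committed_log[-1]
        if min_committed <= peer_last_zxid <= max_committed:
            # The follower is inside the cached range: send the missing tail.
            return SyncMode.DIFF
        if peer_last_zxid > max_committed:
            # The follower looks "ahead" of the cache: ask it to truncate.
            return SyncMode.TRUNC
        # The follower is older than anything cached: full snapshot.
        return SyncMode.SNAP
    if peer_last_zxid == leader_last_zxid:
        # Nothing cached and the follower is already current: empty DIFF.
        return SyncMode.DIFF
    return SyncMode.SNAP


if __name__ == "__main__":
    # Cache still holds only old-epoch entries (epoch 1).
    stale_log = [make_zxid(1, c) for c in (1, 2, 3)]
    # The follower restarted from a snapshot taken in the new epoch (epoch 2).
    new_epoch_zxid = make_zxid(2, 0)

    # Stale cache: the follower appears to be ahead of the leader's log.
    print(choose_sync(new_epoch_zxid, stale_log, new_epoch_zxid).name)
    # Empty cache: the follower is recognized as up to date.
    print(choose_sync(new_epoch_zxid, [], new_epoch_zxid).name)
```

With the stale cache the model picks TRUNC, which a snapshot-only follower cannot honor; with an empty cache it picks an empty DIFF, which matches the comment's point that proposals should have been empty in this case.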