[ https://issues.apache.org/jira/browse/ZOOKEEPER-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Huttenhuis updated ZOOKEEPER-3513:
------------------------------------------
    Description:
In ZOOKEEPER-2325 a check was added that requires a snapshot when loading data. We have been running 3-node ensembles on ZooKeeper 3.4.13 for about 5 months for use with Solr Cloud. During this time some ensembles created a few snapshots but others didn't generate any. Because of this, upgrading to e.g. 3.5.5 fails.

Either it is perfectly possible for ZooKeeper data to have no snapshots or something is going wrong with generating snapshots. The ensembles are straightforward.

 - The following stack trace occurs:
{noformat}
java.io.IOException: No snapshot found, but there are log entries. Something is broken!
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
    at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
    at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
    at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
    at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
    at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
    at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
{noformat}

 - The zoo.cfg
{noformat}
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/data/zookeeper/data
# the port at which the clients will connect
clientPort=2181
server.1=myserver1:2888:3888
server.2=myserver2:2888:3888
server.3=myserver3:2888:3888
{noformat}

 - The contents of /data/zookeeper/data/version-2
{noformat}
-rw-r--r-- 1 zookeeper zookeeper   1 Aug  7 21:50 acceptedEpoch
-rw-r--r-- 1 zookeeper zookeeper   1 Aug  8 20:38 currentEpoch
-rw-r--r-- 1 zookeeper zookeeper 65M Apr  1 14:44 log.1
-rw-r--r-- 1 zookeeper zookeeper 65M May 15 23:30 log.100000001
-rw-r--r-- 1 zookeeper zookeeper 65M Jul  3 23:21 log.100001645
-rw-r--r-- 1 zookeeper zookeeper 65M Aug  8 20:37 log.300000802
-rw-r--r-- 1 zookeeper zookeeper 65M Aug 20 13:58 log.70000062a
-rw-r--r-- 1 zookeeper zookeeper 65M Apr  4 21:22 log.f0
{noformat}

  was:
In ZOOKEEPER-2325 a check was added that requires a snapshot when loading data. We have been running 3-node ensembles on ZooKeeper 3.4.13 for about 5 months for use with Solr Cloud. During this time some ensembles created a few snapshots but others didn't generate any. Because of this, upgrading to e.g. 3.5.5 fails.

Either it is perfectly possible for ZooKeeper data to have no snapshots or something is going wrong with generating snapshots. The ensembles are straightforward.

 - The following stack trace occurs:
{noformat}
java.io.IOException: No snapshot found, but there are log entries. Something is broken!
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
    at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
    at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
    at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
    at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
    at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
    at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
{noformat}

 - The zoo.cfg
{noformat}
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/data/zookeeper/data
# the port at which the clients will connect
clientPort=2181
server.1=myserver1:2888:3888
server.2=myserver1:2888:3888
server.3=myserver1:2888:3888
{noformat}

 - The contents of /data/zookeeper/data/version-2
{noformat}
-rw-r--r-- 1 zookeeper zookeeper   1 Aug  7 21:50 acceptedEpoch
-rw-r--r-- 1 zookeeper zookeeper   1 Aug  8 20:38 currentEpoch
-rw-r--r-- 1 zookeeper zookeeper 65M Apr  1 14:44 log.1
-rw-r--r-- 1 zookeeper zookeeper 65M May 15 23:30 log.100000001
-rw-r--r-- 1 zookeeper zookeeper 65M Jul  3 23:21 log.100001645
-rw-r--r-- 1 zookeeper zookeeper 65M Aug  8 20:37 log.300000802
-rw-r--r-- 1 zookeeper zookeeper 65M Aug 20 13:58 log.70000062a
-rw-r--r-- 1 zookeeper zookeeper 65M Apr  4 21:22 log.f0
{noformat}


> Zookeeper upgrade fails due to missing snapshots
> ------------------------------------------------
>
>                 Key: ZOOKEEPER-3513
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3513
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.4, 3.6.0
>            Reporter: Stephan Huttenhuis
>            Priority: Major
>
> In ZOOKEEPER-2325 a check was added that requires a snapshot when loading
> data. We have been running 3-node ensembles on ZooKeeper 3.4.13 for about 5
> months for use with Solr Cloud. During this time some ensembles created a few
> snapshots but others didn't generate any. Because of this, upgrading to e.g.
> 3.5.5 fails.
> Either it is perfectly possible for ZooKeeper data to have no snapshots or
> something is going wrong with generating snapshots. The ensembles are
> straightforward.
> - The following stack trace occurs:
> {noformat}
> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
>     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
>     at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
>     at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
>     at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
>     at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
> {noformat}
> - The zoo.cfg
> {noformat}
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial
> # synchronization phase can take
> initLimit=10
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=5
> # the directory where the snapshot is stored.
> # do not use /tmp for storage, /tmp here is just
> # example sakes.
> dataDir=/data/zookeeper/data
> # the port at which the clients will connect
> clientPort=2181
> server.1=myserver1:2888:3888
> server.2=myserver2:2888:3888
> server.3=myserver3:2888:3888
> {noformat}
>
> - The contents of /data/zookeeper/data/version-2
> {noformat}
> -rw-r--r-- 1 zookeeper zookeeper   1 Aug  7 21:50 acceptedEpoch
> -rw-r--r-- 1 zookeeper zookeeper   1 Aug  8 20:38 currentEpoch
> -rw-r--r-- 1 zookeeper zookeeper 65M Apr  1 14:44 log.1
> -rw-r--r-- 1 zookeeper zookeeper 65M May 15 23:30 log.100000001
> -rw-r--r-- 1 zookeeper zookeeper 65M Jul  3 23:21 log.100001645
> -rw-r--r-- 1 zookeeper zookeeper 65M Aug  8 20:37 log.300000802
> -rw-r--r-- 1 zookeeper zookeeper 65M Aug 20 13:58 log.70000062a
> -rw-r--r-- 1 zookeeper zookeeper 65M Apr  4 21:22 log.f0
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)