[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Huttenhuis updated ZOOKEEPER-3513:
------------------------------------------
    Description: 
In ZOOKEEPER-2325 a check was added that requires a snapshot when loading data. 
We have been running 3-node ensembles on ZooKeeper 3.4.13 for about 5 months 
for use with Solr Cloud. During this time some ensembles created a few 
snapshots, but others didn't generate any. Because of this, upgrading to e.g. 
3.5.5 fails.

Either it is perfectly possible for ZooKeeper data to have no snapshots, or 
something is going wrong with generating snapshots. The ensembles are 
straightforward.
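
A minimal sketch of the condition that trips the check, assuming the usual file naming in the version-2 directory (`log.<zxid>` for transaction logs, `snapshot.<zxid>` for snapshots). This is not ZooKeeper code: the function name `has_logs_but_no_snapshot` is hypothetical, and the real check in FileTxnSnapLog.restore inspects file contents rather than just names.

```python
def has_logs_but_no_snapshot(file_names):
    """Return True when transaction logs exist but no snapshot file does."""
    has_log = any(name.startswith("log.") for name in file_names)
    has_snapshot = any(name.startswith("snapshot.") for name in file_names)
    return has_log and not has_snapshot


# File names as they appear in the data directory of this report.
names = [
    "acceptedEpoch", "currentEpoch", "log.1", "log.100000001",
    "log.100001645", "log.300000802", "log.70000062a", "log.f0",
]
print(has_logs_but_no_snapshot(names))
```

Running something like this against each node's data directory before an upgrade would show which ensembles are affected.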
 - The following stack trace occurs:
{noformat}
java.io.IOException: No snapshot found, but there are log entries. Something is broken!
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
        at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
        at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
        at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
        at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
        at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
        at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
{noformat}

 - The zoo.cfg
{noformat}
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/data/zookeeper/data
# the port at which the clients will connect
clientPort=2181

server.1=myserver1:2888:3888
server.2=myserver2:2888:3888
server.3=myserver3:2888:3888
{noformat}
 

 - The contents of /data/zookeeper/data/version-2
{noformat}
-rw-r--r-- 1 zookeeper zookeeper    1 Aug  7 21:50 acceptedEpoch
-rw-r--r-- 1 zookeeper zookeeper    1 Aug  8 20:38 currentEpoch
-rw-r--r-- 1 zookeeper zookeeper  65M Apr  1 14:44 log.1
-rw-r--r-- 1 zookeeper zookeeper  65M May 15 23:30 log.100000001
-rw-r--r-- 1 zookeeper zookeeper  65M Jul  3 23:21 log.100001645
-rw-r--r-- 1 zookeeper zookeeper  65M Aug  8 20:37 log.300000802
-rw-r--r-- 1 zookeeper zookeeper  65M Aug 20 13:58 log.70000062a
-rw-r--r-- 1 zookeeper zookeeper  65M Apr  4 21:22 log.f0
{noformat}
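
To make the listing easier to read: the suffix of each log file name is the hexadecimal zxid of the first transaction in that file, and a zxid is a 64-bit number whose high 32 bits are the epoch and whose low 32 bits are a counter within that epoch. A small sketch that decodes the names above (the helper name `decode_log_name` is hypothetical, not part of ZooKeeper):

```python
def decode_log_name(file_name):
    """Split the hex zxid suffix of a log file name into (epoch, counter)."""
    prefix, _, suffix = file_name.partition(".")
    if prefix != "log" or not suffix:
        raise ValueError(f"not a transaction log name: {file_name!r}")
    zxid = int(suffix, 16)
    # High 32 bits: epoch. Low 32 bits: counter within the epoch.
    return zxid >> 32, zxid & 0xFFFFFFFF


log_names = [
    "log.1", "log.100000001", "log.100001645",
    "log.300000802", "log.70000062a", "log.f0",
]
# Sort by numeric zxid, since plain name order puts log.f0 last.
for name in sorted(log_names, key=lambda n: int(n.split(".")[1], 16)):
    epoch, counter = decode_log_name(name)
    print(f"{name}: epoch={epoch} counter={counter}")
```

Sorted this way, the logs run from epoch 0 (log.1, log.f0) through epoch 7 (log.70000062a), which matches the modification dates in the listing.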

  was:
In ZOOKEEPER-2325 a check was added that requires a snapshot when loading data. 
We have been running 3-node ensembles on ZooKeeper 3.4.13 for about 5 months 
for use with Solr Cloud. During this time some ensembles created a few 
snapshots, but others didn't generate any. Because of this, upgrading to e.g. 
3.5.5 fails.

Either it is perfectly possible for ZooKeeper data to have no snapshots, or 
something is going wrong with generating snapshots. The ensembles are 
straightforward.
 - The following stack trace occurs:
{noformat}
java.io.IOException: No snapshot found, but there are log entries. Something is broken!
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
        at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
        at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
        at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
        at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
        at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
        at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
{noformat}

 - The zoo.cfg
{noformat}
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/data/zookeeper/data
# the port at which the clients will connect
clientPort=2181

server.1=myserver1:2888:3888
server.2=myserver1:2888:3888
server.3=myserver1:2888:3888
{noformat}
 

 - The contents of /data/zookeeper/data/version-2
{noformat}
-rw-r--r-- 1 zookeeper zookeeper    1 Aug  7 21:50 acceptedEpoch
-rw-r--r-- 1 zookeeper zookeeper    1 Aug  8 20:38 currentEpoch
-rw-r--r-- 1 zookeeper zookeeper  65M Apr  1 14:44 log.1
-rw-r--r-- 1 zookeeper zookeeper  65M May 15 23:30 log.100000001
-rw-r--r-- 1 zookeeper zookeeper  65M Jul  3 23:21 log.100001645
-rw-r--r-- 1 zookeeper zookeeper  65M Aug  8 20:37 log.300000802
-rw-r--r-- 1 zookeeper zookeeper  65M Aug 20 13:58 log.70000062a
-rw-r--r-- 1 zookeeper zookeeper  65M Apr  4 21:22 log.f0
{noformat}


> Zookeeper upgrade fails due to missing snapshots
> ------------------------------------------------
>
>                 Key: ZOOKEEPER-3513
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3513
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.4, 3.6.0
>            Reporter: Stephan Huttenhuis
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
