[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408705#comment-13408705
 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

I disabled the write cache on the drive that holds my ZooKeeper database, and 
it still fails in exactly the same way :-<. 

Here's the part that really baffles me: I tried removing the on-disk database 
entirely (the version-2 directory) and starting ZooKeeper again, expecting 
that it would just pull down a fresh copy of the database from one of its 
peers. Unfortunately, it still fails to connect. See the output below:

root@SF-42:/sf/data# java -cp \
/opt/zookeeper-3.5.0-p7/zookeeper-3.5.0-p7.jar:/opt/zookeeper-3.5.0-p7/lib/log4j-1.2.16.jar:/opt/zookeeper-3.5.0-p7/lib/commons-cli-1.2.jar:/opt/zookeeper-3.5.0-p7/lib/slf4j-log4j12-1.6.2.jar:/opt/zookeeper-3.5.0-p7/lib/netty-3.2.5.Final.jar:/opt/zookeeper-3.5.0-p7/lib/jline-0.9.94.jar:/opt/zookeeper-3.5.0-p7/lib/javacc.jar:/opt/zookeeper-3.5.0-p7/lib/slf4j-api-1.6.2.jar:/opt/zookeeper-3.5.0-p7/conf \
  -Dzookeeper.root.logger=DEBUG,CONSOLE -Dzookeeper.log.dir=. \
  -Dzookeeper.tracelog.dir=/sf/data/zookeeper/10.10.5.42/ \
  -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false \
  -Djute.maxbuffer=4194304 org.apache.zookeeper.server.quorum.QuorumPeerMain \
  /sf/data/zookeeper/10.10.5.42/10.10.5.42_2181.cfg
2012-07-07 10:20:23,270 [myid:] - INFO  [main:QuorumPeerConfig@99] - Reading 
configuration from: /sf/data/zookeeper/10.10.5.42/10.10.5.42_2181.cfg
2012-07-07 10:20:23,279 [myid:2] - INFO  [main:DatadirCleanupManager@78] - 
autopurge.snapRetainCount set to 5
2012-07-07 10:20:23,279 [myid:2] - INFO  [main:DatadirCleanupManager@79] - 
autopurge.purgeInterval set to 1
2012-07-07 10:20:23,280 [myid:2] - INFO  
[PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2012-07-07 10:20:23,289 [myid:2] - INFO  
[PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
2012-07-07 10:20:23,290 [myid:2] - INFO  [main:QuorumPeerMain@131] - Starting 
quorum peer
2012-07-07 10:20:23,300 [myid:2] - INFO  [main:NIOServerCnxnFactory@108] - 
binding to port /10.10.5.42:2181
2012-07-07 10:20:23,308 [myid:2] - INFO  [main:QuorumPeer@1107] - tickTime set 
to 2000
2012-07-07 10:20:23,308 [myid:2] - INFO  [main:QuorumPeer@1127] - 
minSessionTimeout set to -1
2012-07-07 10:20:23,308 [myid:2] - INFO  [main:QuorumPeer@1138] - 
maxSessionTimeout set to -1
2012-07-07 10:20:23,308 [myid:2] - INFO  [main:QuorumPeer@1153] - initLimit set 
to 10
2012-07-07 10:20:23,321 [myid:2] - INFO  [main:QuorumPeer@620] - currentEpoch 
not found! Creating with a reasonable default of 0. This should only happen 
when you are upgrading your installation
2012-07-07 10:20:23,322 [myid:2] - INFO  [main:QuorumPeer@635] - acceptedEpoch 
not found! Creating with a reasonable default of 0. This should only happen 
when you are upgrading your installation
2012-07-07 10:20:23,325 [myid:2] - INFO  
[QuorumPeerListener:QuorumCnxManager$Listener@530] - My election bind port: 
/10.10.5.42:2183
2012-07-07 10:20:23,333 [myid:2] - INFO  
[QuorumPeer[myid=2]/10.10.5.42:2181:QuorumPeer@860] - LOOKING
2012-07-07 10:20:23,334 [myid:2] - INFO  
[QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@831] - New election. My 
id =  2, proposed zxid=0x0
2012-07-07 10:20:23,341 [myid:2] - INFO  
[NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxnFactory@227] - Accepted 
socket connection from /10.10.5.44:48534
2012-07-07 10:20:23,342 [myid:2] - INFO  
[WorkerSender[myid=2]:QuorumCnxManager@191] - Have smaller server identifier, 
so dropping the connection: (3, 2)
2012-07-07 10:20:23,342 [myid:2] - INFO  
[WorkerReceiver[myid=2]:FastLeaderElection@635] - Notification: 2 (n.leader), 
0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), 
LOOKING (my state)0 (n.config version)
2012-07-07 10:20:23,345 [myid:2] - WARN  
[NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@354] - Exception causing 
close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2012-07-07 10:20:23,346 [myid:2] - INFO  
[NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@1002] - Closed socket 
connection for client /10.10.5.44:48534 (no session established for client)
2012-07-07 10:20:23,544 [myid:2] - INFO  
[QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@865] - Notification time 
out: 400
2012-07-07 10:20:23,545 [myid:2] - INFO  
[WorkerSender[myid=2]:QuorumCnxManager@191] - Have smaller server identifier, 
so dropping the connection: (3, 2)
2012-07-07 10:20:23,545 [myid:2] - INFO  
[WorkerReceiver[myid=2]:FastLeaderElection@635] - Notification: 2 (n.leader), 
0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), 
LOOKING (my state)0 (n.config version)
2012-07-07 10:20:23,680 [myid:2] - INFO  
[NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxnFactory@227] - Accepted 
socket connection from /10.10.5.44:48535
2012-07-07 10:20:23,680 [myid:2] - WARN  
[NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@354] - Exception causing 
close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2012-07-07 10:20:23,680 [myid:2] - INFO  
[NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@1002] - Closed socket 
connection for client /10.10.5.44:48535 (no session established for client)
2012-07-07 10:20:23,946 [myid:2] - INFO  
[QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@865] - Notification time 
out: 800
2012-07-07 10:20:23,946 [myid:2] - INFO  
[WorkerSender[myid=2]:QuorumCnxManager@191] - Have smaller server identifier, 
so dropping the connection: (3, 2)
2012-07-07 10:20:23,947 [myid:2] - INFO  
[WorkerReceiver[myid=2]:FastLeaderElection@635] - Notification: 2 (n.leader), 
0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), 
LOOKING (my state)0 (n.config version)
2012-07-07 10:20:24,014 [myid:2] - INFO  
[NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxnFactory@227] - Accepted 
socket connection from /10.10.5.44:48536
2012-07-07 10:20:24,014 [myid:2] - WARN  
[NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@354] - Exception causing 
close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2012-07-07 10:20:24,015 [myid:2] - INFO  
[NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@1002] - Closed socket 
connection for client /10.10.5.44:48536 (no session established for client)
2012-07-07 10:20:24,349 [myid:2] - INFO  
[NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxnFactory@227] - Accepted 
socket connection from /10.10.5.44:48650
2012-07-07 10:20:24,349 [myid:2] - WARN  
[NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@354] - Exception causing 
close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2012-07-07 10:20:24,349 [myid:2] - INFO  
[NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@1002] - Closed socket 
connection for client /10.10.5.44:48650 (no session established for client)
2012-07-07 10:20:24,683 [myid:2] - INFO  
[NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxnFactory@227] - Accepted 
socket connection from /10.10.5.44:48678
2012-07-07 10:20:24,683 [myid:2] - WARN  
[NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@354] - Exception causing 
close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2012-07-07 10:20:24,683 [myid:2] - INFO  
[NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@1002] - Closed socket 
connection for client /10.10.5.44:48678 (no session established for client)
2012-07-07 10:20:24,747 [myid:2] - INFO  
[QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@865] - Notification time 
out: 1600
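
A note on reading the log above: the "Notification time out" values (400, 800, 
1600) appear to double each time an election round passes without a usable 
notification. The sketch below shows that doubling with an upper bound. The 
class name NotificationBackoff, the 200 ms starting value, and the 60000 ms 
cap are assumptions made for illustration; none of them is taken from the log 
or quoted from the ZooKeeper source.

```java
/** Hypothetical illustration of a capped doubling timeout; not ZooKeeper code. */
final class NotificationBackoff {

    /** Returns the next wait in milliseconds: double the current one, capped at maxMs. */
    static int next(int currentMs, int maxMs) {
        // Widen to long so the doubling cannot overflow before the cap is applied.
        return (int) Math.min((long) currentMs * 2, (long) maxMs);
    }

    public static void main(String[] args) {
        int waitMs = 200;    // assumed starting value
        int capMs = 60000;   // assumed upper bound
        for (int round = 0; round < 3; round++) {
            waitMs = next(waitMs, capMs);
            System.out.println("Notification time out: " + waitMs);
        }
    }
}
```

Under these assumptions the peer keeps waiting longer between attempts, which 
would explain why the gaps between the timeout lines in the log grow.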
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 
> 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 
> 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that 
> during server recovery 
> org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() 
> does not indicate if the available logs are valid or not. In some cases (say 
> a truncated record and a single txnlog in the datadir) we will not detect 
> that the file is corrupt, vs reaching the end of the file.
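
To make the distinction in the description concrete, here is a minimal sketch 
of a reader that reports whether a log ended cleanly on a record boundary or 
in the middle of a record. It is not the FileTxnIterator code. The class 
TxnLogScan, its Result type, and the record layout (a 4-byte big-endian length 
followed by the payload) are all hypothetical and much simpler than 
ZooKeeper's real on-disk format.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper, not part of ZooKeeper: classifies how a record stream ends. */
final class TxnLogScan {

    enum End { CLEAN_EOF, TRUNCATED }

    /** Number of complete records read, and how the stream ended. */
    static final class Result {
        final int records;
        final End end;

        Result(int records, End end) {
            this.records = records;
            this.end = end;
        }
    }

    /**
     * Scans records laid out as [4-byte big-endian length][payload].
     * This layout is an assumption for illustration only.
     */
    static Result scan(byte[] log) {
        ByteBuffer buf = ByteBuffer.wrap(log);
        int records = 0;
        while (buf.hasRemaining()) {
            // Bytes remain, but not enough for a length header: partial record.
            if (buf.remaining() < Integer.BYTES) {
                return new Result(records, End.TRUNCATED);
            }
            int len = buf.getInt();
            // Declared payload is invalid or longer than what is left: partial record.
            if (len < 0 || len > buf.remaining()) {
                return new Result(records, End.TRUNCATED);
            }
            buf.position(buf.position() + len); // skip over the payload
            records++;
        }
        // Nothing left after the last complete record: a genuine end of file.
        return new Result(records, End.CLEAN_EOF);
    }

    public static void main(String[] args) {
        byte[] full = ByteBuffer.allocate(7).putInt(3).put(new byte[] {1, 2, 3}).array();
        byte[] cut = java.util.Arrays.copyOf(full, full.length - 1);
        System.out.println(scan(full).end);
        System.out.println(scan(cut).end);
    }
}
```

The point of returning an explicit end state is that a caller with only a 
single txnlog can tell "no more records" apart from "the last record is 
incomplete", which is the case the description says goes undetected today.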


        
