[ https://issues.apache.org/jira/browse/ZOOKEEPER-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michi Mutsuzaki updated ZOOKEEPER-1900: --------------------------------------- Summary: NullPointerException in truncate (was: HTTP ERROR 410 Problem accessing /getimage. Reason: GetImage failed. java.io.IOException: Namenode is not expecting an new image UPLOAD_START) > NullPointerException in truncate > --------------------------------- > > Key: ZOOKEEPER-1900 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1900 > Project: ZooKeeper > Issue Type: Bug > Affects Versions: 3.4.5 > Environment: linux java 1.6 > Reporter: Steven Bower > Priority: Blocker > Fix For: 3.4.7, 3.5.0 > > > The other day we started up a ZK instance that had been down for a bit (1day) > and started getting NPEs all over the place... > {noformat} > 2014-20-03 11:15:42.320 INFO QuorumPeerConfig [main] - Reading configuration > from: /xxx/bin/zk/etc/zk.cfg > 2014-20-03 11:15:42.350 INFO QuorumPeerConfig [main] - Defaulting to > majority quorums > 2014-20-03 11:15:42.353 INFO DatadirCleanupManager [main] - > autopurge.snapRetainCount set to 3 > 2014-20-03 11:15:42.353 INFO DatadirCleanupManager [main] - > autopurge.purgeInterval set to 0 > 2014-20-03 11:15:42.353 INFO DatadirCleanupManager [main] - Purge task is > not scheduled. > 2014-20-03 11:15:42.385 INFO QuorumPeerMain [main] - Starting quorum peer > 2014-20-03 11:15:42.399 INFO NIOServerCnxnFactory [main] - binding to port > 0.0.0.0/0.0.0.0:5555 > 2014-20-03 11:15:42.413 INFO QuorumPeer [main] - tickTime set to 2000 > 2014-20-03 11:15:42.413 INFO QuorumPeer [main] - minSessionTimeout set to -1 > 2014-20-03 11:15:42.413 INFO QuorumPeer [main] - maxSessionTimeout set to -1 > 2014-20-03 11:15:42.413 INFO QuorumPeer [main] - initLimit set to 10 > 2014-20-03 11:15:42.456 INFO FileSnap [main] - Reading snapshot > /xxx/zk_data/version-2/snapshot.2c00000000 > 2014-20-03 11:15:42.463 INFO QuorumCnxManager [Thread-3] - My election bind > port: 0.0.0.0/0.0.0.0:7555 > 2014-20-03 11:15:42.470 INFO QuorumPeer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - LOOKING > 2014-20-03 11:15:42.471 INFO FastLeaderElection > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - New election. My id = 3, > proposed zxid=0x8000000000000000 > 2014-20-03 11:15:42.479 INFO FastLeaderElection [WorkerReceiver[myid=3]] - > Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), FOLLOWING > (n.state), 1 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state) > 2014-20-03 11:15:42.479 INFO FastLeaderElection [WorkerReceiver[myid=3]] - > Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), FOLLOWING > (n.state), 1 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state) > 2014-20-03 11:15:42.482 INFO QuorumCnxManager [WorkerSender[myid=3]] - Have > smaller server identifier, so dropping the connection: (5, 3) > 2014-20-03 11:15:42.482 INFO FastLeaderElection [WorkerReceiver[myid=3]] - > Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), LEADING > (n.state), 2 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state) > 2014-20-03 11:15:42.482 INFO FastLeaderElection [WorkerReceiver[myid=3]] - > Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), LEADING > (n.state), 2 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state) > 2014-20-03 11:15:42.482 INFO QuorumPeer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - OBSERVING > 2014-20-03 11:15:42.486 INFO Learner > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - TCP NoDelay set to: true > 2014-20-03 11:15:42.488 INFO QuorumCnxManager [host1/###.###.###.###:7555] - > Received connection request /###.###.###.###:64528 > 2014-20-03 11:15:42.490 INFO ZooKeeperServer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server > environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT > 2014-20-03 11:15:42.490 INFO ZooKeeperServer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server environment:host.name=host1 > 2014-20-03 11:15:42.490 INFO ZooKeeperServer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server > environment:java.version=1.6.0_20 > 2014-20-03 11:15:42.490 INFO ZooKeeperServer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server > environment:java.vendor=Sun Microsystems Inc. > 2014-20-03 11:15:42.490 INFO ZooKeeperServer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server > environment:java.home=/xxx/util/common/jdk1.6.0_20_64bit/jre > 2014-20-03 11:15:42.490 INFO ZooKeeperServer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server > environment:java.class.path=/xxx/bin/zk/etc:/xxx/bin/zk/lib/slf4j-log4j12-1.7.2.jar:/xxx/bin/zk/lib/jline-0.9.94.jar:/xxx/bin/zk/lib/jul-to-slf4j-1.7.2.jar:/xxx/bin/zk/lib/ZooInspector-3.4.5.jar:/xxx/bin/zk/lib/jcl-over-slf4j-1.7.2.jar:/xxx/bin/zk/lib/log4j-1.2.17.jar:/xxx/bin/zk/lib/zookeeper-3.4.5.jar:/xxx/bin/zk/lib/slf4j-api-1.7.2.jar:/xxx/bin/zk/lib/netty-3.2.2.Final.jar > 2014-20-03 11:15:42.490 INFO ZooKeeperServer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server > environment:java.library.path=/xxx/util/common/jdk1.6.0_20_64bit/jre/lib/amd64/server:/xxx/util/common/jdk1.6.0_20_64bit/jre/lib/amd64:/xxx/util/common/jdk1.6.0_20_64bit/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib > 2014-20-03 11:15:42.490 INFO ZooKeeperServer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server > environment:java.io.tmpdir=/tmp > 2014-20-03 11:15:42.490 INFO ZooKeeperServer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server > environment:java.compiler=<NA> > 2014-20-03 11:15:42.490 INFO ZooKeeperServer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server environment:os.name=Linux > 2014-20-03 11:15:42.490 INFO ZooKeeperServer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server environment:os.arch=amd64 > 2014-20-03 11:15:42.490 INFO ZooKeeperServer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server > environment:os.version=2.6.32-220.2.1.el6.x86_64 > 2014-20-03 11:15:42.490 INFO ZooKeeperServer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server environment:user.name=op > 2014-20-03 11:15:42.490 INFO ZooKeeperServer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server > environment:user.home=/xxx/bin > 2014-20-03 11:15:42.490 INFO ZooKeeperServer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Server > environment:user.dir=/xxx/bin > 2014-20-03 11:15:42.491 INFO ZooKeeperServer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Created server with tickTime 2000 > minSessionTimeout 4000 maxSessionTimeout 40000 datadir /xxx/zk_log/version-2 > snapdir /xxx/zk_data/version-2 > 2014-20-03 11:15:42.493 INFO Learner > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Observing > host4/###.###.###.###:6555 > 2014-20-03 11:15:42.495 INFO FastLeaderElection [WorkerReceiver[myid=3]] - > Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), FOLLOWING > (n.state), 5 (n.sid), 0x2b (n.peerEPoch), OBSERVING (my state) > 2014-20-03 11:15:42.498 WARN Learner > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Truncating log to get in sync > with the leader 0x2b00000002 > 2014-20-03 11:15:42.499 WARN QuorumPeer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Unexpected exception > java.lang.NullPointerException > at > org.apache.zookeeper.server.persistence.FileTxnLog.truncate(FileTxnLog.java:352) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.truncateLog(FileTxnSnapLog.java:259) > at > org.apache.zookeeper.server.ZKDatabase.truncateLog(ZKDatabase.java:438) > at > org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:339) > at > org.apache.zookeeper.server.quorum.Observer.observeLeader(Observer.java:72) > at > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:727) > 2014-20-03 11:15:42.500 INFO Learner > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - shutdown called > java.lang.Exception: shutdown Observer > at > org.apache.zookeeper.server.quorum.Observer.shutdown(Observer.java:137) > at > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:731) > 2014-20-03 11:15:42.500 INFO ZooKeeperServer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - shutting down > 2014-20-03 11:15:42.500 INFO QuorumPeer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - LOOKING > 2014-20-03 11:15:42.501 INFO FastLeaderElection > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - New election. My id = 3, > proposed zxid=0x8000000000000000 > 2014-20-03 11:15:42.503 INFO FastLeaderElection [WorkerReceiver[myid=3]] - > Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), FOLLOWING > (n.state), 1 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state) > 2014-20-03 11:15:42.503 INFO FastLeaderElection [WorkerReceiver[myid=3]] - > Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), LEADING > (n.state), 2 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state) > 2014-20-03 11:15:42.503 INFO QuorumPeer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - OBSERVING > 2014-20-03 11:15:42.503 INFO ZooKeeperServer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Created server with tickTime 2000 > minSessionTimeout 4000 maxSessionTimeout 40000 datadir /xxx/zk_log/version-2 > snapdir /xxx/zk_data/version-2 > 2014-20-03 11:15:42.504 INFO Learner > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Observing > host4/###.###.###.###:6555 > 2014-20-03 11:15:42.504 INFO FastLeaderElection [WorkerReceiver[myid=3]] - > Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), FOLLOWING > (n.state), 5 (n.sid), 0x2b (n.peerEPoch), OBSERVING (my state) > 2014-20-03 11:15:42.514 INFO FileSnap > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Reading snapshot > /xxx/zk_data/version-2/snapshot.2c00000000 > 2014-20-03 11:15:42.517 WARN Learner > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Truncating log to get in sync > with the leader 0x2b00000002 > 2014-20-03 11:15:42.518 WARN QuorumPeer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Unexpected exception > java.lang.NullPointerException > at > org.apache.zookeeper.server.persistence.FileTxnLog.truncate(FileTxnLog.java:352) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.truncateLog(FileTxnSnapLog.java:259) > at > org.apache.zookeeper.server.ZKDatabase.truncateLog(ZKDatabase.java:438) > at > org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:339) > at > org.apache.zookeeper.server.quorum.Observer.observeLeader(Observer.java:72) > at > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:727) > {noformat} > This exception went on and on over and over again (more than 1M times in a > day) until it then began spewing this exception: > {noformat} > 2014-20-03 13:45:32.843 INFO QuorumPeer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - LOOKING > 2014-20-03 13:45:32.843 INFO FastLeaderElection > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - New election. My id = 3, > proposed zxid=0x8000000000000000 > 2014-20-03 13:45:32.844 INFO FastLeaderElection [WorkerReceiver[myid=3]] - > Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), FOLLOWING > (n.state), 1 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state) > 2014-20-03 13:45:32.845 INFO FastLeaderElection [WorkerReceiver[myid=3]] - > Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), LEADING > (n.state), 2 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state) > 2014-20-03 13:45:32.845 INFO QuorumPeer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - OBSERVING > 2014-20-03 13:45:32.845 INFO FastLeaderElection [WorkerReceiver[myid=3]] - > Notification: 2 (n.leader), 0x2b00000002 (n.zxid), 0x2c (n.round), FOLLOWING > (n.state), 5 (n.sid), 0x2b (n.peerEPoch), OBSERVING (my state) > 2014-20-03 13:45:32.845 INFO ZooKeeperServer > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Created server with tickTime 2000 > minSessionTimeout 4000 maxSessionTimeout 40000 datadir /xxx/zk_log/version-2 > snapdir /xxx/zk_data/version-2 > 2014-20-03 13:45:32.845 INFO Learner > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Observing > host4/###.###.###.###:6555 > 2014-20-03 13:45:32.853 WARN Learner > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Unexpected exception, tries=0, > connecting to host4/###.###.###.###:6555 > java.net.ConnectException: Cannot assign requested address > at java.net.PlainSocketImpl.socketConnect(Native Method) > at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333) > at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195) > at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366) > at java.net.Socket.connect(Socket.java:529) > at > org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:224) > at > org.apache.zookeeper.server.quorum.Observer.observeLeader(Observer.java:69) > at > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:727) > 2014-20-03 13:45:33.863 INFO FileSnap > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:5555] - Reading snapshot > /xxx/zk_data/version-2/snapshot.2c00000000 > {noformat} > This exception for a while was interspersed with the NPEs but eventually it > just was spewing the ConnectionException. > Looking through the code a bit it seems if the FileTxnIterator when > initialized cannot find any log files the {{inputStream}} is set to null > which causes truncate() to NPE.. I see in 3.4.6 this has been wrapped in a > try/finally which closes the iterator.. but i presume that this issue would > still remain. > Looking at the system in this state there were 29k+ sockets in CLOSE_WAIT > state on the system and looking at a heap dump there were tons of Socket > objects waiting for GC (ie not getting properly closed).. this eventually ran > the system out of ephemeral ports and hence the ConnectionExceptions.. > It would seem that a quick check of {{itr.next()}} prior to attempting > truncation would resolve the NPE, but it seems somewhere a connection is not > being closed properly when an exception occurs. -- This message was sent by Atlassian JIRA (v6.2#6252)