[ https://issues.apache.org/jira/browse/ZOOKEEPER-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840464#comment-15840464 ]
Edward Ribeiro commented on ZOOKEEPER-2464: ------------------------------------------- Yeah... makes sense, it's pretty inconsistent behavior. It would require some defensive code as {{DataNode.setChildren(null)}} could introduce the null again. In fact, it is a total refactoring of {{DataNode}}, albeit a small class. Wdyt [~randgalt]? > NullPointerException on ContainerManager > ---------------------------------------- > > Key: ZOOKEEPER-2464 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2464 > Project: ZooKeeper > Issue Type: Bug > Components: server > Affects Versions: 3.5.1 > Reporter: Stefano Salmaso > Assignee: Jordan Zimmerman > Fix For: 3.5.3, 3.6.0 > > Attachments: ContainerManagerTest.java, ZOOKEEPER-2464.patch > > > I would like to expose you to a problem that we are experiencing. > We are using a cluster of 7 zookeeper and we use them to implement a > distributed lock using Curator > (http://curator.apache.org/curator-recipes/shared-reentrant-lock.html) > So .. we tried to play with the servers to see if everything worked properly > and we stopped and start servers to see that the system worked well > (like stop 03, stop 05, stop 06, start 05, start 06, start 03) > We saw a strange behavior. > The number of znodes grew up without stopping (normally we had 4000 or 5000, > we got to 60,000 and then we stopped our application) > In zookeeeper logs I saw this (on leader only, one every minute) > 2016-07-04 14:53:50,302 [myid:7] - ERROR > [ContainerManagerTask:ContainerManager$1@84] - Error checking containers > java.lang.NullPointerException > at > org.apache.zookeeper.server.ContainerManager.getCandidates(ContainerManager.java:151) > at > org.apache.zookeeper.server.ContainerManager.checkContainers(ContainerManager.java:111) > at > org.apache.zookeeper.server.ContainerManager$1.run(ContainerManager.java:78) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > We have not yet deleted the data ... so the problem can be reproduced on our > servers -- This message was sent by Atlassian JIRA (v6.3.4#6332)