[ https://issues.apache.org/jira/browse/ZOOKEEPER-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624086#comment-13624086 ]
Shevek commented on ZOOKEEPER-1683: ----------------------------------- The trouble is that there's hidden state in flight. The question is, what is the state of the ZooKeeper client when the constructor returns. If it is determined that it has a remote socket address, i.e. updateServerList should be callable as-written, and be safe, then ClientCnxn needs to SYNCHRONOUSLY, i.e. not via SendThread.start() record the "intended" target connection address. If we follow the proposal to allow the address to be null in updateServerList, then either: a) We nulls (treat it as always in the list), in which case the "invisible" state of the intended server address of the in-flight connect may still allow us to connect to a "wrong" server. b) We say a null is never in the list, in which case repeated fast calls to updateServerList could cause a denial of service of the ZooKeeper client. The difficulty, I think, is that the state of the client when "new ZooKeeper" returns is defined by a race condition. The INTENT of the connection needs to be recorded in 'new ClientCnxn' before SendThread.start() goes asynchronous. A somewhat uglier hack might be to make StaticHostProvider cache the previously returned value and give the ClientCnxn a kicking if the cached value was removed from the list. While this would also work, I feel that it would contribute negatively to the overall complexity and coupledness of the code. > ZooKeeper client NPE when updating server list on disconnected client > --------------------------------------------------------------------- > > Key: ZOOKEEPER-1683 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1683 > Project: ZooKeeper > Issue Type: Bug > Reporter: Shevek > Assignee: Alexander Shraer > > 2013-04-04 22:16:15,872 ERROR [pool-4-thread-1] > com.netflix.curator.ConnectionState.getZooKeeper (ConnectionState.java:84) - > Background exception caught > java.lang.NullPointerException > at > org.apache.zookeeper.client.StaticHostProvider.updateServerList(StaticHostProvider.java:161) > ~[zookeeper-3.5.0.jar:3.5.0--1] > at > org.apache.zookeeper.ZooKeeper.updateServerList(ZooKeeper.java:183) > ~[zookeeper-3.5.0.jar:3.5.0--1] > at > com.netflix.curator.HandleHolder$1$1.setConnectionString(HandleHolder.java:121) > ~[curator-client-1.3.5-SNAPSHOT.jar:?] > The duff code is this: > ClientCnxnSocket clientCnxnSocket = cnxn.sendThread.getClientCnxnSocket(); > InetSocketAddress currentHost = (InetSocketAddress) > clientCnxnSocket.getRemoteSocketAddress(); > boolean reconfigMode = hostProvider.updateServerList(serverAddresses, > currentHost); > Now, currentHost might be null, if we're not yet connected. But > StaticHostProvider.updateServerList indirects on it unconditionally. > This would be caught by findbugs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira