[ https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507169#comment-15507169 ]
Rakesh R commented on ZOOKEEPER-2383:
-------------------------------------

bq. The test case isn't covering the whole modified code. In particular, it is not covering the 4lw changes this patch is making.

Agreed, I would need to add more test cases covering 4lw and the netty server.

bq. Actually, what prevents us from doing this:

It's an interesting thought to view the problem from a different angle and frame a simple solution. I can see the following changes happening if we move #start() to the end.
# If someone sends a 4lw command to the server during startup, the zk server won't respond to it and will not print the message {{"This ZooKeeper instance is not currently serving requests"}}.
# There is a change in the client and server side logging.

Before:
{code}
2016-09-20 22:17:33,542 [myid:127.0.0.1:11222] - INFO [main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:11222. Will not attempt to authenticate using SASL (unknown error)
2016-09-20 22:17:33,548 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11222:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:12510
2016-09-20 22:17:33,548 [myid:127.0.0.1:11222] - INFO [main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@948] - Socket connection established, initiating session, client: /127.0.0.1:12510, server: 127.0.0.1/127.0.0.1:11222
2016-09-20 22:17:33,563 [myid:] - WARN [NIOWorkerThread-1:NIOServerCnxn@369] - Exception causing close of session 0x0: ZooKeeperServer not running
2016-09-20 22:17:33,564 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@607] - Closed socket connection for client /127.0.0.1:12510 (no session established for client)
2016-09-20 22:17:33,564 [myid:127.0.0.1:11222] - INFO [main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1231] - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
{code}
After:
{code}
2016-09-20 22:14:15,309 [myid:127.0.0.1:11222] - INFO [main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:11222. Will not attempt to authenticate using SASL (unknown error)
2016-09-20 22:14:15,312 [myid:127.0.0.1:11222] - INFO [main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@948] - Socket connection established, initiating session, client: /127.0.0.1:12418, server: 127.0.0.1/127.0.0.1:11222
2016-09-20 22:14:45,313 [myid:127.0.0.1:11222] - WARN [main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1181] - Client session timed out, have not heard from server in 30001ms for sessionid 0x0
{code}

Ideally server startup won't take much time; the only exceptional case is when zks#loadData() has a very large data set to load. I'm not aware of any use case for 4lws during startup. Does anyone expect a quick response showing that the server is not running, rather than a connection timeout?

> Startup race in ZooKeeperServer
> -------------------------------
>
>                 Key: ZOOKEEPER-2383
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: jmx, server
>    Affects Versions: 3.4.8
>            Reporter: Steve Rowe
>            Assignee: Rakesh R
>            Priority: Blocker
>             Fix For: 3.4.10, 3.5.3, 3.6.0
>
>         Attachments: TestZkStandaloneJMXRegistrationRaceConcurrent.java,
> ZOOKEEPER-2383-br-3-4.patch, ZOOKEEPER-2383.patch, ZOOKEEPER-2383.patch,
> ZOOKEEPER-2383.patch, release-3.4.8-extra-logging.patch,
> zk-3.4.8-MBeanRegistry.log, zk-3.4.8-NPE.log
>
>
> In attempting to upgrade Solr's ZooKeeper dependency from 3.4.6 to 3.4.8
> (SOLR-8724) I ran into test failures where attempts to create a node in a
> newly started standalone ZooKeeperServer were failing because of an assertion
> in MBeanRegistry.
> ZooKeeperServer.startup() first sets up its request processor chain then
> registers itself in JMX, but if a connection comes in before the server's JMX
> registration happens, registration of the connection will fail because it
> trips the assertion that (effectively) its parent (the server) has already
> registered itself.
> {code:java|title=ZooKeeperServer.java}
>     public synchronized void startup() {
>         if (sessionTracker == null) {
>             createSessionTracker();
>         }
>         startSessionTracker();
>         setupRequestProcessors();
>         registerJMX();
>         state = State.RUNNING;
>         notifyAll();
>     }
> {code}
> {code:java|title=MBeanRegistry.java}
>     public void register(ZKMBeanInfo bean, ZKMBeanInfo parent)
>         throws JMException
>     {
>         assert bean != null;
>         String path = null;
>         if (parent != null) {
>             path = mapBean2Path.get(parent);
>             assert path != null;
>         }
> {code}
> This problem appears to be new with ZK 3.4.8 - AFAIK Solr never had this
> issue with ZK 3.4.6.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)