[jira] [Commented] (ZOOKEEPER-2383) Startup race in ZooKeeperServer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408727#comment-15408727 ] Steve Rowe commented on ZOOKEEPER-2383: --- [~rakeshr], I patched branch-3.4 at revision 1754563 with your {{ZOOKEEPER-2383-br-3-4.patch}}, applied my SOLR-8724 patch with a small change to account for ZOOKEEPER-2141, and changed the depended-on ZK version from 3.4.8 to 3.4.9-SNAPSHOT (the version built off branch-3.4 with your patch), and Solr tests pass, so +1 from me to include your patch in the 3.4.9 release. > Startup race in ZooKeeperServer > --- > > Key: ZOOKEEPER-2383 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383 > Project: ZooKeeper > Issue Type: Bug > Components: jmx, server >Affects Versions: 3.4.8 >Reporter: Steve Rowe >Assignee: Rakesh R >Priority: Blocker > Fix For: 3.4.9, 3.5.3, 3.6.0 > > Attachments: TestZkStandaloneJMXRegistrationRaceConcurrent.java, > ZOOKEEPER-2383-br-3-4.patch, ZOOKEEPER-2383.patch, > release-3.4.8-extra-logging.patch, zk-3.4.8-MBeanRegistry.log, > zk-3.4.8-NPE.log > > > In attempting to upgrade Solr's ZooKeeper dependency from 3.4.6 to 3.4.8 > (SOLR-8724) I ran into test failures where attempts to create a node in a > newly started standalone ZooKeeperServer were failing because of an assertion > in MBeanRegistry. > ZooKeeperServer.startup() first sets up its request processor chain then > registers itself in JMX, but if a connection comes in before the server's JMX > registration happens, registration of the connection will fail because it > trips the assertion that (effectively) its parent (the server) has already > registered itself. > {code:java|title=ZooKeeperServer.java} > public synchronized void startup() { > if (sessionTracker == null) { > createSessionTracker(); > } > startSessionTracker(); > setupRequestProcessors(); > registerJMX(); > state = State.RUNNING; > notifyAll(); > } > {code} > {code:java|title=MBeanRegistry.java} > public void register(ZKMBeanInfo bean, ZKMBeanInfo parent) > throws JMException > { > assert bean != null; > String path = null; > if (parent != null) { > path = mapBean2Path.get(parent); > assert path != null; > } > {code} > This problem appears to be new with ZK 3.4.8 - AFAIK Solr never had this > issue with ZK 3.4.6. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2383) Startup race in ZooKeeperServer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15399680#comment-15399680 ] Steve Rowe commented on ZOOKEEPER-2383: --- [~rakeshr], looks like your patch is against trunk - if you make a branch-3.4 version of the patch, I'll run Solr's tests using a build off that branch (I'm assuming the next Solr ZK dep version will be 3.4.9). > Startup race in ZooKeeperServer > --- > > Key: ZOOKEEPER-2383 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383 > Project: ZooKeeper > Issue Type: Bug > Components: jmx, server >Affects Versions: 3.4.8 >Reporter: Steve Rowe >Assignee: Rakesh R >Priority: Blocker > Fix For: 3.4.9, 3.5.3, 3.6.0 > > Attachments: TestZkStandaloneJMXRegistrationRaceConcurrent.java, > ZOOKEEPER-2383.patch, release-3.4.8-extra-logging.patch, > zk-3.4.8-MBeanRegistry.log, zk-3.4.8-NPE.log > > > In attempting to upgrade Solr's ZooKeeper dependency from 3.4.6 to 3.4.8 > (SOLR-8724) I ran into test failures where attempts to create a node in a > newly started standalone ZooKeeperServer were failing because of an assertion > in MBeanRegistry. > ZooKeeperServer.startup() first sets up its request processor chain then > registers itself in JMX, but if a connection comes in before the server's JMX > registration happens, registration of the connection will fail because it > trips the assertion that (effectively) its parent (the server) has already > registered itself. > {code:java|title=ZooKeeperServer.java} > public synchronized void startup() { > if (sessionTracker == null) { > createSessionTracker(); > } > startSessionTracker(); > setupRequestProcessors(); > registerJMX(); > state = State.RUNNING; > notifyAll(); > } > {code} > {code:java|title=MBeanRegistry.java} > public void register(ZKMBeanInfo bean, ZKMBeanInfo parent) > throws JMException > { > assert bean != null; > String path = null; > if (parent != null) { > path = mapBean2Path.get(parent); > assert path != null; > } > {code} > This problem appears to be new with ZK 3.4.8 - AFAIK Solr never had this > issue with ZK 3.4.6. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2383) Startup race in ZooKeeperServer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185740#comment-15185740 ] Steve Rowe commented on ZOOKEEPER-2383: --- Similarly to the subclassed ZooKeeperServer above, if I apply the following patch to ZK 3.4.8 and run the attached test case with it, the JMX registration race no longer happens: {code:java} Index: src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java === --- src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (revision 1732157) +++ src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java (working copy) @@ -413,10 +413,9 @@ createSessionTracker(); } startSessionTracker(); +registerJMX(); setupRequestProcessors(); -registerJMX(); - state = State.RUNNING; notifyAll(); } {code} > Startup race in ZooKeeperServer > --- > > Key: ZOOKEEPER-2383 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383 > Project: ZooKeeper > Issue Type: Bug > Components: jmx, server >Affects Versions: 3.4.8 >Reporter: Steve Rowe >Priority: Blocker > Fix For: 3.4.9 > > Attachments: TestZkStandaloneJMXRegistrationRaceConcurrent.java, > release-3.4.8-extra-logging.patch, zk-3.4.8-MBeanRegistry.log, > zk-3.4.8-NPE.log > > > In attempting to upgrade Solr's ZooKeeper dependency from 3.4.6 to 3.4.8 > (SOLR-8724) I ran into test failures where attempts to create a node in a > newly started standalone ZooKeeperServer were failing because of an assertion > in MBeanRegistry. > ZooKeeperServer.startup() first sets up its request processor chain then > registers itself in JMX, but if a connection comes in before the server's JMX > registration happens, registration of the connection will fail because it > trips the assertion that (effectively) its parent (the server) has already > registered itself. > {code:java|title=ZooKeeperServer.java} > public synchronized void startup() { > if (sessionTracker == null) { > createSessionTracker(); > } > startSessionTracker(); > setupRequestProcessors(); > registerJMX(); > state = State.RUNNING; > notifyAll(); > } > {code} > {code:java|title=MBeanRegistry.java} > public void register(ZKMBeanInfo bean, ZKMBeanInfo parent) > throws JMException > { > assert bean != null; > String path = null; > if (parent != null) { > path = mapBean2Path.get(parent); > assert path != null; > } > {code} > This problem appears to be new with ZK 3.4.8 - AFAIK Solr never had this > issue with ZK 3.4.6. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2383) Startup race in ZooKeeperServer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185412#comment-15185412 ] Steve Rowe commented on ZOOKEEPER-2383: --- FWIW, I was able to work around the problem in Solr tests by subclassing ZooKeeperServer and ordering server JMX registration before setting up its request processing pipeline, like so: {code:java} private class TestZooKeeperServer extends ZooKeeperServer { @Override protected void registerJMX() { // no-op - super.registerJMX() is called in overridden startup() } /** Register in JMX before starting the request processors. */ @Override public void startup() { super.registerJMX(); super.startup(); } public TestZooKeeperServer(FileTxnSnapLog txnLogFactory, int tickTime, int minSessionTimeout, int maxSessionTimeout, DataTreeBuilder treeBuilder, ZKDatabase zkDb) { super(txnLogFactory, tickTime, minSessionTimeout, maxSessionTimeout, treeBuilder, zkDb); } } {code} > Startup race in ZooKeeperServer > --- > > Key: ZOOKEEPER-2383 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383 > Project: ZooKeeper > Issue Type: Bug > Components: jmx, server >Affects Versions: 3.4.8 >Reporter: Steve Rowe >Priority: Blocker > Fix For: 3.4.9 > > Attachments: TestZkStandaloneJMXRegistrationRaceConcurrent.java, > release-3.4.8-extra-logging.patch, zk-3.4.8-MBeanRegistry.log, > zk-3.4.8-NPE.log > > > In attempting to upgrade Solr's ZooKeeper dependency from 3.4.6 to 3.4.8 > (SOLR-8724) I ran into test failures where attempts to create a node in a > newly started standalone ZooKeeperServer were failing because of an assertion > in MBeanRegistry. > ZooKeeperServer.startup() first sets up its request processor chain then > registers itself in JMX, but if a connection comes in before the server's JMX > registration happens, registration of the connection will fail because it > trips the assertion that (effectively) its parent (the server) has already > registered itself. > {code:java|title=ZooKeeperServer.java} > public synchronized void startup() { > if (sessionTracker == null) { > createSessionTracker(); > } > startSessionTracker(); > setupRequestProcessors(); > registerJMX(); > state = State.RUNNING; > notifyAll(); > } > {code} > {code:java|title=MBeanRegistry.java} > public void register(ZKMBeanInfo bean, ZKMBeanInfo parent) > throws JMException > { > assert bean != null; > String path = null; > if (parent != null) { > path = mapBean2Path.get(parent); > assert path != null; > } > {code} > This problem appears to be new with ZK 3.4.8 - AFAIK Solr never had this > issue with ZK 3.4.6. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2383) Startup race in ZooKeeperServer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated ZOOKEEPER-2383: -- Attachment: zk-3.4.8-NPE.log Sometimes the attached test case will trigger a NullPointerException in ZooKeeperServer.createSession() - see attached zk-3.4.8-NPE.log. For some reason I never saw this failure mode in Solr tests using ZK 3.4.8. {noformat} 2016-03-08 11:29:00,374 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:5:NIOServerCnxnFactory@213] - Ignoring unexpected runtime e xception java.lang.NullPointerException at org.apache.zookeeper.server.ZooKeeperServer.createSession(ZooKeeperServer.java:569) at org.apache.zookeeper.server.ZooKeeperServer.processConnectRequest(ZooKeeperServer.java:902) at org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest(NIOServerCnxn.java:418) at org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:198) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:244) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203) at java.lang.Thread.run(Thread.java:745) {noformat} > Startup race in ZooKeeperServer > --- > > Key: ZOOKEEPER-2383 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383 > Project: ZooKeeper > Issue Type: Bug > Components: jmx, server >Affects Versions: 3.4.8 >Reporter: Steve Rowe > Attachments: TestZkStandaloneJMXRegistrationRaceConcurrent.java, > release-3.4.8-extra-logging.patch, zk-3.4.8-MBeanRegistry.log, > zk-3.4.8-NPE.log > > > In attempting to upgrade Solr's ZooKeeper dependency from 3.4.6 to 3.4.8 > (SOLR-8724) I ran into test failures where attempts to create a node in a > newly started standalone ZooKeeperServer were failing because of an assertion > in MBeanRegistry. > ZooKeeperServer.startup() first sets up its request processor chain then > registers itself in JMX, but if a connection comes in before the server's JMX > registration happens, registration of the connection will fail because it > trips the assertion that (effectively) its parent (the server) has already > registered itself. > {code:java|title=ZooKeeperServer.java} > public synchronized void startup() { > if (sessionTracker == null) { > createSessionTracker(); > } > startSessionTracker(); > setupRequestProcessors(); > registerJMX(); > state = State.RUNNING; > notifyAll(); > } > {code} > {code:java|title=MBeanRegistry.java} > public void register(ZKMBeanInfo bean, ZKMBeanInfo parent) > throws JMException > { > assert bean != null; > String path = null; > if (parent != null) { > path = mapBean2Path.get(parent); > assert path != null; > } > {code} > This problem appears to be new with ZK 3.4.8 - AFAIK Solr never had this > issue with ZK 3.4.6. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2383) Startup race in ZooKeeperServer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated ZOOKEEPER-2383: -- Attachment: release-3.4.8-extra-logging.patch zk-3.4.8-MBeanRegistry.log TestZkStandaloneJMXRegistrationRaceConcurrent.java Attaching files: Test case: TestZkStandaloneJMXRegistrationRaceConcurrent.java ZK 3.4.8 source patch to add JXM logging: release-3.4.8-extra-logging.patch Log file exhibiting failure: zk-3.4.8-MBeanRegistry.log > Startup race in ZooKeeperServer > --- > > Key: ZOOKEEPER-2383 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383 > Project: ZooKeeper > Issue Type: Bug > Components: jmx, server >Affects Versions: 3.4.8 >Reporter: Steve Rowe > Attachments: TestZkStandaloneJMXRegistrationRaceConcurrent.java, > release-3.4.8-extra-logging.patch, zk-3.4.8-MBeanRegistry.log > > > In attempting to upgrade Solr's ZooKeeper dependency from 3.4.6 to 3.4.8 > (SOLR-8724) I ran into test failures where attempts to create a node in a > newly started standalone ZooKeeperServer were failing because of an assertion > in MBeanRegistry. > ZooKeeperServer.startup() first sets up its request processor chain then > registers itself in JMX, but if a connection comes in before the server's JMX > registration happens, registration of the connection will fail because it > trips the assertion that (effectively) its parent (the server) has already > registered itself. > {code:java|title=ZooKeeperServer.java} > public synchronized void startup() { > if (sessionTracker == null) { > createSessionTracker(); > } > startSessionTracker(); > setupRequestProcessors(); > registerJMX(); > state = State.RUNNING; > notifyAll(); > } > {code} > {code:java|title=MBeanRegistry.java} > public void register(ZKMBeanInfo bean, ZKMBeanInfo parent) > throws JMException > { > assert bean != null; > String path = null; > if (parent != null) { > path = mapBean2Path.get(parent); > assert path != null; > } > {code} > This problem appears to be new with ZK 3.4.8 - AFAIK Solr never had this > issue with ZK 3.4.6. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2383) Startup race in ZooKeeperServer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185285#comment-15185285 ] Steve Rowe commented on ZOOKEEPER-2383: --- This program triggers the problem for me roughly 10% of the time with ZK 3.4.8 - note that if I don't use a thread to start ZooKeeperServer, the connection always comes in after the server has had a chance to register itself with JMX (imports omitted - attaching full file here in a sec): {code:java|title=TestZkStandaloneJMXRegistrationRaceConcurrent.java} public class TestZkStandaloneJMXRegistrationRaceConcurrent { public static void main(String[] args) throws IOException, InterruptedException, KeeperException { class ServerThread extends Thread { private ZooKeeperServer server; private ServerCnxnFactory cnxnFactory; @Override public void run() { try { File tempDir = Files.createTempDirectory(FileSystems.getDefault().getPath("."),"test").toFile(); FileTxnSnapLog txnSnapLog = new FileTxnSnapLog(tempDir, tempDir); server = new ZooKeeperServer (txnSnapLog, 2000, 2000, 4000, null, new ZKDatabase(txnSnapLog)); cnxnFactory = ServerCnxnFactory.createFactory(5, -1); cnxnFactory.startup(server); } catch (IOException e) { throw new RuntimeException(e); } catch (InterruptedException e) { Thread.currentThread().interrupt(); } } public void shutdown() throws IOException, InterruptedException { cnxnFactory.shutdown(); cnxnFactory.join(); server.shutdown(); } } ServerThread serverThread = new ServerThread(); serverThread.setDaemon(true); serverThread.start(); Thread.sleep(3); ZooKeeper zk = new ZooKeeper("127.0.0.1:5", 45000, new Watcher() { public void process(WatchedEvent event) {} }); zk.create("/testing123", new byte[]{}, Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL); serverThread.shutdown(); serverThread.join(); } } {code} Here's an excerpt from a log exhibiting the failure - I'll also attach the full log (I've added some logging to ZK 3.4.8 - I'll attach a patch showing those additions here in a minute): {noformat} 2016-03-08 11:32:08,414 [myid:] - WARN [SyncThread:0:MBeanRegistry@100] - bean 'Connections/127.0.0.1/0x153571244a7' with parent 'StandaloneServer_port5' has null path. java.lang.Throwable: at org.apache.zookeeper.jmx.MBeanRegistry.register(MBeanRegistry.java:98) at org.apache.zookeeper.server.ServerCnxnFactory.registerConnection(ServerCnxnFactory.java:147) at org.apache.zookeeper.server.ZooKeeperServer.finishSessionInit(ZooKeeperServer.java:613) at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:181) at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200) at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131) 2016-03-08 11:32:08,414 [myid:] - WARN [Thread-0:MBeanRegistry@118] - registered bean 'StandaloneServer_port5' with parent 'null' at path '/' java.lang.Throwable: at org.apache.zookeeper.jmx.MBeanRegistry.register(MBeanRegistry.java:116) at org.apache.zookeeper.server.ZooKeeperServer.registerJMX(ZooKeeperServer.java:385) at org.apache.zookeeper.server.ZooKeeperServer.startup(ZooKeeperServer.java:418) at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:119) at TestZkStandaloneJMXRegistrationRaceConcurrent$1ServerThread.run(TestZkStandaloneJMXRegistrationRaceConcurrent.java:29) 2016-03-08 11:32:08,415 [myid:] - ERROR [SyncThread:0:ZooKeeperCriticalThread@49] - Severe unrecoverable error, from thread : SyncThread:0 java.lang.AssertionError at org.apache.zookeeper.jmx.MBeanRegistry.register(MBeanRegistry.java:104) at org.apache.zookeeper.server.ServerCnxnFactory.registerConnection(ServerCnxnFactory.java:147) at org.apache.zookeeper.server.ZooKeeperServer.finishSessionInit(ZooKeeperServer.java:613) at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:181) at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200) at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131) 2016-03-08 11:32:08,416 [myid:] - WARN [Thread-0:MBeanRegistry@118] - registered bean 'InMemoryDataTree' with parent 'StandaloneServer_port5' at path '/StandaloneServer_port5' java.lang.Throwable: at org.apache.zookeeper.jmx.MBeanRegistry.register(MBeanRegistry.java:116) at org.apache.zookeeper.server.ZooKeeperServer.registerJMX(ZooKeeperServer.java:389) at
[jira] [Created] (ZOOKEEPER-2383) Startup race in ZooKeeperServer
Steve Rowe created ZOOKEEPER-2383: - Summary: Startup race in ZooKeeperServer Key: ZOOKEEPER-2383 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383 Project: ZooKeeper Issue Type: Bug Components: jmx, server Affects Versions: 3.4.8 Reporter: Steve Rowe In attempting to upgrade Solr's ZooKeeper dependency from 3.4.6 to 3.4.8 (SOLR-8724) I ran into test failures where attempts to create a node in a newly started standalone ZooKeeperServer were failing because of an assertion in MBeanRegistry. ZooKeeperServer.startup() first sets up its request processor chain then registers itself in JMX, but if a connection comes in before the server's JMX registration happens, registration of the connection will fail because it trips the assertion that (effectively) its parent (the server) has already registered itself. {code:java|title=ZooKeeperServer.java} public synchronized void startup() { if (sessionTracker == null) { createSessionTracker(); } startSessionTracker(); setupRequestProcessors(); registerJMX(); state = State.RUNNING; notifyAll(); } {code} {code:java|title=MBeanRegistry.java} public void register(ZKMBeanInfo bean, ZKMBeanInfo parent) throws JMException { assert bean != null; String path = null; if (parent != null) { path = mapBean2Path.get(parent); assert path != null; } {code} This problem appears to be new with ZK 3.4.8 - AFAIK Solr never had this issue with ZK 3.4.6. -- This message was sent by Atlassian JIRA (v6.3.4#6332)