[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189157#comment-15189157
 ] 

Rakesh R commented on ZOOKEEPER-2383:
-------------------------------------

I think I found the code change that causes this bug. It is due to the 
change in the startup sequence made in ZOOKEEPER-2026.

That change moved the assignment of the 'zkServer' reference to before the 
standalone server has fully started. Internally, this 'zkServer' reference is 
used to check whether the ZooKeeperServer is running. In the defect scenario, 
the standalone server is only partially started (by Thread1) when a client 
(Thread2) sends a connection request. Since 'zkServer' is not null, the 
connection request is processed anyway, which causes the failure. Please refer 
to my previous comment for the problematic call sequence.

{code}
NIOServerCnxn.java

    private void readConnectRequest() throws IOException, InterruptedException {
        if (zkServer == null) {
            throw new IOException("ZooKeeperServer not running");
        }
        zkServer.processConnectRequest(this, incomingBuffer);
        initialized = true;
    }
{code}
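
To make the interleaving described above concrete, here is a minimal, deterministic replay of it. This is only a sketch: SketchServer, SketchConnection, the jmxRegistered flag and the two latches are hypothetical stand-ins, not ZooKeeper classes, and the latches exist only to force the "client arrives mid-startup" ordering that happens by chance in the real race.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-ins for illustration; these are not the real ZooKeeper classes.
class StartupRaceSketch {

    /** Minimal model of a server whose startup consists of several steps. */
    static class SketchServer {
        volatile boolean jmxRegistered = false; // set by the last startup step

        void startup(CountDownLatch halfway, CountDownLatch resume)
                throws InterruptedException {
            // ... session tracker and request processors would be set up here ...
            halfway.countDown();  // signal: startup is only partially done
            resume.await();       // pause so the "client" can run in the gap
            jmxRegistered = true; // final step (models JMX registration)
        }
    }

    /** Model of the connection side, guarded only by a null check. */
    static class SketchConnection {
        volatile SketchServer zkServer;

        /** Returns whether the server was fully started when the request was accepted. */
        boolean readConnectRequest() throws IOException {
            if (zkServer == null) {
                throw new IOException("ZooKeeperServer not running");
            }
            return zkServer.jmxRegistered;
        }
    }

    /** Replays the interleaving; returns true if the request saw a fully started server. */
    static boolean replay() throws Exception {
        SketchServer server = new SketchServer();
        SketchConnection cnxn = new SketchConnection();
        CountDownLatch halfway = new CountDownLatch(1);
        CountDownLatch resume = new CountDownLatch(1);

        cnxn.zkServer = server; // reference published before startup completes
        Thread starter = new Thread(() -> {
            try {
                server.startup(halfway, resume);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        starter.start();

        halfway.await();                                // "Thread1" is mid-startup
        boolean sawStarted = cnxn.readConnectRequest(); // "Thread2" connects now
        resume.countDown();
        starter.join();
        return sawStarted;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("request saw fully started server: " + replay());
    }
}
```

In this replay the null check passes even though the last startup step has not run yet, which is the situation described above.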

We should probably check the server's RUNNING state instead of 
"zkServer == null" to determine whether the server is running. The server sets 
its state to RUNNING only after all the services have been started.
{code}
ZooKeeperServer.java

    public synchronized void startup() {
        if (sessionTracker == null) {
            createSessionTracker();
        }
        startSessionTracker();
        setupRequestProcessors();

        registerJMX();

        state = State.RUNNING;
        notifyAll();
    }
{code}
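
A rough sketch of what such a state-based check could look like is below. It is not a patch: SketchServer, SketchConnection, the three-value State enum and the isRunning() and accepts() helpers are hypothetical names used only for illustration, and the real classes have more states and more startup steps.

```java
import java.io.IOException;

// Hypothetical stand-ins for illustration; these are not the real ZooKeeper classes.
class RunningStateGuardSketch {

    enum State { INITIAL, RUNNING, SHUTDOWN }

    static class SketchServer {
        private volatile State state = State.INITIAL;

        synchronized void startup() {
            // ... all services would be started here ...
            state = State.RUNNING; // published only after every step is done
            notifyAll();
        }

        boolean isRunning() {
            return state == State.RUNNING;
        }
    }

    static class SketchConnection {
        private volatile SketchServer zkServer;

        void setServer(SketchServer server) {
            zkServer = server;
        }

        void readConnectRequest() throws IOException {
            SketchServer server = zkServer; // read the reference once
            // Guard on the running state, not just on the reference being set.
            if (server == null || !server.isRunning()) {
                throw new IOException("ZooKeeperServer not running");
            }
            // ... the connect request would be processed here ...
        }
    }

    /** Helper for the demo: true if the connect request was accepted. */
    static boolean accepts(SketchConnection cnxn) {
        try {
            cnxn.readConnectRequest();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        SketchServer server = new SketchServer();
        SketchConnection cnxn = new SketchConnection();
        cnxn.setServer(server); // reference set early, as after ZOOKEEPER-2026

        System.out.println("before startup: " + accepts(cnxn));
        server.startup();
        System.out.println("after startup: " + accepts(cnxn));
    }
}
```

With this guard, a request that arrives while the reference is set but startup has not finished is rejected with the same "not running" error as before, and the client can retry.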

> Startup race in ZooKeeperServer
> -------------------------------
>
>                 Key: ZOOKEEPER-2383
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: jmx, server
>    Affects Versions: 3.4.8
>            Reporter: Steve Rowe
>            Priority: Blocker
>             Fix For: 3.4.9
>
>         Attachments: TestZkStandaloneJMXRegistrationRaceConcurrent.java, 
> release-3.4.8-extra-logging.patch, zk-3.4.8-MBeanRegistry.log, 
> zk-3.4.8-NPE.log
>
>
> In attempting to upgrade Solr's ZooKeeper dependency from 3.4.6 to 3.4.8 
> (SOLR-8724) I ran into test failures where attempts to create a node in a 
> newly started standalone ZooKeeperServer were failing because of an assertion 
> in MBeanRegistry.
> ZooKeeperServer.startup() first sets up its request processor chain then 
> registers itself in JMX, but if a connection comes in before the server's JMX 
> registration happens, registration of the connection will fail because it 
> trips the assertion that (effectively) its parent (the server) has already 
> registered itself.
> {code:java|title=ZooKeeperServer.java}
>     public synchronized void startup() {
>         if (sessionTracker == null) {
>             createSessionTracker();
>         }
>         startSessionTracker();
>         setupRequestProcessors();
>         registerJMX();
>         state = State.RUNNING;
>         notifyAll();
>     }
> {code}
> {code:java|title=MBeanRegistry.java}
>     public void register(ZKMBeanInfo bean, ZKMBeanInfo parent)
>         throws JMException
>     {
>         assert bean != null;
>         String path = null;
>         if (parent != null) {
>             path = mapBean2Path.get(parent);
>             assert path != null;
>         }
> {code}
> This problem appears to be new with ZK 3.4.8 - AFAIK Solr never had this 
> issue with ZK 3.4.6. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
