[ https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507169#comment-15507169 ]

Rakesh R commented on ZOOKEEPER-2383:
-------------------------------------

bq. The test case isn't covering the whole modified code. In particular, it is 
not covering the 4lw changes this patch is making.
Agreed, we would need to add more test cases covering the 4lw commands and the Netty server.

bq. Actually, what prevents us from doing this:
It's an interesting thought to view the problem from a different angle and frame a 
simpler solution. I can see the following changes happening if we move 
#start() to the end.

# If someone sends a 4lw command to the server during startup, the ZK server won't 
respond to it and will not print the message {{"This ZooKeeper instance is not 
currently serving requests"}}.
# The client-side and server-side logging changes.
Before:
{code}
2016-09-20 22:17:33,542 [myid:127.0.0.1:11222] - INFO  [main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:11222. Will not attempt to authenticate using SASL (unknown error)
2016-09-20 22:17:33,548 [myid:] - INFO  [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11222:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:12510
2016-09-20 22:17:33,548 [myid:127.0.0.1:11222] - INFO  [main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@948] - Socket connection established, initiating session, client: /127.0.0.1:12510, server: 127.0.0.1/127.0.0.1:11222
2016-09-20 22:17:33,563 [myid:] - WARN  [NIOWorkerThread-1:NIOServerCnxn@369] - Exception causing close of session 0x0: ZooKeeperServer not running
2016-09-20 22:17:33,564 [myid:] - INFO  [NIOWorkerThread-1:NIOServerCnxn@607] - Closed socket connection for client /127.0.0.1:12510 (no session established for client)
2016-09-20 22:17:33,564 [myid:127.0.0.1:11222] - INFO  [main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1231] - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
{code}
After:
{code}
2016-09-20 22:14:15,309 [myid:127.0.0.1:11222] - INFO  [main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:11222. Will not attempt to authenticate using SASL (unknown error)
2016-09-20 22:14:15,312 [myid:127.0.0.1:11222] - INFO  [main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@948] - Socket connection established, initiating session, client: /127.0.0.1:12418, server: 127.0.0.1/127.0.0.1:11222
2016-09-20 22:14:45,313 [myid:127.0.0.1:11222] - WARN  [main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1181] - Client session timed out, have not heard from server in 30001ms for sessionid 0x0
{code}
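A minimal sketch of the "move #start() to the end" ordering, in a toy model with hypothetical names (StartLastSketch, a string-keyed map standing in for MBeanRegistry, a volatile {{accepting}} flag standing in for the connection factory's start()). It is not ZooKeeper code; it only shows that if the server registers itself before it begins accepting, a client racing against startup can never register before its parent.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical toy model, not ZooKeeper code: "accept last" startup ordering.
public class StartLastSketch {
    // Stand-in for MBeanRegistry's bean -> path map.
    final Map<String, String> registry = new ConcurrentHashMap<>();
    // Stand-in for "the connection factory has been started".
    volatile boolean accepting = false;

    // Mimics the parent check in MBeanRegistry.register(), but always enforced.
    void register(String bean, String parent) {
        String parentPath = "";
        if (parent != null) {
            parentPath = registry.get(parent);
            if (parentPath == null) {
                throw new IllegalStateException("parent not registered: " + parent);
            }
        }
        registry.put(bean, parentPath + "/" + bean);
    }

    void startup() {
        // session tracker and request processor setup would happen here
        register("server", null); // registerJMX()
        accepting = true;         // start() moved to the very end
    }

    // Returns false while the listener is not accepting yet; the client retries.
    boolean accept(String conn) {
        if (!accepting) {
            return false;
        }
        register(conn, "server");
        return true;
    }

    // Races one client thread against startup(); true if the client got in cleanly.
    static boolean raceOnce() throws InterruptedException {
        StartLastSketch s = new StartLastSketch();
        AtomicBoolean ok = new AtomicBoolean(false);
        Thread client = new Thread(() -> {
            // An IllegalStateException here would kill the thread and leave ok == false.
            while (!s.accept("conn")) {
                Thread.yield();
            }
            ok.set(true);
        });
        client.start();
        s.startup();
        client.join();
        return ok.get();
    }
}
```

Because {{accepting}} is volatile, a client that reads {{true}} is guaranteed to see the server's registration that preceded the write, which is what makes the ordering sufficient in this toy.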

Ideally, server startup won't take much time; the only exceptional case is when 
the data loaded by zks#loadData() is very large. I'm not aware of any use case for 
4lws during startup. Does anyone expect a quick reply saying the server is not 
running rather than a connection timeout?
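For anyone scripting 4lws against a starting server, a minimal sketch of a probe (hypothetical class FourLetterProbe, not part of ZooKeeper) that sends a command such as {{stat}} over a plain socket with a read timeout. It assumes, based on the After log above, that with start() moved last the client port accepts the TCP connection but nothing answers, so the probe returns {{null}} on timeout instead of the quick "not currently serving requests" text.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: send a four-letter word and read the reply.
public class FourLetterProbe {
    // Returns the server's reply, or null if nothing arrives within timeoutMs.
    public static String probe(String host, int port, String cmd, int timeoutMs)
            throws IOException {
        try (Socket sock = new Socket()) {
            sock.connect(new InetSocketAddress(host, port), timeoutMs);
            sock.setSoTimeout(timeoutMs); // bounds each read() call
            OutputStream out = sock.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            InputStream in = sock.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            try {
                // The server closes the connection after replying, ending the loop.
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
            } catch (SocketTimeoutException e) {
                if (buf.size() == 0) {
                    return null; // connected but silent: the start-last behaviour
                }
            }
            return buf.toString("US-ASCII");
        }
    }
}
```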

> Startup race in ZooKeeperServer
> -------------------------------
>
>                 Key: ZOOKEEPER-2383
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: jmx, server
>    Affects Versions: 3.4.8
>            Reporter: Steve Rowe
>            Assignee: Rakesh R
>            Priority: Blocker
>             Fix For: 3.4.10, 3.5.3, 3.6.0
>
>         Attachments: TestZkStandaloneJMXRegistrationRaceConcurrent.java, 
> ZOOKEEPER-2383-br-3-4.patch, ZOOKEEPER-2383.patch, ZOOKEEPER-2383.patch, 
> ZOOKEEPER-2383.patch, release-3.4.8-extra-logging.patch, 
> zk-3.4.8-MBeanRegistry.log, zk-3.4.8-NPE.log
>
>
> In attempting to upgrade Solr's ZooKeeper dependency from 3.4.6 to 3.4.8 
> (SOLR-8724) I ran into test failures where attempts to create a node in a 
> newly started standalone ZooKeeperServer were failing because of an assertion 
> in MBeanRegistry.
> ZooKeeperServer.startup() first sets up its request processor chain then 
> registers itself in JMX, but if a connection comes in before the server's JMX 
> registration happens, registration of the connection will fail because it 
> trips the assertion that (effectively) its parent (the server) has already 
> registered itself.
> {code:java|title=ZooKeeperServer.java}
>     public synchronized void startup() {
>         if (sessionTracker == null) {
>             createSessionTracker();
>         }
>         startSessionTracker();
>         setupRequestProcessors();
>         registerJMX();
>         state = State.RUNNING;
>         notifyAll();
>     }
> {code}
> {code:java|title=MBeanRegistry.java}
>     public void register(ZKMBeanInfo bean, ZKMBeanInfo parent)
>         throws JMException
>     {
>         assert bean != null;
>         String path = null;
>         if (parent != null) {
>             path = mapBean2Path.get(parent);
>             assert path != null;
>         }
> {code}
> This problem appears to be new with ZK 3.4.8 - AFAIK Solr never had this 
> issue with ZK 3.4.6. 
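A deterministic replay of the window described above, as a hypothetical toy (StartupRaceReplay is not ZooKeeper code): a connection registers under its parent "server" bean, and the registry requires the parent to exist first, like the {{assert path != null}} in MBeanRegistry.register(). Since Java assertions only fire with {{-ea}} (as in test runs), the toy uses an explicit check so the failure is visible either way.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy replay of the 3.4.8 startup window, not ZooKeeper code.
public class StartupRaceReplay {
    private final Map<String, String> beanToPath = new HashMap<>();

    void register(String bean, String parent) {
        String path = "";
        if (parent != null) {
            path = beanToPath.get(parent);
            // MBeanRegistry uses `assert path != null`; this check fires without -ea too.
            if (path == null) {
                throw new IllegalStateException("parent not registered: " + parent);
            }
        }
        beanToPath.put(bean, path + "/" + bean);
    }

    // Replays startup() with a connection arriving before or after registerJMX().
    // Returns true if the connection's registration failed.
    static boolean connectionFails(boolean connectionBeforeRegisterJMX) {
        StartupRaceReplay registry = new StartupRaceReplay();
        // createSessionTracker(), startSessionTracker(), setupRequestProcessors()
        // are assumed done; per the description, connections can already arrive.
        try {
            if (connectionBeforeRegisterJMX) {
                registry.register("connection", "server"); // arrives too early
                registry.register("server", null);         // registerJMX()
            } else {
                registry.register("server", null);         // registerJMX()
                registry.register("connection", "server");
            }
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```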



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
