[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327976#comment-14327976
 ] 

Hongchao Deng commented on ZOOKEEPER-1907:
------------------------------------------

The latest patch looks really good to me, +1!

I have a concern which is not related to the patch:
In ZooKeeperServer#submitRequest():
{code}
     if (firstProcessor == null) {
            synchronized (this) {
                try {
                    while (!running) {
                        wait(1000);
                    }
                } catch (InterruptedException e) {
                    LOG.warn("Unexpected interruption", e);
                }
                if (firstProcessor == null) {
                    throw new RuntimeException("Not started");
                }
            }
        }
{code}

I think the purpose of this code is to wait on initial setup. I can see there's 
a race that after shutdown() "running" is set to false. And the while loop will 
go forever. Moreover, submitRequest() doesn't handle the case when server is 
shut down. It seems fine because it's shut down.

I really hope the first "running" race could be fixed in this issue. A way I 
can think of would be to change running from bool to int (or enum: STARTING, 
RUNNING, SHUTDOWN).

> Improve Thread handling
> -----------------------
>
>                 Key: ZOOKEEPER-1907
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1907
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.5.0
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>             Fix For: 3.5.1, 3.6.0
>
>         Attachments: ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, 
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch
>
>
> Server has many critical threads running and co-ordinating each other like  
> RequestProcessor chains et. When going through each threads, most of them 
> having the similar structure like:
> {code}
> public void run() {
>         try {
>               while(running)
>                    // processing logic
>               }
>         } catch (InterruptedException e) {
>             LOG.error("Unexpected interruption", e);
>         } catch (Exception e) {
>             LOG.error("Unexpected exception", e);
>         }
>         LOG.info("...exited loop!");
> }
> {code}
> From the design I could see, there could be a chance of silently leaving the 
> thread by swallowing the exception. If this happens in the production, the 
> server would get hanged forever and would not be able to deliver its role. 
> Now its hard for the management tool to detect this.
> The idea of this JIRA is to discuss and imprv.
> Reference: [Community discussion 
> thread|http://mail-archives.apache.org/mod_mbox/zookeeper-user/201403.mbox/%3cc2496325850aa74c92aaf83aa9662d26458a1...@szxeml561-mbx.china.huawei.com%3E]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to