[
https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123242#comment-14123242
]
Rakesh R commented on ZOOKEEPER-1907:
-------------------------------------
Thanks [~hdeng] for the comments
bq. I didn't like the idea of while-loop polling each thread's live status.
I've tried a different approach: passing listeners to the critical threads and
handling the exception through them. Please have a look at the patch (test cases
are yet to be added).
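To make the listener idea concrete, here is a minimal sketch. It is not the
attached patch; FailureListener, CriticalThread and ListenerSketch are
hypothetical names chosen for illustration, and treating an interrupt as a
normal stop request is an assumption of this sketch.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ListenerSketch {

    /** Hypothetical callback; not an existing ZooKeeper API. */
    interface FailureListener {
        void onCriticalFailure(String threadName, Throwable cause);
    }

    /** Hypothetical base class for threads whose death must not go unnoticed. */
    abstract static class CriticalThread extends Thread {
        private final FailureListener listener;

        CriticalThread(String name, FailureListener listener) {
            super(name);
            this.listener = listener;
        }

        /** Subclasses put their processing loop here. */
        protected abstract void doWork() throws Exception;

        @Override
        public final void run() {
            try {
                doWork();
            } catch (InterruptedException e) {
                // Assumption: an interrupt is a normal stop request.
                Thread.currentThread().interrupt();
            } catch (Throwable t) {
                // Report the failure instead of only logging it.
                listener.onCriticalFailure(getName(), t);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        CriticalThread worker = new CriticalThread("demo-worker",
                (name, cause) -> seen.set(cause)) {
            @Override
            protected void doWork() {
                throw new IllegalStateException("simulated failure");
            }
        };
        worker.start();
        worker.join();
        System.out.println("listener saw: " + seen.get());
    }
}
```

The point of the design is that no thread needs to be polled: the failing
thread itself pushes the error to whoever registered the listener.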
bq. Another thing I understand from the code (if correctly) is that when a
thread dies, the entire ZK process is shut down. If so, what is the difference
from just letting the exception go all the way up and shut it down? I am
wondering whether the original purpose was to try restarting or so.
I think restarting the critical resources would make it more complex. As I
mentioned at the beginning, there are many critical threads and I'm afraid of
inconsistencies. A simpler way of handling this is to shut down and leave things
to the administrator/monitoring tool, which can restart the server after
rectifying the cause (for example an OOME or any functional error).
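As a rough sketch of the shut-down-on-first-failure behaviour (again not the
patch itself): ShutdownSignal, runUntilFailure and FailFastSketch are
hypothetical names, and the exit code 1 is an arbitrary choice for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class FailFastSketch {

    /** Hypothetical holder for the first fatal error; later ones are ignored. */
    static final class ShutdownSignal {
        private final CountDownLatch latch = new CountDownLatch(1);
        private final AtomicReference<Throwable> cause = new AtomicReference<>();

        void fail(Throwable t) {
            if (cause.compareAndSet(null, t)) {
                latch.countDown();
            }
        }

        Throwable awaitFailure() throws InterruptedException {
            latch.await();
            return cause.get();
        }
    }

    /**
     * Starts the workers and blocks until one of them fails. Returns the exit
     * code a server main method could pass to System.exit. Note that it blocks
     * forever if no worker ever fails.
     */
    static int runUntilFailure(ShutdownSignal signal, Runnable... workers)
            throws InterruptedException {
        for (Runnable w : workers) {
            Thread t = new Thread(() -> {
                try {
                    w.run();
                } catch (Throwable e) {
                    // Throwable also covers errors such as OutOfMemoryError.
                    signal.fail(e);
                }
            });
            t.setDaemon(true);
            t.start();
        }
        Throwable cause = signal.awaitFailure();
        System.err.println("Critical thread failed, shutting down: " + cause);
        return 1;
    }

    public static void main(String[] args) throws InterruptedException {
        ShutdownSignal signal = new ShutdownSignal();
        int code = runUntilFailure(signal,
                () -> { throw new IllegalStateException("simulated failure"); });
        // A real server would call System.exit(code) here so that the
        // monitoring tool sees a dead process and can restart it.
        System.out.println("would exit with code " + code);
    }
}
```

Nothing is restarted inside the process; the non-zero exit is the only signal,
which keeps the server free of half-restarted state.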
> Improve Thread handling
> -----------------------
>
> Key: ZOOKEEPER-1907
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1907
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.5.0
> Reporter: Rakesh R
> Assignee: Rakesh R
> Fix For: 3.5.1
>
> Attachments: ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch,
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch,
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch
>
>
> The server has many critical threads running and coordinating with each other,
> such as the RequestProcessor chains. Going through these threads, most of them
> have a similar structure:
> {code}
> public void run() {
>     try {
>         while (running) {
>             // processing logic
>         }
>     } catch (InterruptedException e) {
>         LOG.error("Unexpected interruption", e);
>     } catch (Exception e) {
>         LOG.error("Unexpected exception", e);
>     }
>     LOG.info("...exited loop!");
> }
> {code}
> With this design, a thread can exit silently because the exception is
> swallowed after being logged. If this happens in production, the server would
> hang forever and would not be able to perform its role, and it is hard for a
> management tool to detect this.
> The idea of this JIRA is to discuss and improve the thread handling.
> Reference: [Community discussion
> thread|http://mail-archives.apache.org/mod_mbox/zookeeper-user/201403.mbox/%3cc2496325850aa74c92aaf83aa9662d26458a1...@szxeml561-mbx.china.huawei.com%3E]