[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15727512#comment-15727512 ]

ASF GitHub Bot commented on ZOOKEEPER-1907:
-------------------------------------------

Github user rakeshadr closed the pull request at:

    https://github.com/apache/zookeeper/pull/39

> Improve Thread handling
> -----------------------
>
>                 Key: ZOOKEEPER-1907
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1907
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.5.0
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>             Fix For: 3.4.7, 3.5.1, 3.6.0
>
>         Attachments: ZOOKEEPER-1907-br-3-4.patch, ZOOKEEPER-1907.patch,
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch,
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch,
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch,
> ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch
>
>
> The server has many critical threads running and coordinating with each
> other, such as the RequestProcessor chains. Going through these threads,
> most of them have a similar structure:
> {code}
> public void run() {
>     try {
>         while (running) {
>             // processing logic
>         }
>     } catch (InterruptedException e) {
>         LOG.error("Unexpected interruption", e);
>     } catch (Exception e) {
>         LOG.error("Unexpected exception", e);
>     }
>     LOG.info("...exited loop!");
> }
> {code}
> From the design, I can see a chance that a thread exits silently because
> the exception is swallowed. If this happens in production, the server
> would hang forever and would not be able to fulfil its role. Currently it
> is hard for a management tool to detect this.
> The idea of this JIRA is to discuss and improve this.
> Reference: [Community discussion
> thread|http://mail-archives.apache.org/mod_mbox/zookeeper-user/201403.mbox/%3cc2496325850aa74c92aaf83aa9662d26458a1...@szxeml561-mbx.china.huawei.com%3E]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
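
The concern in the issue description (a worker thread that only logs an unexpected exception and then exits without anyone noticing) can be illustrated with a small sketch. This is not the code from the attached patches: `FailureListener`, `ReportingThread` and `CriticalThreadDemo` are hypothetical names invented here, and the sketch only assumes that the owner of the thread wants a callback when the thread dies unexpectedly.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical callback; the name is an assumption made for this sketch. */
interface FailureListener {
    void onCriticalFailure(String threadName, Throwable cause);
}

/** Hypothetical base class: run() reports unexpected failures instead of only logging them. */
abstract class ReportingThread extends Thread {
    private final FailureListener listener;

    ReportingThread(String name, FailureListener listener) {
        super(name);
        this.listener = listener;
    }

    /** Subclasses put their processing loop here. */
    protected abstract void process() throws Exception;

    @Override
    public final void run() {
        try {
            process();
        } catch (InterruptedException e) {
            // Restore the interrupt flag; this sketch treats an interrupt as a shutdown request.
            Thread.currentThread().interrupt();
        } catch (Throwable t) {
            // The thread is about to die: tell the owner so it can shut down or raise an alert.
            listener.onCriticalFailure(getName(), t);
        }
    }
}

class CriticalThreadDemo {
    /** Starts a worker that fails and returns the failure the listener observed (null on timeout). */
    static Throwable runFailingWorker() {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        CountDownLatch notified = new CountDownLatch(1);
        FailureListener listener = (name, cause) -> {
            seen.set(cause);
            notified.countDown();
        };
        Thread worker = new ReportingThread("demo-processor", listener) {
            @Override
            protected void process() {
                throw new IllegalStateException("simulated processing failure");
            }
        };
        worker.start();
        try {
            notified.await(5, TimeUnit.SECONDS);
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("listener saw: " + runFailingWorker());
    }
}
```

With a hook like this, the owner can decide what an unexpected exit means (for example, stopping the whole server so that a management tool sees the process go down) rather than leaving a half-working process behind.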
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061052#comment-15061052 ]

Flavio Junqueira commented on ZOOKEEPER-1907:
---------------------------------------------

We have made requestsInProcess an AtomicInteger in ZOOKEEPER-1504, removing the synchronization of the decIn method. We should just make the same change here for the 3.4 branch.
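
The change described in this comment (an `AtomicInteger` counter in place of a counter guarded by a synchronized method) can be sketched as follows. This is a minimal illustration and not the ZOOKEEPER-1504 patch itself: `InProcessCounter`, its method names and `InProcessCounterDemo` are assumptions made for the example; only the field name `requestsInProcess` comes from the comment.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical holder for an in-flight request count; no monitor is needed to update it. */
class InProcessCounter {
    private final AtomicInteger requestsInProcess = new AtomicInteger(0);

    void inc() {
        requestsInProcess.incrementAndGet();
    }

    void dec() {
        // Atomic decrement: callers never block on this object's lock.
        requestsInProcess.decrementAndGet();
    }

    int get() {
        return requestsInProcess.get();
    }
}

class InProcessCounterDemo {
    /** Each thread does two increments and one decrement per iteration; returns the final count. */
    static int hammer(int threads, int opsPerThread) {
        InProcessCounter counter = new InProcessCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < opsPerThread; j++) {
                    counter.inc();
                    counter.inc();
                    counter.dec();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("final count: " + hammer(4, 10000));
    }
}
```

Because the updates are atomic, the final count equals `threads * opsPerThread` without any `synchronized` method, so a thread that already holds another lock cannot get stuck waiting for the counter's monitor.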
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061380#comment-15061380 ]

Rakesh R commented on ZOOKEEPER-1907:
-------------------------------------

This issue is already committed to 3.5 & trunk. Please see the [commit history|https://issues.apache.org/jira/browse/ZOOKEEPER-1907?focusedCommentId=14352303&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14352303]. Later, based on the discussion, we re-opened it for backporting to the 3.4 branch.

bq. We need to sort this out and the deadlock that has been reported in ZOOKEEPER-2347.

Sorry for the inconvenience caused by these changes :( Yes, I will fix this issue immediately.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061397#comment-15061397 ]

Rakesh R commented on ZOOKEEPER-1907:
-------------------------------------

Thank you [~fpj], I will do the necessary changes and prepare a patch there.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700205#comment-14700205 ]

Hongchao Deng commented on ZOOKEEPER-1907:
------------------------------------------

Committed to branch-3.4: https://github.com/apache/zookeeper/commit/91f579e40755de870ed9123c8fd55925517d9aa6

Thanks [~rakeshr]!
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682305#comment-14682305 ]

ASF GitHub Bot commented on ZOOKEEPER-1907:
-------------------------------------------

GitHub user rakeshadr opened a pull request:

    https://github.com/apache/zookeeper/pull/39

    ZOOKEEPER-1907: Backport thread handling improvement from trunk

    Signed-off-by: Rakesh rakesh@gmail.com

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rakeshadr/zookeeper-1 branch-3.4

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/39.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #39

----
commit c9dc95f67cd0b003c3f2b0f07dba90e643bcc8ac
Author: Rakesh rakesh@gmail.com
Date:   2015-08-11T18:12:12Z

    ZOOKEEPER-1907: Backport thread handling improvement from trunk

    Signed-off-by: Rakesh rakesh@gmail.com
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692343#comment-14692343 ]

Hongchao Deng commented on ZOOKEEPER-1907:
------------------------------------------

+1

I have reviewed the PR and run the unit tests locally. It's nice work!

Would any other committer have time to review it too? Otherwise, I will probably get this in by next week.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14655093#comment-14655093 ]

Rakesh R commented on ZOOKEEPER-1907:
-------------------------------------

Thanks [~hdeng]. I failed to upload the {{branch-3.4}} patch to RB. While uploading a patch it asks {{What is the base directory for this diff?}}. The value seems to be {{.}} for trunk patches. Do you have any idea what the value should be for branch patches?
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658474#comment-14658474 ]

Hongchao Deng commented on ZOOKEEPER-1907:
------------------------------------------

I used rbtools to upload patches. The web interface has been broken for me for a long time.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658687#comment-14658687 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1907:
---------------------------------------------------

I guess a GitHub branch or pull request works for reviews too, right? (We'd of course upload the final patch here, etc.)

(I usually run into issues with rbtools as well.)
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658788#comment-14658788 ]

Hongchao Deng commented on ZOOKEEPER-1907:
------------------------------------------

Yes! That would be great.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654868#comment-14654868 ]

Hongchao Deng commented on ZOOKEEPER-1907:
------------------------------------------

GJ Rakesh. Do you mind uploading it to ReviewBoard? I would like to give some comments and definitely get this in ASAP.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613032#comment-14613032 ]

Rakesh R commented on ZOOKEEPER-1907:
-------------------------------------

Attached a patch to backport this to {{branch-3.4}}. I haven't submitted the patch, since it would be applied against trunk and fail. Please review the changes. Thanks!
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595965#comment-14595965 ]

Rakesh R commented on ZOOKEEPER-1907:
-------------------------------------

As per the [discussion|https://issues.apache.org/jira/browse/ZOOKEEPER-602?focusedCommentId=14547208&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14547208], I am re-opening this jira to backport the changes to {{branch-3.4}}. I will prepare a patch some time later this week.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352820#comment-14352820 ]

Hudson commented on ZOOKEEPER-1907:
-----------------------------------

FAILURE: Integrated in ZooKeeper-trunk #2619 (See [https://builds.apache.org/job/ZooKeeper-trunk/2619/])
ZOOKEEPER-1907 Improve Thread handling (Rakesh R via michim) (michim: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1665089)
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ExitCode.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/SessionTrackerImpl.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/SyncRequestProcessor.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ZooKeeperCriticalThread.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ZooKeeperServerListener.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/CommitProcessor.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/FollowerRequestProcessor.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/FollowerZooKeeperServer.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/LeaderSessionTracker.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/LeaderZooKeeperServer.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/LearnerSessionTracker.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/LearnerZooKeeperServer.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/LocalSessionTracker.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/ObserverRequestProcessor.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/ObserverZooKeeperServer.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/ReadOnlyRequestProcessor.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/ReadOnlyZooKeeperServer.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/UpgradeableSessionTracker.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/ZooKeeperThreadTest.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/CommitProcessorConcurrencyTest.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/CommitProcessorTest.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/ClientBase.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/SessionTrackerCheckTest.java
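The problem this issue describes is a run() loop that logs an exception and then exits quietly, leaving the server hung. One way to address it can be sketched as follows. This is only an illustration under stated assumptions: the names CriticalThreadSketch and FailureListener are hypothetical and are not taken from the committed patch, whose actual classes (ZooKeeperCriticalThread, ZooKeeperServerListener) may look quite different.

```java
/** Hypothetical callback; the server would implement this to react to a dead thread. */
interface FailureListener {
    void notifyStopping(String threadName, Throwable cause);
}

/** Hypothetical critical thread: any failure in its work is reported, not swallowed. */
class CriticalThreadSketch extends Thread {
    private final Runnable work;
    private final FailureListener listener;

    CriticalThreadSketch(String name, Runnable work, FailureListener listener) {
        super(name);
        this.work = work;
        this.listener = listener;
    }

    @Override
    public void run() {
        try {
            work.run(); // the processing loop would live here
        } catch (Throwable t) {
            // Instead of only logging, tell the owner that this thread is going away
            // so it can shut down (or exit with a distinct code) rather than hang.
            listener.notifyStopping(getName(), t);
        }
    }
}
```

With something like this, a management tool no longer has to guess from log output: the listener can stop the server in a way that is visible from the outside.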
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352255#comment-14352255 ]

Michi Mutsuzaki commented on ZOOKEEPER-1907:
--------------------------------------------

+1

Thanks Rakesh!
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352527#comment-14352527 ]

Rakesh R commented on ZOOKEEPER-1907:
-------------------------------------

Thanks [~hdeng], [~rgs], [~michim] for the reviews and committing the changes.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349362#comment-14349362 ]

Hadoop QA commented on ZOOKEEPER-1907:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12702875/ZOOKEEPER-1907.patch
against trunk revision 1663127.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 15 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2543//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2543//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2543//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349503#comment-14349503 ]

Hongchao Deng commented on ZOOKEEPER-1907:
------------------------------------------

+1

Thanks for your work, [~rakeshr].
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349281#comment-14349281 ]

Rakesh R commented on ZOOKEEPER-1907:
-------------------------------------

Attached a new patch addressing the comments given by [~hdeng] on the review board.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349942#comment-14349942 ]

Rakesh R commented on ZOOKEEPER-1907:
-------------------------------------

[~michim], could you have a look at the latest patch when you get some time? Thanks!
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349938#comment-14349938 ]

Rakesh R commented on ZOOKEEPER-1907:
-------------------------------------

Thanks a lot [~hdeng] for shaping the solution and for the reviews.

Note: The test case failure is unrelated to this patch.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343355#comment-14343355 ]

Hongchao Deng commented on ZOOKEEPER-1907:
------------------------------------------

Hi [~rakeshr], could you update the RB to the latest patch?
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343427#comment-14343427 ]

Rakesh R commented on ZOOKEEPER-1907:
-------------------------------------

I've updated the RB. Could you have a look at it?
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342334#comment-14342334 ]

Rakesh R commented on ZOOKEEPER-1907:
-------------------------------------

It seems the test case failure is unrelated to this patch.

bq. I really hope the first running race could be fixed in this issue. A way I can think of would be to change running from bool to int.

[~hdeng] I've uploaded a new patch which uses {{enum State}} to address this case. Please have a look at the latest patch. Thanks!
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342446#comment-14342446 ]

Hongchao Deng commented on ZOOKEEPER-1907:
------------------------------------------

Thanks for notifying. I noticed the RB is not showing the latest patch. Can you update it? I will review it tomorrow (Monday for me).
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332103#comment-14332103 ]

Hadoop QA commented on ZOOKEEPER-1907:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12700086/ZOOKEEPER-1907.patch
against trunk revision 165.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 15 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2526//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2526//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2526//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327976#comment-14327976 ]

Hongchao Deng commented on ZOOKEEPER-1907:
------------------------------------------

The latest patch looks really good to me, +1!

I have a concern which is not related to the patch. In ZooKeeperServer#submitRequest():
{code}
if (firstProcessor == null) {
    synchronized (this) {
        try {
            while (!running) {
                wait(1000);
            }
        } catch (InterruptedException e) {
            LOG.warn("Unexpected interruption", e);
        }
        if (firstProcessor == null) {
            throw new RuntimeException("Not started");
        }
    }
}
{code}
I think the purpose of this code is to wait for the initial setup. I can see a race here: after shutdown(), running is set to false, so a caller entering this while loop will wait forever.

Moreover, submitRequest() doesn't handle the case where the server is shut down. That seems fine, because the server is already shut down.

I really hope the first running race could be fixed in this issue. A way I can think of would be to change running from bool to int (or enum: STARTING, RUNNING, SHUTDOWN).
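The bool-to-enum idea suggested above can be sketched roughly as below. This is a minimal illustration, not the code from the patch: the class ServerStateSketch, its state names, and its methods are hypothetical stand-ins. The point is that a three-valued state lets a waiting caller tell "not started yet" (keep waiting) apart from "already shut down" (give up), which a single boolean cannot express.

```java
/** Hypothetical stand-in for the server; not ZooKeeper's actual class. */
class ServerStateSketch {
    enum State { INITIAL, RUNNING, SHUTDOWN }

    private State state = State.INITIAL;

    synchronized void start() {
        state = State.RUNNING;
        notifyAll(); // wake callers waiting for startup
    }

    synchronized void shutdown() {
        state = State.SHUTDOWN;
        notifyAll(); // wake callers so they can fail fast
    }

    synchronized String submit(String request) {
        try {
            // Wait only while startup is pending; SHUTDOWN ends the loop too.
            while (state == State.INITIAL) {
                wait(1000);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("Interrupted while waiting for startup", e);
        }
        if (state != State.RUNNING) {
            throw new IllegalStateException("Not running: " + state);
        }
        return "processed:" + request;
    }
}
```

After shutdown(), submit() fails immediately with an exception instead of looping on wait(1000).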
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328534#comment-14328534 ]

Rakesh R commented on ZOOKEEPER-1907:
-------------------------------------

[~hdeng] That's a really good catch. This case can occur and needs to be handled.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326616#comment-14326616 ]

Hongchao Deng commented on ZOOKEEPER-1907:
------------------------------------------

Hi [~rakeshr]. Added some light comments in RB. I will wait for your next patch fixing the shutdown(), running() stuff and do a further review. Thanks!
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326968#comment-14326968 ]

Hadoop QA commented on ZOOKEEPER-1907:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12699605/ZOOKEEPER-1907.patch
against trunk revision 165.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 15 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2521//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2521//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2521//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308647#comment-14308647 ]

Michi Mutsuzaki commented on ZOOKEEPER-1907:
---------------------------------------------

+1

I'll wait for Hongchao's +1 before checking this in. Thanks Rakesh!
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307691#comment-14307691 ]

Rakesh R commented on ZOOKEEPER-1907:
--------------------------------------

Thanks [~hdeng]. Attached patch addressing the comments.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307726#comment-14307726 ]

Hadoop QA commented on ZOOKEEPER-1907:
---------------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12696797/ZOOKEEPER-1907.patch
  against trunk revision 1656167.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 21 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2513//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2513//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2513//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304984#comment-14304984 ]

Rakesh R commented on ZOOKEEPER-1907:
--------------------------------------

Thanks [~hdeng] for the comments.

bq. 1. handleException() in ZK critical Thread seems to be duplicate to the added notify and shut it down function. How can we make this part cleaner?

Instead of notifying directly, the thread can call #handleException(thName, exp). I will also remove the duplicate error log messages in the #run() method. Does this sound good to you?

bq. 2. What about multiple calls to shutdown()?

I think the {{this.running}} flag present in {{ZooKeeperServer}} will help to avoid duplicate calls. Before calling zks#shutdown(), I will check zks#isRunning(). Let me try this out.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306564#comment-14306564 ]

Rakesh R commented on ZOOKEEPER-1907:
--------------------------------------

bq. Before calling zks#shutdown() will do a check zks#isRunning(), let me try this out.

ZooKeeperServer sets the flag {{this.running=true}} at the end of #startup(). This leaves a corner case: the ZooKeeperServerListener notification would be skipped if a fatal notification arrives during the server startup phase.

An alternative approach I'm considering is to add a {{notified}} flag in ZooKeeperServerListenerImpl. Set {{notified=false}} at the beginning of ZooKeeperServer#startup(), and set {{notified=true}} just before invoking ZooKeeperServer#shutdown(). Also, add a pre-condition {{if(notified)}} to skip ZooKeeperServer#shutdown() if already notified.
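The {{notified}} guard proposed in the comment above could be sketched roughly as follows. This is only an illustration, not the code from the attached patch: the class names `SketchServer` and `OneShotShutdownListener` and the `reset()` method are hypothetical, and an `AtomicBoolean` is used in place of a plain boolean field (an assumption made here so that concurrent reports from several threads cannot both pass the check).

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the server; it only counts shutdown() calls. */
class SketchServer {
    final AtomicInteger shutdownCalls = new AtomicInteger();

    void shutdown() {
        shutdownCalls.incrementAndGet();
    }
}

/** Hypothetical listener that triggers shutdown at most once per startup. */
class OneShotShutdownListener {
    private final SketchServer server;
    // false means no fatal error has been reported since the last reset()
    private final AtomicBoolean notified = new AtomicBoolean(false);

    OneShotShutdownListener(SketchServer server) {
        this.server = server;
    }

    /** Intended to be called at the beginning of startup. */
    void reset() {
        notified.set(false);
    }

    /** Called by any critical thread that hits a fatal error. */
    void notifyStopping(String threadName) {
        // compareAndSet lets exactly one caller flip the flag and shut down;
        // later callers see notified == true and skip the shutdown.
        if (notified.compareAndSet(false, true)) {
            server.shutdown();
        }
    }
}
```

Because the flag belongs to the listener rather than to the server's running state, a fatal error reported while startup is still in progress is not lost, which is the corner case the comment describes.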
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306601#comment-14306601 ]

Hongchao Deng commented on ZOOKEEPER-1907:
-------------------------------------------

My thought: startup() and shutdown() should be sequential. If not, this is a good chance to fix it.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291863#comment-14291863 ]

Rakesh R commented on ZOOKEEPER-1907:
--------------------------------------

Updated review ticket https://reviews.apache.org/r/20071 with new patch. Appreciate any comments. Thanks!
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264318#comment-14264318 ]

Rakesh R commented on ZOOKEEPER-1907:
--------------------------------------

Thanks for the reply. I will rebase the patch on the latest trunk and upload it.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264210#comment-14264210 ]

Hadoop QA commented on ZOOKEEPER-1907:
---------------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12666828/ZOOKEEPER-1907.patch
  against trunk revision 1646992.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 18 new or modified tests.
    -1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2474//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264207#comment-14264207 ]

Rakesh R commented on ZOOKEEPER-1907:
--------------------------------------

[~hdeng], it would be great to see your feedback on my comments. Thanks!
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122936#comment-14122936 ]

Hadoop QA commented on ZOOKEEPER-1907:
---------------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12666793/ZOOKEEPER-1907.zip
  against trunk revision 1621313.

    +1 @author. The patch does not contain any @author tags.
    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.
    -1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2313//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123227#comment-14123227 ]

Hadoop QA commented on ZOOKEEPER-1907:
---------------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12666828/ZOOKEEPER-1907.patch
  against trunk revision 1621313.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 18 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    -1 javac. The patch appears to cause tar ant target to fail.
    -1 findbugs. The patch appears to cause Findbugs (version 2.0.3) to fail.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2314//testReport/
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2314//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123242#comment-14123242 ]

Rakesh R commented on ZOOKEEPER-1907:
--------------------------------------

Thanks [~hdeng] for the comments.

bq. I didn't like the idea of while-loop polling each thread's live status.

I've tried a different approach: passing listeners to the critical threads and handling the exception there. Please have a look at the patch (test cases are yet to be added).

bq. Another thing I understand from the code (if correctly) is when a thread died, the entire ZK process is shutdown. If so, what is the difference if just letting the exception go all the way up and shut it down? I am wondering that the original purpose was to try restarting or so.

I think restarting the critical resources would make things more complex. As I mentioned at the beginning, there are many critical threads and I'm afraid of inconsistencies. The simple way of handling this is to shut down and leave things to the administrator or monitoring tool, which can restart the server after rectifying the cause (for OOME, any functional errors, etc.).
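The listener-based approach described in the comment above might look something like the sketch below. It is a simplified illustration under stated assumptions, not the code from the patch: `FatalErrorListener`, `CriticalThreadSketch` and `doWork()` are hypothetical names, and the choice to report every `Throwable` (including interruption) to the listener is an assumption made here to mirror the "no silent exit" goal of this issue.

```java
/** Hypothetical callback through which a critical thread reports a fatal error. */
interface FatalErrorListener {
    void notifyStopping(String threadName, Throwable cause);
}

/** Hypothetical base class; subclasses put their processing loop in doWork(). */
abstract class CriticalThreadSketch extends Thread {
    private final FatalErrorListener listener;

    CriticalThreadSketch(String name, FatalErrorListener listener) {
        super(name);
        this.listener = listener;
    }

    /** The processing loop; it may throw anything. */
    protected abstract void doWork() throws Exception;

    @Override
    public final void run() {
        try {
            doWork();
        } catch (Throwable t) {
            if (t instanceof InterruptedException) {
                // keep the interrupt status visible to whoever inspects this thread
                Thread.currentThread().interrupt();
            }
            // report instead of only logging, so the owner can react (e.g. shut down)
            listener.notifyStopping(getName(), t);
        }
    }
}
```

With this shape, the owner of the threads decides what a fatal error means (shut down, as argued in the comment) and no polling of thread liveness is needed.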
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091305#comment-14091305 ]

Hongchao Deng commented on ZOOKEEPER-1907:
-------------------------------------------

Hi [~rakeshr]. The patch should work. However, I didn't like the idea of while-loop polling each thread's live status. Could it use a notification-based mechanism, like thread join?

Another thing I understand from the code (if I read it correctly) is that when a thread dies, the entire ZK process is shut down. If so, what is the difference from just letting the exception go all the way up and shutting it down? I am wondering whether the original purpose was to try restarting or so.

I am not familiar with this code, so any correction to my statement is welcome.
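As a rough illustration of the join-based notification suggested in the comment above (as opposed to polling each thread's liveness in a loop), a watcher thread can block in `join()` and fire a callback once the worker has terminated. The helper name `ExitWatcher` and its `watch` method are hypothetical and do not come from the patch.

```java
/** Hypothetical helper: blocks in join() and runs a callback when the worker exits. */
class ExitWatcher {
    /**
     * Must be called after worker.start(); join() on a thread that was never
     * started returns immediately, which would fire the callback too early.
     */
    static Thread watch(Thread worker, Runnable onExit) {
        Thread watcher = new Thread(() -> {
            try {
                worker.join();   // returns only once the worker has terminated
                onExit.run();
            } catch (InterruptedException e) {
                // the watcher itself was cancelled; preserve the interrupt status
                Thread.currentThread().interrupt();
            }
        }, "watch-" + worker.getName());
        watcher.setDaemon(true); // do not keep the JVM alive just for watching
        watcher.start();
        return watcher;
    }
}
```

The trade-off is one extra (mostly idle) watcher thread per critical thread, and the callback learns only that the worker exited, not why; the listener approach taken later in this issue passes the cause along as well.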
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072922#comment-14072922 ]

Hadoop QA commented on ZOOKEEPER-1907:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12657546/ZOOKEEPER-1907.patch
against trunk revision 1612906.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 15 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2223//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2223//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2223//console

This message is automatically generated.

Improve Thread handling
---
Key: ZOOKEEPER-1907
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1907
Project: ZooKeeper
Issue Type: Improvement
Components: server
Affects Versions: 3.5.0
Reporter: Rakesh R
Assignee: Rakesh R
Fix For: 3.5.0
Attachments: ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch, ZOOKEEPER-1907.patch

The server has many critical threads that run and coordinate with each other, such as the RequestProcessor chains. Going through these threads, most of them have a similar structure:
{code}
public void run() {
    try {
        while (running) {
            // processing logic
        }
    } catch (InterruptedException e) {
        LOG.error("Unexpected interruption", e);
    } catch (Exception e) {
        LOG.error("Unexpected exception", e);
    }
    LOG.info("...exited loop!");
}
{code}
With this design, a thread can exit silently because the exception is swallowed. If this happens in production, the server would hang forever and be unable to perform its role, and it is hard for a management tool to detect this.
The idea of this JIRA is to discuss and improve this.
Reference: [Community discussion thread|http://mail-archives.apache.org/mod_mbox/zookeeper-user/201403.mbox/%3cc2496325850aa74c92aaf83aa9662d26458a1...@szxeml561-mbx.china.huawei.com%3E]
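The run() pattern quoted in the issue description logs the exception and lets the thread end, so nothing else in the server learns that the thread is gone. One way to make such an exit visible, shown here only as a minimal sketch and not as the change made by the attached patches, is to give each critical thread an uncaught-exception handler that reports to a listener. The names CriticalThreadSketch, CriticalThread, FailureListener and runFailingWorker are hypothetical and exist only for this illustration; Thread.setUncaughtExceptionHandler is the standard JDK call.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class CriticalThreadSketch {

    /** Hypothetical callback, invoked when a critical thread dies unexpectedly. */
    interface FailureListener {
        void onFailure(String threadName, Throwable cause);
    }

    /** Hypothetical base class for threads the server cannot run without. */
    abstract static class CriticalThread extends Thread {
        private final FailureListener listener;

        CriticalThread(String name, FailureListener listener) {
            super(name);
            this.listener = listener;
            // Anything that escapes run() is reported instead of being swallowed.
            setUncaughtExceptionHandler((t, e) -> handleException(t.getName(), e));
        }

        /** Subclasses may also call this from their own catch blocks. */
        protected void handleException(String threadName, Throwable e) {
            listener.onFailure(threadName, e);
        }
    }

    /** Starts a worker that fails at once and returns what the listener saw. */
    static Throwable runFailingWorker() {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        CountDownLatch notified = new CountDownLatch(1);
        CriticalThread worker = new CriticalThread("worker", (name, cause) -> {
            seen.set(cause);
            notified.countDown();
        }) {
            @Override
            public void run() {
                // Stands in for the "processing logic" hitting an unexpected error.
                throw new IllegalStateException("simulated failure");
            }
        };
        worker.start();
        try {
            notified.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("listener saw: " + runFailingWorker());
    }
}
```

In a real server the listener would typically trigger a shutdown or flip a health flag that a management tool can poll; which of those is appropriate is a design decision this sketch leaves open.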
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072959#comment-14072959 ]

Rakesh R commented on ZOOKEEPER-1907:

[~hdeng] I've attached the latest patch, in which I've corrected the test case. Could you have a look at it? Thanks!
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072839#comment-14072839 ]

Rakesh R commented on ZOOKEEPER-1907:

[~hdeng] [~phunt] Sorry for the delay, I was busy with other work.

bq. SessionTrackerImpl.java still contains new line

I checked the latest patch by running the following commands on my Unix machine to verify the newline characters. They do not show any problem for me.
{code}
#!/usr/bin/env bash
if awk '/\r$/{exit 0;} 1{exit 1;}' ZOOKEEPER-1907.patch
then
  echo "is DOS"
fi
if [[ $(head -1 ZOOKEEPER-1907.patch) == *$'\r' ]]; then echo DOS; fi
{code}

bq. Test org.apache.zookeeper.server.quorum.CommitProcessorTest FAILED (crashed)

Good catch. I think the test case is failing for the following reason. I will fix it soon and upload a new patch. Please let me know if you see any other causes. Thanks!
{code}
2014-07-24 10:56:37,543 [myid:] - ERROR [ZKDeathWatcherThread:ZooKeeperCriticalThread@47] - Severe unrecoverable error, from thread : ZKDeathWatcherThread
java.lang.NullPointerException
    at org.apache.zookeeper.server.ZooKeeperServer$ZKDeathWatcher.run(ZooKeeperServer.java:1084)
{code}
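Both checks in the script quoted above look only at the first line of the patch: the awk program exits while processing its first record, and head -1 reads a single line. A patch whose first line ends in LF would therefore pass even if later lines carry a CR. As a rough sketch of a whole-file check, the class below reports every line that ends in CR LF. The class and method names (CrlfCheck, linesEndingWithCr) are made up for this example and are not part of ZooKeeper or its build tooling.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class CrlfCheck {

    /** Returns the 1-based numbers of all lines terminated by CR LF. */
    static List<Integer> linesEndingWithCr(byte[] content) {
        List<Integer> hits = new ArrayList<>();
        int line = 1;
        for (int i = 0; i < content.length; i++) {
            if (content[i] == '\n') {
                // A CR directly before the LF marks a DOS-style line ending.
                if (i > 0 && content[i - 1] == '\r') {
                    hits.add(line);
                }
                line++;
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // With a file argument, scan that file; otherwise use a small built-in sample.
        byte[] content = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "first\nsecond\r\nthird\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println("CRLF-terminated lines: " + linesEndingWithCr(content));
    }
}
```

Run against a patch file, for example with `java CrlfCheck.java ZOOKEEPER-1907.patch` on a JDK that supports single-file source launch, it lists the affected line numbers, which makes it easier to see which hunks introduced the carriage returns.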
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069900#comment-14069900 ]

Rakesh R commented on ZOOKEEPER-1907:

Hi [~hdeng], I've uploaded a new patch, please try this. Thanks!
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069914#comment-14069914 ]

Hadoop QA commented on ZOOKEEPER-1907:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12657067/ZOOKEEPER-1907.patch
against trunk revision 1612458.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 12 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2213//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2213//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2213//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070328#comment-14070328 ]

Hongchao Deng commented on ZOOKEEPER-1907:

[~rakeshr] My jenkins job reported that
{noformat}
Test org.apache.zookeeper.server.quorum.CommitProcessorTest FAILED (crashed)
{noformat}
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070381#comment-14070381 ]

Hongchao Deng commented on ZOOKEEPER-1907:

SessionTrackerImpl.java still contains new line
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069785#comment-14069785 ]

Hongchao Deng commented on ZOOKEEPER-1907:

[~rakeshr] I just downloaded your patch and git warns that the following files end with CR.
{code}
./src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java
./src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java
./src/java/main/org/apache/zookeeper/server/SessionTrackerImpl.java
{code}
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067501#comment-14067501 ]

Rakesh R commented on ZOOKEEPER-1907:

[~phunt] There is one open comment that needs to be resolved. This issue is currently marked for 3.6.0. IMO, if we reach an agreement, could we include it in 3.5.0?
[~rgs] Some time back I replied to your comment in RB. I'd like to know your thoughts. Thanks.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067576#comment-14067576 ]

Patrick Hunt commented on ZOOKEEPER-1907:

I think if everyone is ok with the change we could include it in 3.5.0. What do you think?
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067589#comment-14067589 ]

Rakesh R commented on ZOOKEEPER-1907:

That would be fine.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067602#comment-14067602 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1907:

I'd very much like to have this for 3.5.0. Let me quickly check the comments [~rakeshr] is referring to.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067603#comment-14067603 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1907:

+1 shipit on the reviewboard - thanks [~rakeshr] and [~phunt]!
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067666#comment-14067666 ]

Hadoop QA commented on ZOOKEEPER-1907:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12656748/ZOOKEEPER-1907.patch
against trunk revision 1611846.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 12 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 87 new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2204//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2204//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2204//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066628#comment-14066628 ]

Patrick Hunt commented on ZOOKEEPER-1907:

[~rakeshr] and [~rgs] any progress on this? I can't tell from the most recent comment(s) whether this is ready or whether updates are still planned. Please cancel the patch if it's not ready for submission. Thanks.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011071#comment-14011071 ]

Hadoop QA commented on ZOOKEEPER-1907:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12647124/ZOOKEEPER-1907.patch
against trunk revision 1596684.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 9 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2115//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2115//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2115//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009082#comment-14009082 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1907:

[~rakeshr]: were you planning on updating that RB with (some of) the comments I made? (I see you posted the link to the RB, but didn't see an updated diff).
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009203#comment-14009203 ] Rakesh R commented on ZOOKEEPER-1907: - [~rgs], before preparing the patch I would like to discuss one of your comments and make sure it is clear. I replied on the RB some time back; could you please have a look at it? Thanks!
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007061#comment-14007061 ] Rakesh R commented on ZOOKEEPER-1907: - Forgot to mention review ticket link: https://reviews.apache.org/r/20071/
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963305#comment-13963305 ] Raul Gutierrez Segales commented on ZOOKEEPER-1907: --- (And I forgot to add, this is awesome - thanks [~rakeshr]).
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963304#comment-13963304 ] Raul Gutierrez Segales commented on ZOOKEEPER-1907: --- Commented on the RB.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959243#comment-13959243 ] Michi Mutsuzaki commented on ZOOKEEPER-1907: Thank you for the patch Rakesh. Could you put this on the reviewboard?
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955302#comment-13955302 ] Rakesh R commented on ZOOKEEPER-1907: - I'm attaching an initial proposal where I'm addressing critical threads like:
- CommitProcessor
- FollowerRequestProcessor
- ObserverRequestProcessor
- PrepRequestProcessor
- ReadOnlyRequestProcessor
- SessionTrackerImpl
- SyncRequestProcessor
Please review the approach and the patch. Thanks in advance.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955310#comment-13955310 ] Hadoop QA commented on ZOOKEEPER-1907: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12637862/ZOOKEEPER-1907.patch against trunk revision 1583083. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2008//console This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955321#comment-13955321 ] Hadoop QA commented on ZOOKEEPER-1907: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12637862/ZOOKEEPER-1907.patch against trunk revision 1583083. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2009//console This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1907) Improve Thread handling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956130#comment-13956130 ] Hadoop QA commented on ZOOKEEPER-1907: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12638002/ZOOKEEPER-1907.patch against trunk revision 1583513. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2016//console This message is automatically generated.