[
https://issues.apache.org/jira/browse/STORM-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268661#comment-15268661
]
ASF GitHub Bot commented on STORM-1750:
---------------------------------------
GitHub user srdo opened a pull request:
https://github.com/apache/storm/pull/1390
STORM-1750 (0.10.x): Ensure worker dies when report-error-and-die is called.
Make cluster set-data try setting data if node creation fails because the
node exists.
Backport of STORM-1750
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/srdo/storm STORM-1750-0.10.x
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/1390.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1390
----
commit aaf3abf1132db1f22789bd58875367ac75216e37
Author: Stig Rohde Døssing <[email protected]>
Date: 2016-05-03T13:04:27Z
STORM-1750 (0.10.x): Ensure worker dies when report-error-and-die is
called. Make cluster set-data try setting data if node creation fails because
the node exists.
----
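The second half of the commit message (set-data falling back to setting the
data when node creation fails because the node already exists) could look
roughly like the sketch below. It is not the code from the pull request: the
class SetDataSketch, its in-memory map, and the create/update helpers are
hypothetical stand-ins for a ZooKeeper client, used only to show the fallback.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a ZooKeeper client.
// This is NOT Storm's or Curator's API; it only illustrates the fallback.
class SetDataSketch {
    static class NodeExistsException extends Exception {
        NodeExistsException(String path) {
            super("node already exists: " + path);
        }
    }

    static final Map<String, String> nodes = new ConcurrentHashMap<>();

    // Creates the node, failing if some other writer created it first.
    static void create(String path, String data) throws NodeExistsException {
        if (nodes.putIfAbsent(path, data) != null) {
            throw new NodeExistsException(path);
        }
    }

    // Overwrites the data stored at the path.
    static void update(String path, String data) {
        nodes.put(path, data);
    }

    // Sets the data, creating the node if needed. If creation loses a race
    // against another writer, fall back to updating instead of propagating
    // the exception to the caller.
    static void setData(String path, String data) {
        if (nodes.containsKey(path)) {
            update(path, data);
            return;
        }
        try {
            create(path, data);
        } catch (NodeExistsException e) {
            update(path, data);
        }
    }

    public static void main(String[] args) {
        setData("/errors/bolt-last-error", "first");
        setData("/errors/bolt-last-error", "second");
        System.out.println(nodes.get("/errors/bolt-last-error"));
    }
}
```

The point of the fallback is that a concurrent create by another executor is
treated as an ordinary outcome rather than as a failure of the write.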
> Report-error-and-die may not kill the worker
> --------------------------------------------
>
> Key: STORM-1750
> URL: https://issues.apache.org/jira/browse/STORM-1750
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 0.10.0, 1.0.0, 2.0.0
> Reporter: Stig Rohde Døssing
> Assignee: Stig Rohde Døssing
> Priority: Critical
>
> The report-error-and-die function in executor.clj calls report-error, which
> can throw exceptions if Curator runs into any kind of trouble while
> registering the error. I suspect this may happen with network errors, but it
> can also happen if two executors for the same component throw exceptions at
> the same time and no errors have been registered for the component
> previously. This is because both calls to report-error-and-die update the
> lastErrorPath, and ZkStateStorage set_data doesn't catch the potential
> NodeExistsException that may be thrown from the create call.
> If an exception is thrown from report-error, the suicide-fn is never called,
> and the worker keeps running without the crashed executor.
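The failure mode described above comes down to the kill step being skipped
when the reporting step throws. One way to guard against that, sketched here
in Java rather than in the Clojure of executor.clj, is to run the kill step
in a finally block. The names ReportErrorAndDieSketch, ErrorReporter and
reportErrorAndDie are hypothetical and are not Storm's API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: the names below are illustrative, not Storm's API.
class ReportErrorAndDieSketch {
    // Stand-in for report-error, which may fail while registering the error.
    interface ErrorReporter {
        void report(Throwable error) throws Exception;
    }

    // Reports the error, then always runs the kill step, even if reporting
    // itself throws.
    static void reportErrorAndDie(ErrorReporter reporter, Runnable suicideFn, Throwable error) {
        try {
            reporter.report(error);
        } catch (Exception reportingFailure) {
            // Reporting is best effort; log and continue to the kill step.
            System.err.println("Error while reporting error: " + reportingFailure);
        } finally {
            suicideFn.run();
        }
    }

    public static void main(String[] args) {
        AtomicBoolean died = new AtomicBoolean(false);
        ErrorReporter failing = e -> {
            throw new IllegalStateException("simulated reporting failure");
        };
        reportErrorAndDie(failing, () -> died.set(true), new RuntimeException("executor crashed"));
        System.out.println("suicide-fn called: " + died.get());
    }
}
```

In this sketch the worker is killed whether or not the error could be
recorded, so a crashed executor cannot leave the worker running.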