[
https://issues.apache.org/jira/browse/STORM-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265502#comment-15265502
]
Stig Rohde Døssing commented on STORM-1750:
-------------------------------------------
We've seen this issue manifest as storm-kafka spouts crashing and failing to
recover on both 0.10.0 and 1.0.0. In Storm UI, the spout executors appear to
keep running, but they stop emitting tuples.
> Report-error-and-die may not kill the worker
> --------------------------------------------
>
> Key: STORM-1750
> URL: https://issues.apache.org/jira/browse/STORM-1750
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 0.10.0, 1.0.0, 2.0.0
> Reporter: Stig Rohde Døssing
> Assignee: Stig Rohde Døssing
> Priority: Critical
>
> The report-error-and-die function in executor.clj calls report-error, which
> can throw exceptions if Curator runs into any kind of trouble while
> registering the error. I suspect this may happen with network errors, but it
> can also happen if two executors for the same component throw exceptions at
> the same time and no errors have been registered for the component
> previously. This is because both calls to report-error-and-die update the
> lastErrorPath, and ZkStateStorage set_data doesn't catch the potential
> NodeExistsException that may be thrown from the create call.
> If an exception is thrown from report-error, the suicide-fn is never called,
> and the worker keeps running without the crashed executor.
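
One way to keep a failing report-error call from skipping the suicide-fn is
to run the suicide-fn from a finally block. The Java sketch below only
illustrates that idea. It is not Storm's code (executor.clj is Clojure), and
the names ReportErrorAndDie, ErrorReporter and reportErrorAndDie are
hypothetical stand-ins for report-error-and-die, report-error and suicide-fn.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the executor's error path; not Storm's actual code.
class ReportErrorAndDie {

    // Assumed shape of report-error: it may throw, e.g. when ZooKeeper is unreachable.
    interface ErrorReporter {
        void report(Throwable error) throws Exception;
    }

    // Reports the error on a best-effort basis, then always runs suicideFn.
    static void reportErrorAndDie(Throwable error, ErrorReporter reporter, Runnable suicideFn) {
        try {
            reporter.report(error);
        } catch (Exception reportFailure) {
            // A failed report must not keep the worker alive; log it and move on.
            System.err.println("Failed to report error: " + reportFailure);
        } finally {
            // Runs whether or not reporting succeeded.
            suicideFn.run();
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();

        // Simulates report-error throwing while registering the error.
        ErrorReporter failingReporter = e -> {
            throw new IllegalStateException("simulated ZooKeeper failure");
        };

        reportErrorAndDie(new RuntimeException("spout crashed"),
                          failingReporter,
                          () -> events.add("suicide"));

        System.out.println(events); // prints [suicide]
    }
}
```

In this sketch the suicide callback still runs when the reporter throws, so
the simulated worker would be killed rather than left running without the
crashed executor.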