[ 
https://issues.apache.org/jira/browse/STORM-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268575#comment-15268575
 ] 

ASF GitHub Bot commented on STORM-1750:
---------------------------------------

GitHub user srdo opened a pull request:

    https://github.com/apache/storm/pull/1389

    STORM-1750: Ensure worker dies when report-error-and-die is called. Make
    zookeeper_state_factory set-data try setting data if node creation fails
    because the node exists.
    
    Backport of STORM-1750

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srdo/storm STORM-1750-1.x

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1389.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1389
    
----
commit 6e4bdbc8cacd7621c570126a5d9e8da9f599d037
Author: Stig Rohde Døssing <[email protected]>
Date:   2016-05-03T11:51:13Z

    STORM-1750: Ensure worker dies when report-error-and-die is called. Make
    zookeeper_state_factory set-data try setting data if node creation fails
    because the node exists

----


> Report-error-and-die may not kill the worker
> --------------------------------------------
>
>                 Key: STORM-1750
>                 URL: https://issues.apache.org/jira/browse/STORM-1750
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 0.10.0, 1.0.0, 2.0.0
>            Reporter: Stig Rohde Døssing
>            Assignee: Stig Rohde Døssing
>            Priority: Critical
>
> The report-error-and-die function in executor.clj calls report-error, which 
> can throw exceptions if Curator runs into any kind of trouble while 
> registering the error. I suspect this may happen with network errors, but it 
> can also happen if two executors for the same component throw exceptions at 
> the same time and no errors have been registered for the component 
> previously. This is because both calls to report-error-and-die update the 
> lastErrorPath, and ZkStateStorage set_data doesn't catch the potential 
> NodeExistsException that may be thrown from the create call.
> If an exception is thrown from report-error, the suicide-fn is never called, 
> and the worker keeps running sans the crashed executor.
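
The two failure modes described above can be illustrated with a minimal, self-contained Java sketch. It is not the code from the pull request: InMemoryStateStorage, NodeExistsException, reportErrorAndDie and the node path are hypothetical stand-ins for the ZooKeeper-backed storage, ZooKeeper's node-exists error, the Clojure report-error-and-die function, and the real error path. The sketch assumes one plausible shape of the fix: set-data falls back to an update when creation loses a race, and the suicide function runs even if error reporting throws.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for ZooKeeper's "node already exists" error.
class NodeExistsException extends RuntimeException {
    NodeExistsException(String path) {
        super("Node already exists: " + path);
    }
}

// Hypothetical in-memory stand-in for the ZooKeeper-backed state storage.
class InMemoryStateStorage {
    private final ConcurrentHashMap<String, String> nodes = new ConcurrentHashMap<>();

    boolean exists(String path) {
        return nodes.containsKey(path);
    }

    void create(String path, String data) {
        // Fails if another writer created the node first.
        if (nodes.putIfAbsent(path, data) != null) {
            throw new NodeExistsException(path);
        }
    }

    void update(String path, String data) {
        nodes.put(path, data);
    }

    String get(String path) {
        return nodes.get(path);
    }

    // Create the node if it is missing. If another writer created it between
    // the exists check and the create call, fall back to updating it instead
    // of letting the exception escape to the caller.
    void setData(String path, String data) {
        if (exists(path)) {
            update(path, data);
            return;
        }
        try {
            create(path, data);
        } catch (NodeExistsException e) {
            update(path, data);
        }
    }
}

class ReportErrorAndDieSketch {
    // Report the error, then always run the suicide function, even when
    // reporting itself throws (for example on a network error).
    static void reportErrorAndDie(Runnable reportError, Runnable suicideFn) {
        try {
            reportError.run();
        } catch (Exception e) {
            System.err.println("Failed to report error: " + e);
        } finally {
            suicideFn.run();
        }
    }

    public static void main(String[] args) {
        InMemoryStateStorage storage = new InMemoryStateStorage();
        // In a real worker this would terminate the process.
        Runnable suicideFn = () -> System.out.println("suicide-fn called");

        // Reporting fails, but the suicide function still runs.
        reportErrorAndDie(() -> {
            throw new IllegalStateException("simulated Curator failure");
        }, suicideFn);

        // Reporting succeeds; the path below is a made-up example.
        reportErrorAndDie(
            () -> storage.setData("/errors/example-component/last-error", "boom"),
            suicideFn);
    }
}
```

Putting the suicide function in a finally block means the worker is asked to die regardless of how reporting fails; catching the exception first only serves to log it.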



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
