[ 
https://issues.apache.org/jira/browse/STORM-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192484#comment-15192484
 ] 

ASF GitHub Bot commented on STORM-956:
--------------------------------------

GitHub user srdo opened a pull request:

    https://github.com/apache/storm/pull/1209

    STORM-956: When the execute() or nextTuple() hang on external resources,
    stop the Worker's heartbeat

    The previous PR at https://github.com/apache/storm/pull/647 doesn't look
    active anymore. Having Storm tell you which components are backing up
    would still be a nice feature to have.
    
    I've taken a look at implementing the suggestions from the previous PR,
    but I have a few questions.
    
    The previous discussion seemed to point toward shutting down the worker
    when an executor is hanging. I'm guessing there's no nice way to restart
    just the hanging executors? Is it sufficient to call shutdown on the
    worker object from do-executor-heartbeats?
    
    I'm not really sure what Constants/SYSTEM_EXECUTOR_ID is for. Should it
    be ignored when checking for hanging executors?
    
    I'm hoping to add the zookeeper/metrics logging and shutdown
    functionality soon if this PR looks like it's going in the right
    direction.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srdo/storm STORM-956

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1209.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1209
    
----
commit c0d1c4ef6ae0d1e144f5af85174d68d5a93eb06a
Author: chuanlei <nichuan...@126.com>
Date:   2015-07-22T07:37:28Z

    stop worker heartbeat, when the executor threads hang-on

commit 16980a3e4e015865348afee7661157cc9a21525a
Author: chuanlei <nichuan...@gmail.com>
Date:   2015-07-22T08:55:39Z

    add the setup-check! to mk-threads

commit 9884c578fe8fa85197b1e5d4118598425160bb3f
Author: Stig Døssing <stigdoess...@gmail.com>
Date:   2016-03-13T14:57:27Z

    Merge branch 'master' of https://github.com/apache/storm into STORM-956

commit 9dd030396b0d921f25c5269e17c58b649387211d
Author: Stig Døssing <stigdoess...@gmail.com>
Date:   2016-03-13T18:58:29Z

    STORM-956: Add support for warning about hanging executors

----


> When the execute() or nextTuple() hang on external resources, stop the 
> Worker's heartbeat
> -----------------------------------------------------------------------------------------
>
>                 Key: STORM-956
>                 URL: https://issues.apache.org/jira/browse/STORM-956
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-core
>            Reporter: Chuanlei Ni
>            Assignee: Chuanlei Ni
>            Priority: Minor
>   Original Estimate: 6h
>  Remaining Estimate: 6h
>
> Sometimes the work threads produced by mk-threads in executor.clj hang on 
> external resources or for other unknown reasons. This makes the worker stop 
> processing tuples. I think it is better to kill the worker to resolve 
> the "hang". I plan to:
> 1. Like `setup-ticks`, send a system-tick to the receive-queue.
> 2. Have the tuple-action-fn handle this system-tick and record, in the 
> executor-data, the time at which the tuple was processed.
> 3. When the worker does its local heartbeat, check the time the executor 
> wrote to executor-data. If that time is too far in the past (for example, 
> more than 3 minutes ago), the worker skips the heartbeat, so that the 
> supervisor can deal with the problem.
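
The three steps above can be sketched as follows. This is a minimal, language-agnostic illustration in Python, not Storm's actual Clojure code: `ExecutorData`, `on_system_tick`, `hanging_executors`, `do_local_heartbeat` and the `write_heartbeat` callback are hypothetical names chosen for the example, and the 180-second timeout only mirrors the "3 minutes" mentioned in the issue.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ExecutorData:
    """Hypothetical stand-in for the per-executor state (executor-data)."""
    executor_id: str
    last_tick_processed: float  # seconds; when the last system tick was handled


def on_system_tick(executor: ExecutorData, now: float) -> None:
    # Step 2: runs on the executor thread when the tick tuple is dequeued.
    # A hung execute()/nextTuple() never gets here, so the timestamp goes stale.
    executor.last_tick_processed = now


def hanging_executors(executors: List[ExecutorData], now: float,
                      timeout_secs: float) -> List[str]:
    # Step 3a: an executor counts as hanging if its last tick is too old.
    return [e.executor_id for e in executors
            if now - e.last_tick_processed > timeout_secs]


def do_local_heartbeat(executors: List[ExecutorData], now: float,
                       timeout_secs: float,
                       write_heartbeat: Callable[[float], None]) -> bool:
    # Step 3b: skip the heartbeat when any executor is hanging, so that
    # whatever monitors the heartbeat sees the worker as unhealthy.
    if hanging_executors(executors, now, timeout_secs):
        return False
    write_heartbeat(now)
    return True


# Usage: one executor keeps handling ticks, the other stopped at t=10s.
executors = [ExecutorData("spout-1", 0.0), ExecutorData("bolt-2", 0.0)]
beats: List[float] = []
on_system_tick(executors[0], 170.0)
on_system_tick(executors[1], 10.0)
skipped = not do_local_heartbeat(executors, 200.0, 180.0, beats.append)
```

Passing the current time in as a parameter, rather than reading a clock inside the functions, is a choice made here only to keep the sketch deterministic.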



