[
https://issues.apache.org/jira/browse/STORM-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232148#comment-15232148
]
ASF GitHub Bot commented on STORM-956:
--------------------------------------
GitHub user srdo reopened a pull request:
https://github.com/apache/storm/pull/1209
STORM-956: When the execute() or nextTuple() hang on external resources,
stop the Worker's heartbeat
The previous PR at https://github.com/apache/storm/pull/647 doesn't look
active anymore. Having Storm tell you which components are backing up would
still be a nice feature to have.
I've taken a look at implementing the suggestions from the previous PR, but
I have a few questions.
The previous discussion seemed to point toward shutting down the worker
when an executor is hanging. I'm guessing there's no nice way to just restart
the hanging executors? Is it sufficient to call shutdown on the worker object
from do-executor-heartbeats?
I'm not really sure what Constants/SYSTEM_EXECUTOR_ID is for. Should it be
ignored when checking for hanging executors?
I'm hoping to add the zookeeper/metrics logging and shutdown functionality
soon if the idea of this PR is sound.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/srdo/storm STORM-956
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/1209.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1209
----
commit c0d1c4ef6ae0d1e144f5af85174d68d5a93eb06a
Author: chuanlei <[email protected]>
Date: 2015-07-22T07:37:28Z
stop worker heartbeat, when the executor threads hang-on
commit 16980a3e4e015865348afee7661157cc9a21525a
Author: chuanlei <[email protected]>
Date: 2015-07-22T08:55:39Z
add the setup-check! to mk-threads
commit 9884c578fe8fa85197b1e5d4118598425160bb3f
Author: Stig Døssing <[email protected]>
Date: 2016-03-13T14:57:27Z
Merge branch 'master' of https://github.com/apache/storm into STORM-956
commit 9dd030396b0d921f25c5269e17c58b649387211d
Author: Stig Døssing <[email protected]>
Date: 2016-03-13T18:58:29Z
STORM-956: Add support for warning about hanging executors
commit af0a56df27f6d4765dd868cf85c0633832cd8a72
Author: Stig Døssing <[email protected]>
Date: 2016-03-14T20:45:12Z
STORM-956: Put hang check in its own function, added worker shutdown call,
scheduled hang check interval to match lowest configured timeout.
commit 9bb475213b18d1bbac9277f04dba8381e7a2fa2a
Author: Stig Døssing <[email protected]>
Date: 2016-03-15T19:29:00Z
STORM-956: Log error in Zookeeper when executor is hanging
commit 1a7fd227eade0c205acc1a23aa80ce3e3b845818
Author: Stig Døssing <[email protected]>
Date: 2016-03-18T12:03:21Z
Merge branch 'master' of https://github.com/apache/storm into STORM-956
commit 159a169e2cdf475fb69a3895fc354b9729d0bb6f
Author: Stig Døssing <[email protected]>
Date: 2016-03-20T09:14:46Z
STORM-956: Added support for extending hang timeout via outputcollectors.
Added tests for zk error logging, per-component configuration, disabling hang
checks and hang checks warning and shutting down worker properly.
commit 5000a78fa5b7a43f49e4dbdec7ccf7f87714cb70
Author: Stig Døssing <[email protected]>
Date: 2016-03-21T14:55:49Z
Add comment to Config about disabling hang checking
commit 3396fdc48647a49b1c44d59c3cbe09c098376c4a
Author: Stig Rohde Døssing <[email protected]>
Date: 2016-03-24T21:04:19Z
Merge branch 'master' of https://github.com/apache/storm into STORM-956
commit 76b090746baca87719de5fdbeedbb4a8a7f75aed
Author: Stig Rohde Døssing <[email protected]>
Date: 2016-04-04T13:14:03Z
Merge branch 'master' of https://github.com/apache/storm into STORM-956
commit b6f963387b6ef4d9153481dbd1faf502bbabecf5
Author: Stig Rohde Døssing <[email protected]>
Date: 2016-04-07T07:32:34Z
Merge branch 'master' of https://github.com/apache/storm into STORM-956
commit bb2585f1db71e7523b5a62572b540e4773805a53
Author: Stig Rohde Døssing <[email protected]>
Date: 2016-04-08T10:49:38Z
Merge branch 'master' of https://github.com/apache/storm into STORM-956
commit f717555bf4e3c4c8bf2ba0639694c4486ceb4e73
Author: Stig Rohde Døssing <[email protected]>
Date: 2016-04-08T12:37:46Z
STORM-956: Remove automatic notifyNotHanging from outputcollector methods
----
> When the execute() or nextTuple() hang on external resources, stop the
> Worker's heartbeat
> -----------------------------------------------------------------------------------------
>
> Key: STORM-956
> URL: https://issues.apache.org/jira/browse/STORM-956
> Project: Apache Storm
> Issue Type: Improvement
> Components: storm-core
> Reporter: Chuanlei Ni
> Assignee: Chuanlei Ni
> Priority: Minor
> Original Estimate: 6h
> Remaining Estimate: 6h
>
> Sometimes the work threads created by mk-threads in executor.clj hang on
> external resources or for other unknown reasons. This makes the worker stop
> processing tuples. I think it is better to kill the worker to resolve
> the "hang". I plan to:
> 1. Like `setup-ticks`, send a system tick to the receive queue.
> 2. Have the tuple-action-fn handle this system tick and record in the
> executor-data the time at which it processed the tuple.
> 3. When the worker does its local heartbeat, check the time the executor
> wrote to executor-data. If that time is too far in the past (for example,
> 3 minutes), the worker skips the heartbeat, so the supervisor can deal with
> the problem.
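
The three steps in the issue description could be sketched roughly as below.
This is only an illustration in plain Java, not the code from the PR or from
executor.clj: the class HangCheckSketch, its methods, and the executor ids are
all hypothetical names, and the 3 minute timeout is just the example value
from the description. Time is passed in explicitly so the check is easy to
follow and test.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the proposed hang check; none of these names are Storm APIs.
public class HangCheckSketch {
    // Last time (in ms) each executor handled a system tick, keyed by an assumed executor id.
    private final Map<String, Long> lastTickMillis = new ConcurrentHashMap<>();
    private final long hangTimeoutMillis;

    public HangCheckSketch(long hangTimeoutMillis) {
        this.hangTimeoutMillis = hangTimeoutMillis;
    }

    // Step 2: the executor thread calls this when it processes a system tick.
    public void recordTick(String executorId, long nowMillis) {
        lastTickMillis.put(executorId, nowMillis);
    }

    // Step 3: an executor counts as hanging if its last tick is older than the timeout.
    public boolean anyExecutorHanging(long nowMillis) {
        for (long last : lastTickMillis.values()) {
            if (nowMillis - last > hangTimeoutMillis) {
                return true;
            }
        }
        return false;
    }

    // The worker would only write its local heartbeat when this returns true.
    public boolean shouldHeartbeat(long nowMillis) {
        return !anyExecutorHanging(nowMillis);
    }

    public static void main(String[] args) {
        HangCheckSketch check = new HangCheckSketch(180_000L); // 3 minutes, as in the example
        check.recordTick("spout-1", 0L);
        check.recordTick("bolt-1", 0L);
        check.recordTick("bolt-1", 120_000L); // bolt-1 keeps ticking, spout-1 does not
        System.out.println(check.shouldHeartbeat(150_000L)); // true: both within 3 minutes
        System.out.println(check.shouldHeartbeat(200_000L)); // false: spout-1 is stale
    }
}
```

In this sketch, skipping the heartbeat is the only reaction to a stale
executor; whether the worker should also shut itself down, as discussed in
the PR text above, is left open.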
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)