[ https://issues.apache.org/jira/browse/STORM-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192484#comment-15192484 ]
ASF GitHub Bot commented on STORM-956:
--------------------------------------

GitHub user srdo opened a pull request:

    https://github.com/apache/storm/pull/1209

    STORM-956: When the execute() or nextTuple() hang on external resources, stop the Worker's heartbeat

    The previous PR at https://github.com/apache/storm/pull/647 doesn't look active anymore. Having Storm tell you which components are backing up would still be a nice feature to have. I've taken a look at implementing the suggestions from the previous PR, but I have a few questions.

    The previous discussion seemed to point toward shutting down the worker when an executor is hanging. I'm guessing there's no nice way to just restart the hanging executors? Is it sufficient to call shutdown on the worker object from do-executor-heartbeats?

    I'm not really sure what Constants/SYSTEM_EXECUTOR_ID is for? Should it be ignored when checking for hanging executors?

    I'm hoping to add the zookeeper/metrics logging and shutdown functionality soon if this PR looks like it's going in the right direction.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srdo/storm STORM-956

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1209.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1209

----
commit c0d1c4ef6ae0d1e144f5af85174d68d5a93eb06a
Author: chuanlei <nichuan...@126.com>
Date:   2015-07-22T07:37:28Z

    stop worker heartbeat, when the executor threads hang-on

commit 16980a3e4e015865348afee7661157cc9a21525a
Author: chuanlei <nichuan...@gmail.com>
Date:   2015-07-22T08:55:39Z

    add the setup-check! to mk-threads

commit 9884c578fe8fa85197b1e5d4118598425160bb3f
Author: Stig Døssing <stigdoess...@gmail.com>
Date:   2016-03-13T14:57:27Z

    Merge branch 'master' of https://github.com/apache/storm into STORM-956

commit 9dd030396b0d921f25c5269e17c58b649387211d
Author: Stig Døssing <stigdoess...@gmail.com>
Date:   2016-03-13T18:58:29Z

    STORM-956: Add support for warning about hanging executors

----

> When the execute() or nextTuple() hang on external resources, stop the
> Worker's heartbeat
> -----------------------------------------------------------------------------------------
>
>                 Key: STORM-956
>                 URL: https://issues.apache.org/jira/browse/STORM-956
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-core
>            Reporter: Chuanlei Ni
>            Assignee: Chuanlei Ni
>            Priority: Minor
>   Original Estimate: 6h
>  Remaining Estimate: 6h
>
> Sometimes the work threads produced by mk-threads in executor.clj hang on
> external resources or other unknown reasons. This makes the workers stop
> processing the tuples. I think it is better to kill this worker to resolve
> the "hang". I plan to :
> 1. like `setup-ticks`, send a system-tick to receive-queue
> 2. the tuple-action-fn deal with this system-tick and remember the time that
> processes this tuple in the executor-data
> 3. when worker do local heartbeat, check the time the executor writes to
> executor-data. If the time is long from current (for example, 3 minutes), the
> worker does not do the heartbeat. So the supervisor could deal with this
> problem.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
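
For readers following the three-step plan quoted in the issue description, the check could look roughly like the Java sketch below. This is only an illustration of the idea, not the code in the pull request: the class ExecutorHangDetector, its methods, and the string executor ids are all hypothetical and do not correspond to any Storm API. It assumes each executor is registered once at startup and that timestamps are passed in explicitly, so the logic can be exercised without real threads or a real clock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names come from Storm itself.
final class ExecutorHangDetector {
    // An executor counts as hanging once its last tick is older than this.
    private final long timeoutMs;
    // Executor id -> time (ms) at which it last processed a system tick.
    private final Map<String, Long> lastTickMs = new ConcurrentHashMap<>();

    ExecutorHangDetector(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Step 2 of the plan: the executor thread calls this whenever it
    // processes a system tick. Calling it once at startup registers the
    // executor, so one that never processes a tick is still detected.
    void recordTick(String executorId, long nowMs) {
        lastTickMs.put(executorId, nowMs);
    }

    // Step 3 of the plan: the ids whose last tick is older than the timeout,
    // sorted so the result is deterministic.
    List<String> hangingExecutors(long nowMs) {
        List<String> hanging = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastTickMs.entrySet()) {
            if (nowMs - entry.getValue() > timeoutMs) {
                hanging.add(entry.getKey());
            }
        }
        Collections.sort(hanging);
        return hanging;
    }

    // The worker would skip its local heartbeat when this returns false,
    // leaving the supervisor to notice the missing heartbeat.
    boolean shouldHeartbeat(long nowMs) {
        return hangingExecutors(nowMs).isEmpty();
    }
}
```

With a 3 minute timeout (180000 ms, the example value from the issue), an executor whose last tick was recorded at time 0 is reported as hanging when checked at 200000 ms, while one that ticked at 170000 ms is not. Whether the system executor should be excluded from such a check is the open question raised in the pull request and is not addressed here.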