GitHub user zhuoliu opened a pull request:
https://github.com/apache/storm/pull/1338
[STORM-1696] status not sync if zk fails in backpressure #1320
This is a fix for the master branch.
When a zk exception occurs during worker-backpressure!, the worker is
left in a bad state that can block the topology from running normally.
The root cause is in worker/mk-backpressure-handler:
if worker-backpressure! fails once due to a zk connection exception, the
local flag has already been updated. The next time this method is called by
WorkerBackpressureThread, the condition in
(when (not= prev-backpressure-flag curr-backpressure-flag) ...) will never be
true, so the remote zk node cannot be synced with the local state.
This problem can cause a topology to be blocked!
It also explains why we do not see any problem when testing in a stable
environment (where zk never fails).
The solution is quite straightforward: first change the zk status and, only
if that succeeds, change the local status.
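
The ordering issue can be illustrated with a small standalone sketch. This is
not the actual patch (the real change is in the Clojure worker code); it is a
minimal Java model written under assumptions. The names BackpressureSync,
ZkWriter, updateLocalFirst and updateZkFirst are hypothetical, and an unchecked
IllegalStateException stands in for the zk connection exception.

```java
// Hypothetical model of the backpressure flag sync; not Storm's real API.
class BackpressureSync {
    /** Stand-in for the ZooKeeper write performed by worker-backpressure!. */
    interface ZkWriter {
        void write(boolean flag);
    }

    private boolean localFlag = false;
    private final ZkWriter zk;

    BackpressureSync(ZkWriter zk) {
        this.zk = zk;
    }

    boolean localFlag() {
        return localFlag;
    }

    /** Problematic order: the local flag changes even if the zk write throws. */
    void updateLocalFirst(boolean curr) {
        boolean prev = localFlag;
        if (prev != curr) {
            localFlag = curr;
            zk.write(curr); // on failure, local and remote state now disagree
        }
    }

    /** Order described in this PR: write to zk first, then update local state. */
    void updateZkFirst(boolean curr) {
        boolean prev = localFlag;
        if (prev != curr) {
            zk.write(curr); // on failure, the local flag is left untouched
            localFlag = curr;
        }
    }

    public static void main(String[] args) {
        boolean[] remote = {false};
        int[] calls = {0};
        // Assumed failure mode: the first write fails, later writes succeed.
        ZkWriter flaky = flag -> {
            if (calls[0]++ == 0) {
                throw new IllegalStateException("simulated zk connection loss");
            }
            remote[0] = flag;
        };

        BackpressureSync worker = new BackpressureSync(flaky);
        try {
            worker.updateZkFirst(true);
        } catch (IllegalStateException e) {
            // first attempt fails; the next invocation retries
        }
        worker.updateZkFirst(true);
        System.out.println("local=" + worker.localFlag() + " remote=" + remote[0]);
    }
}
```

With updateLocalFirst, the same failed first call would leave the local flag
set, so the second call would see no difference and skip the zk write.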
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zhuoliu/storm 1696b
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/1338.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1338
----
commit a70195d717e1ed179425a0305b2e4279afeb6118
Author: zhuoliu <[email protected]>
Date: 2016-04-13T18:04:57Z
[STORM-1696] status not sync if zk fails in backpressure
commit 63518be11275104314dcebbf83bb1f8a513891ee
Author: zhuoliu <[email protected]>
Date: 2016-04-13T18:08:19Z
minor
----