GitHub user revans2 commented on the issue:
https://github.com/apache/storm/pull/1674
Overall I like the concept here: if a supervisor is appearing/disappearing
a lot, we probably do want to blacklist it. That said, I have a few
system-level concerns.
1) I would like to see this feature made part of nimbus rather than a
scheduler. The algorithm is generic enough that we could easily wrap all of
the schedulers. If nimbus hands it the scheduler it is supposed to wrap
through the constructor, the internals can stay agnostic to the scheduler
underneath. This would also fix some of the build dependency issues you were
having with needing to build blacklisting after building Clojure.
2) I like having the reporting plugin, but I really want to see blacklisted
nodes show up on the Storm UI. We now have the supervisor pages and the
supervisor table on the main page. As an administrator I would much rather
look at a UI to see what is happening with a supervisor than parse a lot of
logs.
3) Cluster-wide failures. Blacklisting is a good feature until something odd
happens and the entire cluster gets blacklisted. As a (completely
theoretical) example, say we have nimbus HA and the primary nimbus node
suffers heavy network loss. After a while it blacklists the entire cluster,
when it is only nimbus that is bad. I want to be sure we have something in
place that can detect and appropriately handle a situation where the majority
of the nodes appear to be bad.
4) Master branch. This patch is only for the 1.x branch. That is fine, but
before we can merge it we need a patch for master as well.
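For point 1, wrapping the configured scheduler through the constructor is essentially the decorator pattern. A rough sketch, assuming a deliberately simplified, hypothetical `SimpleScheduler` interface (Storm's real `IScheduler` works on `schedule(Topologies, Cluster)`, not a list of supervisor IDs); `BlacklistingScheduler` is likewise a hypothetical name:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-in for a scheduler plugin.
// Takes the usable supervisors and returns the ones it assigned work to.
interface SimpleScheduler {
    List<String> schedule(List<String> availableSupervisors);
}

// Decorator: nimbus constructs this around whatever scheduler is configured,
// so the blacklisting logic never needs to know which scheduler it wraps.
final class BlacklistingScheduler implements SimpleScheduler {
    private final SimpleScheduler underlying;
    private final Set<String> blacklist = new HashSet<>();

    BlacklistingScheduler(SimpleScheduler underlying) {
        this.underlying = underlying;
    }

    void blacklist(String supervisorId) {
        blacklist.add(supervisorId);
    }

    Set<String> blacklisted() {
        return Collections.unmodifiableSet(blacklist);
    }

    @Override
    public List<String> schedule(List<String> availableSupervisors) {
        // Hide blacklisted supervisors, then delegate to the wrapped scheduler unchanged.
        List<String> usable = availableSupervisors.stream()
                .filter(id -> !blacklist.contains(id))
                .collect(Collectors.toList());
        return underlying.schedule(usable);
    }
}
```

Because the wrapper only filters what the underlying scheduler sees, any existing scheduler can be wrapped without modification, which is what keeps the internals scheduler-agnostic.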
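For point 3, one possible safeguard is to stop trusting the blacklist once it would cover more than some fraction of the cluster. The sketch below assumes a hypothetical `BlacklistGuard` with a configurable threshold; with a threshold of 0.5 it refuses to apply a blacklist that covers a strict majority of supervisors. What nimbus should do instead (for example, keep scheduling on all nodes and alert an administrator) is left open here.

```java
import java.util.Set;

// Hypothetical guard: if the blacklist would cover too much of the cluster,
// treat that as a sign the problem may be on the nimbus side, not the nodes.
final class BlacklistGuard {
    private final double maxBlacklistFraction;

    BlacklistGuard(double maxBlacklistFraction) {
        if (maxBlacklistFraction <= 0.0 || maxBlacklistFraction >= 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1)");
        }
        this.maxBlacklistFraction = maxBlacklistFraction;
    }

    /** Returns true if applying the given blacklist is considered safe. */
    boolean shouldApply(Set<String> blacklisted, int totalSupervisors) {
        if (totalSupervisors <= 0) {
            return false; // no cluster to reason about; do not blacklist anything
        }
        double fraction = (double) blacklisted.size() / totalSupervisors;
        return fraction <= maxBlacklistFraction;
    }
}
```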