Github user nilday commented on the issue:
https://github.com/apache/storm/pull/1674
Thanks for all the advice. Regarding the suggestions from @revans2:
1) We considered implementing the blacklist in nimbus before. As a newcomer
to Storm contribution and to Clojure, I chose to implement it as a scheduler
so that I could write the code in Java and keep the effect on the Storm core
to a minimum, which limits the risk. The BlacklistScheduler currently uses the
DefaultScheduler underneath, and we can easily change the code so that the
underlying scheduler is configurable. I would like to try adding the blacklist
to nimbus myself, as I can't wait for someone else to implement it for us.
2) Showing the blacklist on the UI is a good idea.
3) We have the same worry as you do, and the PR I submitted includes some
code that deals with it. If so many supervisors are blacklisted that the
cluster runs short of slots, the *DefaultBlacklistStrategy* calls its
*releaseBlacklistWhenNeeded* method to temporarily release some supervisors
from the blacklist so that we can try assigning work to them. It's not good
enough, but at least it's an attempt. This is definitely a problem, and I
think there must be a config option that switches the blacklist feature on or
off until it is stable enough. @knusbaum talked about a heuristic algorithm,
which we also had in mind before. We think we could use the number of bad
slots on a machine and the number of topologies they belong to in order to
calculate the health of that machine. The idea is not mature enough, so we
haven't implemented it yet, but we can write another IBlacklistStrategy to do
that.
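
To make the two ideas in point 3 more concrete, here is a rough sketch. It is
not the code in this PR and does not use any Storm API. All names in it
(`BlacklistSketch`, `BadSlot`, `unhealthiness`, `releaseWhenNeeded`) are
hypothetical, and the scoring formula (bad slots multiplied by the number of
distinct topologies) is only one assumed way to combine the two numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; none of these names exist in Storm or in the PR.
final class BlacklistSketch {

    /** Hypothetical record of one bad slot: its host and the topology it served. */
    static final class BadSlot {
        final String host;
        final String topologyId;

        BadSlot(String host, String topologyId) {
            this.host = host;
            this.topologyId = topologyId;
        }
    }

    /**
     * Assumed heuristic: score = (bad slots on host) * (distinct topologies hit).
     * A higher score means a less healthy machine.
     */
    static Map<String, Integer> unhealthiness(List<BadSlot> badSlots) {
        Map<String, Integer> slotCount = new HashMap<>();
        Map<String, Set<String>> topologies = new HashMap<>();
        for (BadSlot s : badSlots) {
            slotCount.merge(s.host, 1, Integer::sum);
            topologies.computeIfAbsent(s.host, h -> new HashSet<>()).add(s.topologyId);
        }
        Map<String, Integer> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : slotCount.entrySet()) {
            score.put(e.getKey(), e.getValue() * topologies.get(e.getKey()).size());
        }
        return score;
    }

    /**
     * When the cluster is short of slots, release blacklisted hosts starting
     * with the least unhealthy one until the shortage is covered.
     */
    static List<String> releaseWhenNeeded(Map<String, Integer> blacklistScores,
                                          Map<String, Integer> slotsPerHost,
                                          int availableSlots,
                                          int neededSlots) {
        List<String> released = new ArrayList<>();
        int shortage = neededSlots - availableSlots;
        if (shortage <= 0) {
            return released; // enough slots, keep the blacklist as it is
        }
        List<String> candidates = new ArrayList<>(blacklistScores.keySet());
        // Lowest score first; ties broken by host name so the order is stable.
        candidates.sort((a, b) -> {
            int c = Integer.compare(blacklistScores.get(a), blacklistScores.get(b));
            return c != 0 ? c : a.compareTo(b);
        });
        int freed = 0;
        for (String host : candidates) {
            if (freed >= shortage) {
                break;
            }
            released.add(host);
            freed += slotsPerHost.getOrDefault(host, 0);
        }
        return released;
    }

    public static void main(String[] args) {
        List<BadSlot> bad = Arrays.asList(
            new BadSlot("hostA", "t1"), new BadSlot("hostA", "t1"),
            new BadSlot("hostB", "t1"), new BadSlot("hostB", "t2"),
            new BadSlot("hostB", "t3"), new BadSlot("hostC", "t2"));
        Map<String, Integer> slots = new HashMap<>();
        slots.put("hostA", 4);
        slots.put("hostB", 4);
        slots.put("hostC", 4);
        // Scores: hostA = 2, hostB = 9, hostC = 1; shortage = 8 - 2 = 6 slots.
        System.out.println(releaseWhenNeeded(unhealthiness(bad), slots, 2, 8));
        // prints [hostC, hostA]
    }
}
```

In this example hostB stays blacklisted because releasing hostC and hostA
already covers the shortage. A strategy like this could live in a separate
IBlacklistStrategy implementation, as mentioned above.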
As I am new to this, there may be a lot of barriers in front of me. I would
be grateful for your assistance when I run into them. Thanks a lot.