[ 
https://issues.apache.org/jira/browse/STORM-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated STORM-2083:
----------------------------------
    Labels: blacklist pull-request-available scheduling  (was: blacklist scheduling)

> Blacklist Scheduler
> -------------------
>
>                 Key: STORM-2083
>                 URL: https://issues.apache.org/jira/browse/STORM-2083
>             Project: Apache Storm
>          Issue Type: New Feature
>          Components: storm-core
>            Reporter: Howard Lee
>              Labels: blacklist, pull-request-available, scheduling
>          Time Spent: 15h 10m
>  Remaining Estimate: 0h
>
> My company experienced a production fault in which a critical switch caused 
> an unstable network for a set of machines, with a packet loss rate of 
> 30%-50%. In such a fault, the supervisors and workers on those machines are 
> not cleanly dead, a case that would be easy to handle. Instead, they are 
> still alive but very unstable, and they occasionally lose their heartbeat to 
> Nimbus. In such circumstances, Nimbus still assigns jobs to these machines 
> but soon finds them invalid again, which results in very slow convergence to 
> a stable state.
> To deal with such unstable cases, we intend to implement a blacklist 
> scheduler, which will temporarily add the unstable nodes (supervisors, slots) 
> to the blacklist and resume them later.
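
The blacklisting idea described above can be illustrated with a minimal sketch. It is not the implementation from the pull request and does not use Storm's scheduler API: the class name, method names, and the three parameters (tolerance count, window length, resume time) are all hypothetical and chosen only for illustration. A node that misses too many heartbeats within a sliding window is excluded from scheduling for a fixed period and then resumed.

```python
from collections import defaultdict, deque


class BlacklistTracker:
    """Hypothetical sketch of temporary blacklisting; not Storm's actual API."""

    def __init__(self, tolerance_count=3, window_secs=60.0, resume_secs=300.0):
        self.tolerance_count = tolerance_count  # misses tolerated inside the window
        self.window_secs = window_secs          # length of the sliding window
        self.resume_secs = resume_secs          # how long a node stays blacklisted
        self._misses = defaultdict(deque)       # node -> timestamps of missed heartbeats
        self._blacklisted_until = {}            # node -> time at which it is resumed

    def record_missed_heartbeat(self, node, now):
        misses = self._misses[node]
        misses.append(now)
        # Drop misses that have fallen out of the sliding window.
        while misses and now - misses[0] > self.window_secs:
            misses.popleft()
        # Too many recent misses: blacklist the node for resume_secs.
        if len(misses) >= self.tolerance_count:
            self._blacklisted_until[node] = now + self.resume_secs
            misses.clear()

    def is_blacklisted(self, node, now):
        until = self._blacklisted_until.get(node)
        if until is None:
            return False
        if now >= until:
            # The blacklist period has expired, so the node is resumed.
            del self._blacklisted_until[node]
            return False
        return True

    def schedulable(self, nodes, now):
        # Nodes the scheduler may still assign work to at time `now`.
        return [n for n in nodes if not self.is_blacklisted(n, now)]
```

In this sketch a node with only occasional, widely spaced misses is never blacklisted, while a flapping node is removed from the candidate set until its resume time passes, which avoids repeatedly assigning work to it.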



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
