Prasanth Jayachandran created SLIDER-1246:
---------------------------------------------

             Summary: Application health should not be affected by faulty nodes
                 Key: SLIDER-1246
                 URL: https://issues.apache.org/jira/browse/SLIDER-1246
             Project: Slider
          Issue Type: Bug
    Affects Versions: Slider 1.0.0
            Reporter: Prasanth Jayachandran


When a node is faulty, multiple container failures on that node are treated as 
an application failure. 
This was observed in HIVE-16927, where container failures on certain nodes 
brought down the entire application. Slider should provide a way to avoid 
marking an application as unhealthy while a certain threshold of containers is 
still running. Tuning the failure threshold is not optimal, because choosing 
the correct default on a large cluster is not trivial. Beyond a certain number 
of failures, Slider should mark the node as unhealthy and report that back to 
the client/AM. The application could continue to run as long as the container 
request is partially satisfied (for example, 80% of containers are running).
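
A minimal sketch of the two checks proposed above, to make the intended 
behaviour concrete. This is not Slider code: the function names, the 
node_failure_limit parameter and its default of 3 are hypothetical, and the 
80% default is taken from the example in the description.

```python
from collections import Counter


def is_application_healthy(running, requested, min_running_percent=80):
    """Return True while enough of the requested containers are running.

    Hypothetical helper: integer arithmetic avoids floating-point
    rounding at the exact threshold (e.g. 8 of 10 with an 80% minimum).
    """
    if requested <= 0:
        # Nothing was requested, so there is nothing to be unhealthy about.
        return True
    return running * 100 >= min_running_percent * requested


def unhealthy_nodes(failed_container_hosts, node_failure_limit=3):
    """Return the hosts that have reached the per-node failure limit.

    failed_container_hosts holds one hostname per failed container.
    The limit of 3 is an illustrative assumption, not a Slider default.
    """
    counts = Counter(failed_container_hosts)
    return {host for host, n in counts.items() if n >= node_failure_limit}
```

With a policy like this, failures concentrated on one host would flag that 
host (to be reported back to the client/AM) without failing the application, 
provided the running fraction stays at or above the minimum.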



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
