[ https://issues.apache.org/jira/browse/SLIDER-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16187666#comment-16187666 ]

ASF subversion and git services commented on SLIDER-1246:
---------------------------------------------------------

Commit 0f436c865a90aba5b427d1c0571183c6fcbded1e in incubator-slider's branch refs/heads/develop from [~gsaha]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-slider.git;h=0f436c8 ]

SLIDER-1246 Application health should not be affected by faulty nodes (health monitor based on percent threshold)

> Application health should not be affected by faulty nodes
> ---------------------------------------------------------
>
>                 Key: SLIDER-1246
>                 URL: https://issues.apache.org/jira/browse/SLIDER-1246
>             Project: Slider
>          Issue Type: Bug
>          Components: appmaster, core
>    Affects Versions: Slider 0.92
>            Reporter: Prasanth Jayachandran
>            Assignee: Gour Saha
>             Fix For: Slider 1.0.0
>
>         Attachments: SLIDER-1246.01.patch, SLIDER-1246.02.patch,
> SLIDER-1246.03.patch, SLIDER-1246.04.patch
>
>
> In case of a faulty node, multiple container failures will be deemed as an
> application failure.
> Observed this in HIVE-16927, where container failures in certain nodes brings
> down entire application. Slider has to provide a way to not mark application
> as unhealthy if certain threshold of containers are running. Tuning failure
> threshold is not optimal as setting the correct default on large cluster is
> not trivial. Beyond certain failures, slider should mark the node as
> unhealthy and report that back to client/AM. Application could continue to
> run as long as container request is satisfied partially (example: 80%
> containers are running).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
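
For readers following the issue, the percent-threshold idea in the description (keep the application healthy while, for example, 80% of the requested containers are running) can be sketched roughly as below. This is an illustration only and not the code from the commit: the class name `HealthThresholdSketch`, the method `isHealthy`, and its parameters are hypothetical, and treating a zero-container request as healthy is an assumption.

```java
// Hypothetical sketch of a percent-threshold health check; not Slider's actual implementation.
public final class HealthThresholdSketch {

    private HealthThresholdSketch() {
        // static helper only
    }

    /**
     * Returns true when the running containers meet the percent threshold.
     *
     * @param running          containers currently running
     * @param desired          containers requested by the application
     * @param thresholdPercent minimum share of desired containers, 0..100
     */
    public static boolean isHealthy(int running, int desired, int thresholdPercent) {
        if (desired <= 0) {
            // Assumption: an application that requested nothing is not unhealthy.
            return true;
        }
        // Compare running/desired >= thresholdPercent/100 using integer math
        // (cross-multiplied) to avoid floating-point rounding at the boundary.
        return (long) running * 100 >= (long) thresholdPercent * desired;
    }

    public static void main(String[] args) {
        System.out.println(isHealthy(8, 10, 80)); // true: exactly 80% running
        System.out.println(isHealthy(7, 10, 80)); // false: only 70% running
    }
}
```

With a check of this shape, a handful of container failures on one faulty node lowers the running count but does not fail the application until the share of running containers drops below the configured percentage.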