[ https://issues.apache.org/jira/browse/SLIDER-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099160#comment-16099160 ]
Gour Saha commented on SLIDER-1233: ----------------------------------- +1. The patch looks good. I successfully ran the new tests and also verified that they fail with the absence of the src code patch. Only one minor comment on the below src code comment - bq. // don't increment failure counts if container was aborted or killed I think we should not stop at _killed_, since there are other killed scenarios like pmem and vmem which we are treating differently. Should we explicitly specify killed for which scenarios, or maybe just shorten the comment to "// don't increment failure counts" and folks can refer to ContainerOutcome.class to see which scenarios count as Completed? What do you think? I am running all the UTs now and will report back if I see any failures. > Lost nodes should not contribute to container failures > ------------------------------------------------------ > > Key: SLIDER-1233 > URL: https://issues.apache.org/jira/browse/SLIDER-1233 > Project: Slider > Issue Type: Bug > Components: core > Reporter: Billie Rinaldi > Assignee: Billie Rinaldi > Fix For: Slider 1.0.0 > > Attachments: SLIDER-1233.001.patch > > > If a container completes due to an NM being lost, we should not count this > towards container failures that may eventually cause the AM to fail the > application. We are already using a ContainerOutcome of Completed (rather > than Failed) for this type of container exit, so we just need to change the > failure counting in that case. Other failure types associated with Completed > are killed by the AM, killed by the RM, and killed after app completion, none > of which need to contribute to container failures. -- This message was sent by Atlassian JIRA (v6.4.14#64029)