[jira] [Commented] (SLIDER-1233) Lost nodes should not contribute to container failures

Gour Saha (JIRA) Mon, 24 Jul 2017 14:36:30 -0700

    [ 
https://issues.apache.org/jira/browse/SLIDER-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099160#comment-16099160
 ]


Gour Saha commented on SLIDER-1233:
-----------------------------------

+1. The patch looks good. I successfully ran the new tests and also verified 
that they fail with the absence of the src code patch.

Only one minor comment on the below src code comment -
bq. // don't increment failure counts if container was aborted or killed
I think we should not stop at _killed_, since there are other killed scenarios 
like pmem and vmem which we are treating differently. Should we explicitly 
specify killed for which scenarios, or maybe just shorten the comment to "// 
don't increment failure counts" and folks can refer to ContainerOutcome.class 
to see which scenarios count as Completed? What do you think?

I am running all the UTs now and will report back if I see any failures.

> Lost nodes should not contribute to container failures
> ------------------------------------------------------
>
>                 Key: SLIDER-1233
>                 URL: https://issues.apache.org/jira/browse/SLIDER-1233
>             Project: Slider
>          Issue Type: Bug
>          Components: core
>            Reporter: Billie Rinaldi
>            Assignee: Billie Rinaldi
>             Fix For: Slider 1.0.0
>
>         Attachments: SLIDER-1233.001.patch
>
>
> If a container completes due to an NM being lost, we should not count this 
> towards container failures that may eventually cause the AM to fail the 
> application. We are already using a ContainerOutcome of Completed (rather 
> than Failed) for this type of container exit, so we just need to change the 
> failure counting in that case. Other failure types associated with Completed 
> are killed by the AM, killed by the RM, and killed after app completion, none 
> of which need to contribute to container failures.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (SLIDER-1233) Lost nodes should not contribute to container failures

Reply via email to