[ https://issues.apache.org/jira/browse/SLIDER-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007952#comment-15007952 ]
Gour Saha commented on SLIDER-930:
----------------------------------

[~sherryxg] the 002 patch looks great. +1 for it.

I made just one minor change to a comment in _TestStandaloneAMRestart.groovy_ in method _testStandaloneAMRestartWithDefaultRetryWindow_ -
from -
{quote}
// kill again & expect the app to still be running
{quote}
to -
{quote}
// kill again & expect the app to fail
{quote}

> Incorporate Yarn feature of resetting AM failure count into Slider AM
> ---------------------------------------------------------------------
>
>                 Key: SLIDER-930
>                 URL: https://issues.apache.org/jira/browse/SLIDER-930
>             Project: Slider
>          Issue Type: Bug
>          Components: appmaster
>    Affects Versions: Slider 0.80
>            Reporter: Gour Saha
>            Assignee: Sherry Guo
>             Fix For: Slider 0.90
>
>         Attachments: SLIDER-930-001.patch, SLIDER-930-002.patch
>
>
> YARN-611 provides this feature. Currently Slider apps are bound by the number
> set for yarn.resourcemanager.am.max-retries in the cluster. By default this
> value is set to 2, which is very low for long running services.
> Slider AM should use the feature provided in YARN-611 and set an interval
> after which the failure count will be reset to 0.
> I believe the API to call on ApplicationSubmissionContext is
> attemptFailuresValidityInterval. To start with Slider can set it to 5 mins
> which should be a reasonable default.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
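
For readers following the issue description, the behavior it asks for can be sketched as a sliding window over AM failure timestamps. This is a minimal, self-contained illustration and not the YARN or Slider implementation: the class `AmFailureWindow` and its method names are hypothetical, and the values 2 (maximum attempts) and 5 minutes (validity interval) are taken from the description above. In a real client the interval would presumably be passed to `ApplicationSubmissionContext.setAttemptFailuresValidityInterval(long)` in milliseconds, with the counting left to the ResourceManager.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of a failure count that "resets" after a validity
 * interval: only failures inside the most recent window are counted.
 */
final class AmFailureWindow {
    private final int maxAttempts;            // e.g. 2, the cluster default cited in the issue
    private final long validityIntervalMs;    // e.g. 5 minutes, the default proposed in the issue
    private final Deque<Long> failureTimes = new ArrayDeque<>();

    AmFailureWindow(int maxAttempts, long validityIntervalMs) {
        this.maxAttempts = maxAttempts;
        this.validityIntervalMs = validityIntervalMs;
    }

    /**
     * Records an AM failure at the given time and reports whether another
     * attempt would still be allowed under this simplified model.
     */
    boolean recordFailureAndCheckRestart(long nowMs) {
        failureTimes.addLast(nowMs);
        // Forget failures that are older than the validity interval.
        while (!failureTimes.isEmpty()
                && nowMs - failureTimes.peekFirst() > validityIntervalMs) {
            failureTimes.removeFirst();
        }
        // The application fails once the failures in the window reach the limit.
        return failureTimes.size() < maxAttempts;
    }

    public static void main(String[] args) {
        long fiveMinutesMs = 5L * 60L * 1000L;
        AmFailureWindow window = new AmFailureWindow(2, fiveMinutesMs);
        System.out.println("first kill, restart allowed: "
                + window.recordFailureAndCheckRestart(0L));
        System.out.println("second kill within window, restart allowed: "
                + window.recordFailureAndCheckRestart(60_000L));
    }
}
```

Under this model, two kills inside one window exhaust the limit of 2, which matches the corrected test comment ("kill again & expect the app to fail"), while kills spaced further apart than the interval never accumulate.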