[ https://issues.apache.org/jira/browse/SLIDER-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004355#comment-15004355 ]
Steve Loughran commented on SLIDER-930:
---------------------------------------

If you haven't noticed, we've got the am-suicide RPC call to trigger failures; it's what the test should call.

> Incorporate Yarn feature of resetting AM failure count into Slider AM
> ---------------------------------------------------------------------
>
>                 Key: SLIDER-930
>                 URL: https://issues.apache.org/jira/browse/SLIDER-930
>             Project: Slider
>          Issue Type: Bug
>          Components: appmaster
>    Affects Versions: Slider 0.80
>            Reporter: Gour Saha
>            Assignee: Sherry Guo
>             Fix For: Slider 0.90
>
>         Attachments: SLIDER-930-001.patch
>
>
> YARN-611 provides this feature. Currently Slider apps are bound by the number
> set for yarn.resourcemanager.am.max-retries in the cluster. By default this
> value is set to 2, which is very low for long running services.
> Slider AM should use the feature provided in YARN-611 and set an interval
> after which the failure count will be reset to 0.
> I believe the API to call on ApplicationSubmissionContext is
> attemptFailuresValidityInterval. To start with Slider can set it to 5 mins
> which should be a reasonable default.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
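
For illustration only, here is a minimal, self-contained Java sketch of the failure-window behaviour the issue description asks for: with a limit of 2 attempts and a 5 minute validity interval, failures older than the interval stop counting against the limit. The class `AmFailureWindow` and its methods are hypothetical names invented for this sketch; they are not Slider or YARN code, and the real ResourceManager bookkeeping may differ in detail. In a real client the interval would presumably be passed, in milliseconds, to `ApplicationSubmissionContext.setAttemptFailuresValidityInterval(long)`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical model of an AM failure count that only considers failures
 * inside a validity interval. Not part of Slider or YARN.
 */
class AmFailureWindow {
    private final int maxAttempts;          // e.g. 2, the cluster default cited in the issue
    private final long validityIntervalMs;  // <= 0 means failures never expire
    private final Deque<Long> failureTimes = new ArrayDeque<>();

    AmFailureWindow(int maxAttempts, long validityIntervalMs) {
        this.maxAttempts = maxAttempts;
        this.validityIntervalMs = validityIntervalMs;
    }

    /** Records a failure at nowMs; returns true if another attempt may start. */
    boolean recordFailure(long nowMs) {
        failureTimes.addLast(nowMs);
        return countedFailures(nowMs) < maxAttempts;
    }

    /** Number of failures still inside the validity interval at nowMs. */
    int countedFailures(long nowMs) {
        if (validityIntervalMs > 0) {
            long cutoff = nowMs - validityIntervalMs;
            // Drop failures that have aged out of the window.
            while (!failureTimes.isEmpty() && failureTimes.peekFirst() <= cutoff) {
                failureTimes.removeFirst();
            }
        }
        return failureTimes.size();
    }
}
```

Under these assumptions, two failures six minutes apart leave the application eligible for another attempt, while two failures one minute apart exhaust the limit. A test that triggers AM failures (for example through the am-suicide RPC call mentioned above) could check both cases.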