[ https://issues.apache.org/jira/browse/TWILL-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15844111#comment-15844111 ]
ASF GitHub Bot commented on TWILL-181:
--------------------------------------

Github user serranom commented on a diff in the pull request:

    https://github.com/apache/twill/pull/23#discussion_r98334088

    --- Diff: twill-yarn/src/main/java/org/apache/twill/internal/appmaster/RunningContainers.java ---
    @@ -113,9 +117,11 @@ public Integer apply(BitSet input) {
       private final Location applicationLocation;
       private final Set<String> runnableNames;
       private final Map<String, Map<String, String>> logLevels;
    +  private final Map<String, Integer> maxRetries;
    --- End diff --

    @poornachandra, I don't see anything linking one instance request to a retry of that failure. Failures get added to the multiset which then results in a new request for that many instances. This means in the end, there is nothing tying one attempt for a particular instanceId to the next one. So we could keep track of retries by instanceId, but it would end up functionally being the same as keeping track of retries for the entire Runnable. I think it would just add more complexity for little benefit to track by instanceId.

> Control the maximum number of retries for failed application starts
> -------------------------------------------------------------------
>
>                 Key: TWILL-181
>                 URL: https://issues.apache.org/jira/browse/TWILL-181
>             Project: Apache Twill
>          Issue Type: Improvement
>          Components: yarn
>    Affects Versions: 0.7.0-incubating
>            Reporter: Martin Serrano
>            Assignee: Martin Serrano
>             Fix For: 0.10.0
>
>
> If an application consistently exits with a non-zero code, twill will
> attempt to restart indefinitely. I ran into this issue and a list search
> also reveals [others| http://markmail.org/message/dehx7r6tpqgcmjh4].
> There should be a mechanism to specify the maximum number of retries until
> the application fails. Ideally by default there would be a non-infinite
> maximum.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
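
For readers following the review comment above, here is a minimal sketch of per-Runnable retry counting. It is not the code from the pull request: the class `RunnableRetryTracker` and its methods are hypothetical names, a plain `Map` stands in for the multiset of failures, and treating a Runnable with no configured limit as unlimited is an assumption based on the indefinite restarts described in the issue. Only the `Map<String, Integer> maxRetries` field shape comes from the diff.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: counts failures per Runnable name rather than per
 * instanceId, because a replacement request is not tied to the instance
 * that failed.
 */
final class RunnableRetryTracker {
    // Runnable name -> maximum number of retries (same shape as the field in the diff).
    private final Map<String, Integer> maxRetries;
    // Runnable name -> failures seen so far (stands in for the multiset of failures).
    private final Map<String, Integer> failures = new HashMap<>();

    RunnableRetryTracker(Map<String, Integer> maxRetries) {
        this.maxRetries = new HashMap<>(maxRetries);
    }

    /**
     * Records one failed instance of the given Runnable.
     * Returns true if a replacement instance may still be requested.
     */
    boolean recordFailure(String runnableName) {
        int count = failures.merge(runnableName, 1, Integer::sum);
        Integer limit = maxRetries.get(runnableName);
        // Assumption: no configured limit means retry indefinitely.
        return limit == null || count <= limit;
    }

    /** Returns how many failures have been recorded for the given Runnable. */
    int failureCount(String runnableName) {
        return failures.getOrDefault(runnableName, 0);
    }
}
```

With a limit of 2 for a Runnable, the first two failures still allow a new request and the third does not, regardless of which instanceId each failure came from.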