[ https://issues.apache.org/jira/browse/TWILL-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840382#comment-15840382 ]
ASF GitHub Bot commented on TWILL-181:
--------------------------------------

Github user serranom commented on a diff in the pull request:

    https://github.com/apache/twill/pull/23#discussion_r98083899

    --- Diff: twill-yarn/src/main/java/org/apache/twill/internal/appmaster/RunningContainers.java ---
    @@ -113,9 +117,11 @@ public Integer apply(BitSet input) {
       private final Location applicationLocation;
       private final Set<String> runnableNames;
       private final Map<String, Map<String, String>> logLevels;
    +  private final Map<String, Integer> maxRetries;
    --- End diff --

    From my analysis, instance ids do not necessarily correspond to
    specific processes. It looked like there is a pool of requests: a new
    request being serviced gets the lowest instance id, and a failed
    request gets put back on the queue. This is why I went with an
    instance-adjusted number of retries. Did I miss something?


> Control the maximum number of retries for failed application starts
> -------------------------------------------------------------------
>
>                 Key: TWILL-181
>                 URL: https://issues.apache.org/jira/browse/TWILL-181
>             Project: Apache Twill
>          Issue Type: Improvement
>          Components: yarn
>    Affects Versions: 0.7.0-incubating
>            Reporter: Martin Serrano
>            Assignee: Martin Serrano
>             Fix For: 0.10.0
>
>
> If an application consistently exits with a non-zero code, Twill will
> attempt to restart it indefinitely. I ran into this issue, and a list
> search also reveals [others|http://markmail.org/message/dehx7r6tpqgcmjh4].
> There should be a mechanism to specify the maximum number of retries
> before the application fails. Ideally, the default would be a
> non-infinite maximum.
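
For readers following the review comment above, here is a minimal sketch of one possible reading of an "instance-adjusted number of retries". It assumes that failures are counted per runnable rather than per instance id (since an id may be reused by a re-queued request) and that the budget is the configured limit multiplied by the instance count. The class `InstanceAdjustedRetries` and its method are hypothetical and are not taken from the pull request or from Twill.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Twill. */
final class InstanceAdjustedRetries {
  // Assumed meaning: retry limit per instance, keyed by runnable name.
  private final Map<String, Integer> maxRetries;
  // Failed starts observed so far, counted per runnable (not per instance id).
  private final Map<String, Integer> failures = new HashMap<>();

  InstanceAdjustedRetries(Map<String, Integer> maxRetries) {
    this.maxRetries = new HashMap<>(maxRetries);
  }

  /** Records one failed start and reports whether another attempt is allowed. */
  boolean recordFailureAndCheck(String runnableName, int instanceCount) {
    int failed = failures.merge(runnableName, 1, Integer::sum);
    Integer limit = maxRetries.get(runnableName);
    if (limit == null) {
      // No limit configured: keep retrying, as in the behaviour the issue describes.
      return true;
    }
    // Budget scales with the number of instances because ids are not tied to processes.
    return failed <= (long) limit * instanceCount;
  }
}
```

With a limit of 2 and 3 instances, this sketch allows 6 failed starts for the runnable as a whole and refuses the 7th, regardless of which instance ids the failures were assigned.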