[ https://issues.apache.org/jira/browse/TWILL-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852063#comment-15852063 ]
ASF GitHub Bot commented on TWILL-181:
--------------------------------------
Github user hsaputra commented on a diff in the pull request:
https://github.com/apache/twill/pull/23#discussion_r99416151
--- Diff: twill-yarn/src/main/java/org/apache/twill/internal/appmaster/RunningContainers.java ---
@@ -477,10 +493,30 @@ void handleCompleted(YarnContainerStatus status, Multiset<String> restartRunnabl
         }
       }
 
-  private boolean shouldRetry(int exitCode) {
-    return exitCode != ContainerExitCodes.SUCCESS
-      && exitCode != ContainerExitCodes.DISKS_FAILED
-      && exitCode != ContainerExitCodes.INIT_FAILED;
+  private boolean shouldRetry(String runnableName, int instanceId, int exitCode) {
+    boolean possiblyRetry =
+      exitCode != ContainerExitCodes.SUCCESS &&
+      exitCode != ContainerExitCodes.DISKS_FAILED &&
+      exitCode != ContainerExitCodes.INIT_FAILED;
+
+    if (possiblyRetry) {
+      int max = getMaxRetries(runnableName);
+      if (max == Integer.MAX_VALUE) {
+        return true; // retry without special log msg
+      }
+
+      if (getRetryCount(runnableName, instanceId) == max) {
--- End diff --
Since `getRetryCount` is called for this check, we might as well cache the
result in a local variable, so that the comparison and the log message are
guaranteed to use the same value for this request:
```
// Read the count once and reuse it for both the check and the log message.
int retryCount = getRetryCount(runnableName, instanceId);
if (retryCount == max) {
  ...
} else {
  LOG.info("Attempting {} of {} retries for instance {} of runnable {}.",
           retryCount + 1, max, instanceId, runnableName);
  return true;
}
```
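To make the suggestion concrete, here is a minimal, self-contained sketch of the
same decision with the retry count read only once. It is not the code from the
pull request: the class `RetryDecisionSketch`, the map-based bookkeeping, the
exit code values, and the choice to increment the count inside `shouldRetry`
are all assumptions made for illustration, and it uses `>=` rather than `==`
as a defensive variation on the check in the diff.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for the retry bookkeeping in RunningContainers. */
final class RetryDecisionSketch {
  // Illustrative exit codes only; the real constants live in ContainerExitCodes.
  static final int SUCCESS = 0;
  static final int DISKS_FAILED = -101;
  static final int INIT_FAILED = 10;

  private final int maxRetries;
  // Retry counts keyed by "runnableName#instanceId" (an assumption of this sketch).
  private final Map<String, Integer> retryCounts = new HashMap<>();

  RetryDecisionSketch(int maxRetries) {
    this.maxRetries = maxRetries;
  }

  private static String key(String runnableName, int instanceId) {
    return runnableName + "#" + instanceId;
  }

  int getRetryCount(String runnableName, int instanceId) {
    return retryCounts.getOrDefault(key(runnableName, instanceId), 0);
  }

  boolean shouldRetry(String runnableName, int instanceId, int exitCode) {
    boolean possiblyRetry =
        exitCode != SUCCESS && exitCode != DISKS_FAILED && exitCode != INIT_FAILED;
    if (!possiblyRetry) {
      return false;
    }
    if (maxRetries == Integer.MAX_VALUE) {
      return true; // unlimited retries, no special log message
    }

    // Read the count once so the comparison and the message agree.
    int retryCount = getRetryCount(runnableName, instanceId);
    if (retryCount >= maxRetries) {
      System.out.printf("Retries exhausted for instance %d of runnable %s.%n",
                        instanceId, runnableName);
      return false;
    }
    System.out.printf("Attempting %d of %d retries for instance %d of runnable %s.%n",
                      retryCount + 1, maxRetries, instanceId, runnableName);
    // Assumption: the sketch records the attempt here; the real code may do so elsewhere.
    retryCounts.put(key(runnableName, instanceId), retryCount + 1);
    return true;
  }
}
```

With a limit of 2, the first two failing exits of a given instance are retried
and the third is not; other instances of the same runnable are counted
separately.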
> Control the maximum number of retries for failed application starts
> -------------------------------------------------------------------
>
> Key: TWILL-181
> URL: https://issues.apache.org/jira/browse/TWILL-181
> Project: Apache Twill
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 0.7.0-incubating
> Reporter: Martin Serrano
> Assignee: Martin Serrano
> Fix For: 0.10.0
>
>
> If an application consistently exits with a non-zero code, Twill will
> attempt to restart it indefinitely. I ran into this issue, and a search of
> the mailing list shows that [others|http://markmail.org/message/dehx7r6tpqgcmjh4]
> have as well. There should be a mechanism to specify the maximum number of
> retries before the application is considered failed. Ideally, the default
> maximum would be finite.