Github user serranom commented on the issue:

    https://github.com/apache/twill/pull/23
  
    Gotcha.  Here it is, cleaned up for what I actually did:
    
    * Each runnable can have a configured maximum number of retries. If none is set, retries are unlimited, as before.
    * Added withMaxTries(runnableName, int) to TwillPreparer.
    * Added withMaxTries(runnableName, int) to YarnTwillPreparer. This stores a map from runnableName to maxRetries.
    * This map becomes part of the twillRuntimeSpecification and the RuntimeSpecification interface, and is added to TwillRuntimeSpecificationCodec.
    * ApplicationMasterService.initRunningContainers is updated to pass a map of runnable names to max retries.
    * Updated RunningContainers so that it counts retries per runnable and uses that count in handleCompleted() to decide whether to retry. Because every instance of a runnable is interchangeable with any other, the limit scales with the instance count: starting 10 instances of a Runnable with a max retry count of 3 gives that runnable a total budget of 30 retries, so each instance gets 3 retries on average. There is no concept of a specific instance being retried.
    * Updated logging: nothing extra is logged when no max is set; otherwise it logs the number of retries left and when they have been exhausted.
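A minimal Java sketch of the preparer side described in the first bullets, assuming only what the comment states: withMaxTries(runnableName, int) records a limit in a map keyed by runnable name, and a runnable with no entry keeps unlimited retries. The class name SketchPreparer, the getMaxRetries() accessor, and the rejection of negative values are hypothetical illustrations, not the actual TwillPreparer or YarnTwillPreparer code from the PR.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, stripped-down preparer showing only the max-retries bookkeeping.
 * A runnable with no entry in the map keeps the old behaviour (unlimited retries).
 */
class SketchPreparer {
  // runnableName -> max retries; absent key means "no limit"
  private final Map<String, Integer> maxRetries = new HashMap<>();

  /** Records a retry limit for one runnable and returns this for builder-style chaining. */
  SketchPreparer withMaxTries(String runnableName, int maxTries) {
    // Assumption for illustration: negative limits are rejected.
    if (maxTries < 0) {
      throw new IllegalArgumentException("maxTries must be >= 0, got " + maxTries);
    }
    maxRetries.put(runnableName, maxTries);
    return this;
  }

  /** Read-only view of the map that would be handed on to the runtime specification. */
  Map<String, Integer> getMaxRetries() {
    return Collections.unmodifiableMap(maxRetries);
  }
}
```

For example, `new SketchPreparer().withMaxTries("worker", 3)` limits only "worker"; every other runnable has no map entry and therefore stays unlimited.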
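The retry accounting in the RunningContainers bullet can be sketched as one retry budget shared by all instances of a runnable. This is a hypothetical sketch, not the PR's RunningContainers.handleCompleted(): the class RetryBudget and the method shouldRetry(runnableName, instanceCount) are illustrative names, and logging goes to System.out rather than Twill's logger. It assumes the budget is maxRetries * instanceCount, as the comment describes, so with 10 instances and a limit of 3 the first 30 failures are retried and the 31st is not.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical per-runnable retry accounting. Instances are interchangeable,
 * so all instances of a runnable draw from one shared budget of
 * maxRetries * instanceCount. Runnables with no configured limit always retry.
 */
class RetryBudget {
  private final Map<String, Integer> maxRetries;                   // runnableName -> max retries per instance
  private final Map<String, Integer> retriesUsed = new HashMap<>(); // runnableName -> retries consumed so far

  RetryBudget(Map<String, Integer> maxRetries) {
    this.maxRetries = new HashMap<>(maxRetries);
  }

  /** Called when a container of runnableName fails; returns true if it should be restarted. */
  boolean shouldRetry(String runnableName, int instanceCount) {
    Integer max = maxRetries.get(runnableName);
    if (max == null) {
      return true; // no limit configured: unlimited retries, nothing extra logged
    }
    int budget = max * instanceCount;
    int used = retriesUsed.getOrDefault(runnableName, 0);
    if (used >= budget) {
      System.out.printf("Retries exhausted for %s (%d of %d used)%n", runnableName, used, budget);
      return false;
    }
    used++;
    retriesUsed.put(runnableName, used);
    System.out.printf("Retrying %s, %d retries left%n", runnableName, budget - used);
    return true;
  }
}
```

Because the budget is recomputed from instanceCount on every call, changing the number of instances also changes the remaining budget in this sketch; the comment does not say whether the actual change behaves that way.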

