[ 
https://issues.apache.org/jira/browse/TWILL-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648030#comment-14648030
 ] 

ASF GitHub Bot commented on TWILL-145:
--------------------------------------

Github user chtyim commented on a diff in the pull request:

    https://github.com/apache/incubator-twill/pull/58#discussion_r35899172
  
    --- Diff: 
twill-yarn/src/main/java/org/apache/twill/internal/appmaster/ApplicationMasterService.java
 ---
    @@ -835,35 +850,57 @@ private boolean handleRestartRunnablesInstances(final 
Message message, final Run
                                                               new 
TypeToken<Set<Integer>>() {}.getType());
     
             LOG.debug("Start restarting runnable {} instances {}", 
runnableName, restartedInstanceIds);
    -        restartRunnableInstances(runnableName, restartedInstanceIds);
    +        restartRunnableInstances(runnableName, restartedInstanceIds, 
completion);
           }
         }
     
    -    completion.run();
         return true;
       }
     
       /**
        * Helper method to restart instances of runnables.
        */
    -  private void restartRunnableInstances(String runnableName, @Nullable 
Set<Integer> instanceIds) {
    -    LOG.debug("Begin restart runnable {} instances.", runnableName);
    +  private void restartRunnableInstances(String runnableName, @Nullable 
Set<Integer> instanceIds,
    +                                        final Runnable completion) {
    +    Runnable restartInstancesRunnable = 
createRestartInstancesRunnable(runnableName, instanceIds, completion);
    +    instanceChangeExecutor.execute(restartInstancesRunnable);
    +  }
     
    -    Set<Integer> instancesToRemove = instanceIds;
    -    if (instancesToRemove == null) {
    -      instancesToRemove = Ranges.closedOpen(0, 
runningContainers.count(runnableName)).asSet(DiscreteDomains.integers());
    -    }
    +  /**
    +   * Creates a Runnable for execution of restart instances request.
    +   */
    +  private Runnable createRestartInstancesRunnable(final String 
runnableName, @Nullable final Set<Integer> instanceIds,
    +                                                  final Runnable 
completion) {
    +    return new Runnable() {
    +      @Override
    +      public void run() {
    +        LOG.debug("Begin restart runnable {} instances.", runnableName);
     
    -    for (int instanceId : instancesToRemove) {
    -      LOG.debug("Remove instance {} for runnable {}", instanceId, 
runnableName);
    -      try {
    -        runningContainers.removeById(runnableName, instanceId);
    -      } catch (Exception ex) {
    -        // could be thrown if the container already stopped.
    -        LOG.info("Exception thrown when stopping instance {} probably 
already stopped.", instanceId);
    +        Set<Integer> instancesToRemove = instanceIds;
    +        if (instancesToRemove == null) {
    +          instancesToRemove =
    +            Ranges.closedOpen(0, 
runningContainers.count(runnableName)).asSet(DiscreteDomains.integers());
    +        }
    +
    +        LOG.info("Restarting instances {} for runnable {}", 
instancesToRemove, runnableName);
    +        RunnableContainerRequest containerRequest =
    +          createRunnableContainerRequest(runnableName, 
instancesToRemove.size(), false);
    +        runnableContainerRequests.add(containerRequest);
    +
    +        for (int instanceId : instancesToRemove) {
    +          LOG.debug("Remove instance {} for runnable {}", instanceId, 
runnableName);
    +          try {
    +            runningContainers.removeById(runnableName, instanceId);
    +          } catch (Exception ex) {
    +            // could be thrown if the container already stopped.
    +            LOG.info("Exception thrown when stopping instance {} probably 
already stopped.", instanceId);
    +          } finally {
    --- End diff --
    
    Should this finally be outside of the for loop? There is only one instance 
of `constainerRequest` and it seems unnecessary to set the ready to true for 
every loop but rather do it just once when the looping finished.


> Potential race condition when restart all is called for a Twill runnable
> ------------------------------------------------------------------------
>
>                 Key: TWILL-145
>                 URL: https://issues.apache.org/jira/browse/TWILL-145
>             Project: Apache Twill
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 0.6.0-incubating
>            Reporter: Henry Saputra
>            Assignee: Henry Saputra
>
> Found this issue from careful eyes of [~chtyim]
> When sending restart instance to all for a particular TwillRunnable, it could 
> have race condition where the heartbeat thread run right after all containers 
> have been released which make the check:
> {code}
>      // Looks for containers requests.
>       if (provisioning.isEmpty() && runnableContainerRequests.isEmpty() && 
> runningContainers.isEmpty()) {
>         LOG.info("All containers completed. Shutting down application 
> master.");
>         break;
>       }
> {code}
> This could happen when all running containers are empty and new 
> runnableContainerRequests has not been added.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to