[ https://issues.apache.org/jira/browse/TWILL-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437601#comment-15437601 ]

ASF GitHub Bot commented on TWILL-190:
--------------------------------------

Github user chtyim commented on a diff in the pull request:

    https://github.com/apache/twill/pull/4#discussion_r76318343
  
    --- Diff: twill-yarn/src/main/java/org/apache/twill/internal/appmaster/ApplicationMasterService.java ---
    @@ -268,20 +270,33 @@ public void acquired(List<? extends ProcessLauncher<YarnContainerInfo>> launcher
           @Override
           public void completed(List<YarnContainerStatus> completed) {
             for (YarnContainerStatus status : completed) {
    +          handleCompleted(completed);
               ids.remove(status.getContainerId());
             }
           }
         };
     
    -    runningContainers.stopAll();
    -
    -    // Poll for 5 seconds to wait for containers to stop.
    -    int count = 0;
    -    while (!ids.isEmpty() && count++ < 5) {
    -      amClient.allocate(0.0f, handler);
    -      TimeUnit.SECONDS.sleep(1);
    -    }
    +    // Handle heartbeats during shutdown because runningContainers.stopAll() waits until
    +    // handleCompleted() is called for every stopped runnable
    +    ExecutorService stopPoller = Executors.newSingleThreadExecutor(Threads.createDaemonThreadFactory("stopPoller"));
    +    stopPoller.execute(new Runnable() {
    +      @Override
    +      public void run() {
    +        while (!ids.isEmpty()) {
    +          try {
    +            amClient.allocate(0.0f, handler);
    +            TimeUnit.SECONDS.sleep(1);
    --- End diff --
    
    Should check whether `ids` is already empty before sleeping: the call to
    `allocate` may have already emptied `ids` through the handler, in which
    case there is no need to sleep for an extra second.
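
    A minimal sketch of the suggested check, assuming a simplified stand-in
    for the poller loop. The class `StopPollerSketch`, the method
    `pollUntilEmpty`, and the `heartbeat` parameter are hypothetical names
    used for illustration; `heartbeat` takes the place of
    `amClient.allocate(0.0f, handler)`, and this is not the code from the
    pull request.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the stop-poller loop; not the Twill implementation.
final class StopPollerSketch {

  /**
   * Runs the heartbeat until ids is empty, sleeping between heartbeats only
   * while containers are still outstanding. Returns the number of sleeps.
   */
  static int pollUntilEmpty(Set<String> ids, Runnable heartbeat, long sleepMillis) {
    int sleeps = 0;
    while (!ids.isEmpty()) {
      // Stands in for amClient.allocate(0.0f, handler); the handler may
      // remove completed container ids during this call.
      heartbeat.run();
      if (ids.isEmpty()) {
        // Everything stopped during this heartbeat, so skip the extra sleep.
        break;
      }
      try {
        TimeUnit.MILLISECONDS.sleep(sleepMillis);
        sleeps++;
      } catch (InterruptedException e) {
        // Preserve the interrupt status and stop polling.
        Thread.currentThread().interrupt();
        break;
      }
    }
    return sleeps;
  }

  public static void main(String[] args) {
    Set<String> ids = ConcurrentHashMap.newKeySet();
    ids.add("container_1");
    // The heartbeat clears all ids on the first call, so no sleep happens.
    int sleeps = pollUntilEmpty(ids, ids::clear, 1000L);
    System.out.println("sleeps = " + sleeps); // prints "sleeps = 0"
  }
}
```

    With the check in place, a heartbeat that empties `ids` returns
    immediately instead of waiting one more second before the loop
    condition is evaluated again.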


> Restart of a TwillRunnable does not wait for the runnable to stop
> -----------------------------------------------------------------
>
>                 Key: TWILL-190
>                 URL: https://issues.apache.org/jira/browse/TWILL-190
>             Project: Apache Twill
>          Issue Type: Bug
>          Components: core, yarn
>    Affects Versions: 0.6.0-incubating, 0.7.0-incubating
>            Reporter: Poorna Chandra
>            Assignee: Poorna Chandra
>             Fix For: 0.8.0
>
>
> Today when a TwillRunnable is restarted, the call sends a stop message to the 
> TwillRunnable and then starts a new TwillRunnable without waiting for the 
> stopping runnable to finish stopping.
> This can leave a non-responding TwillRunnable container running, and can lead 
> to issues such as two TwillRunnables with the same instance id running at the 
> same time.
> We should kill the containers that don't respond to the stop message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
