[ https://issues.apache.org/jira/browse/TWILL-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426005#comment-15426005 ]

ASF GitHub Bot commented on TWILL-190:
--------------------------------------

Github user poornachandra commented on a diff in the pull request:

    https://github.com/apache/twill/pull/2#discussion_r75257373
  
    --- Diff: twill-yarn/src/main/java/org/apache/twill/internal/appmaster/RunningContainers.java ---
    @@ -222,14 +231,17 @@ private void removeInstanceById(String runnableName, int instanceId) {
     
         Preconditions.checkState(containerId != null,
                                  "No container found for {} with instanceId = {}", runnableName, instanceId);
    +    return controller;
    +  }
     
    +  // This method only stops a runnable using the controller.
    +  // The cleanup of the state happens when handleCompleted() method runs for the runnable after the stop
    +  // This method will block until handleCompleted() method runs or a timeout occurs
    +  // Hence this method should not be called with a containerLock taken
    +  private void stopInstanceAndWait(String runnableName, TwillContainerController controller) {
         LOG.info("Stopping service: {} {}", runnableName, controller.getRunId());
    +    // This call will block until handleCompleted() method runs or a timeout occurs
         controller.stopAndWait();
    -    containers.remove(runnableName, containerId);
    --- End diff --
    
    Realized that we need to clean up state in case of a timeout, so I added the cleanup back.
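
The stop-and-wait behavior discussed in this diff can be illustrated with a small, self-contained sketch. It is not the RunningContainers code: the class StopAndWaitSketch, its string instance keys, and the latch-based waiting are assumptions made only for illustration. The completion callback cleans up state and releases the waiter; if the wait times out, the waiter removes the stale entry itself.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; not the actual RunningContainers implementation. */
public class StopAndWaitSketch {
  // Instance key -> latch that is released when the completion callback runs.
  private final Map<String, CountDownLatch> running = new ConcurrentHashMap<>();

  /** Starts tracking an instance (hypothetical helper). */
  public void register(String instanceKey) {
    running.put(instanceKey, new CountDownLatch(1));
  }

  public boolean isTracked(String instanceKey) {
    return running.containsKey(instanceKey);
  }

  /** Stands in for handleCompleted(): removes the state, then releases any waiter. */
  public void handleCompleted(String instanceKey) {
    CountDownLatch latch = running.remove(instanceKey);
    if (latch != null) {
      latch.countDown();
    }
  }

  /**
   * Blocks until handleCompleted() runs for the instance or the timeout expires.
   * Returns true if the instance completed in time. On timeout (or interruption)
   * the stale entry is removed here, because the callback never ran.
   * Like the method in the diff, this should not be called while holding a lock
   * that handleCompleted() needs.
   */
  public boolean stopAndWait(String instanceKey, long timeoutMillis) {
    CountDownLatch latch = running.get(instanceKey);
    if (latch == null) {
      return true; // already completed and cleaned up
    }
    // A real implementation would send the stop message to the container here.
    try {
      if (latch.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
        return true;
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt, fall through to cleanup
    }
    // Remove only the entry we waited on, in case the key was re-registered meanwhile.
    running.remove(instanceKey, latch);
    return false;
  }
}
```

The conditional `remove(key, value)` in the timeout path is a design choice of this sketch: it avoids deleting an entry that a newer instance may have registered under the same key.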


> Restart of a TwillRunnable does not wait for the runnable to stop
> -----------------------------------------------------------------
>
>                 Key: TWILL-190
>                 URL: https://issues.apache.org/jira/browse/TWILL-190
>             Project: Apache Twill
>          Issue Type: Bug
>          Components: core, yarn
>    Affects Versions: 0.6.0-incubating, 0.7.0-incubating
>            Reporter: Poorna Chandra
>            Assignee: Poorna Chandra
>             Fix For: 0.8.0
>
>
> Today, when a TwillRunnable is restarted, the call sends a stop message to the
> TwillRunnable and then starts a new TwillRunnable without waiting for the
> stopping runnable to finish stopping.
> This can leave a non-responding TwillRunnable container running, and can lead
> to issues such as two TwillRunnables with the same instance ID running at the
> same time.
> We should kill the containers that don't respond to the stop message.
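
The restart ordering proposed in the issue description can be sketched as follows. The InstanceHandle interface and the restart helper are hypothetical names used only for illustration, not Twill APIs: request a stop, wait up to a timeout, kill the old instance if it did not stop, and only then start the replacement.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of the proposed restart ordering; not Twill code. */
public class RestartSketch {

  /** Hypothetical handle to a running instance. */
  public interface InstanceHandle {
    void requestStop();                        // send the stop message
    boolean awaitStopped(long timeoutMillis);  // true if the instance stopped in time
    void kill();                               // forcibly release the container
  }

  /**
   * Stops the old instance before starting a new one, so that two instances
   * with the same instance ID are never started back to back without a stop
   * or kill in between.
   */
  public static InstanceHandle restart(InstanceHandle old,
                                       Supplier<InstanceHandle> starter,
                                       long stopTimeoutMillis) {
    old.requestStop();
    if (!old.awaitStopped(stopTimeoutMillis)) {
      // The instance did not respond to the stop message within the timeout.
      old.kill();
    }
    return starter.get();
  }
}
```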



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
