[ https://issues.apache.org/jira/browse/TWILL-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426005#comment-15426005 ]

ASF GitHub Bot commented on TWILL-190:
--------------------------------------

Github user poornachandra commented on a diff in the pull request:

    https://github.com/apache/twill/pull/2#discussion_r75257373

    --- Diff: twill-yarn/src/main/java/org/apache/twill/internal/appmaster/RunningContainers.java ---
    @@ -222,14 +231,17 @@ private void removeInstanceById(String runnableName, int instanceId) {
         Preconditions.checkState(containerId != null, "No container found for {} with instanceId = {}", runnableName, instanceId);
    +    return controller;
    +  }
    +  // This method only stops a runnable using the controller.
    +  // The cleanup of the state happens when handleCompleted() method runs for the runnable after the stop
    +  // This method will block until handleCompleted() method runs or a timeout occurs
    +  // Hence this method should not be called with a containerLock taken
    +  private void stopInstanceAndWait(String runnableName, TwillContainerController controller) {
         LOG.info("Stopping service: {} {}", runnableName, controller.getRunId());
    +    // This call will block until handleCompleted() method runs or a timeout occurs
         controller.stopAndWait();
    -    containers.remove(runnableName, containerId);
    --- End diff --

    Realized that we need to clean up state in case of timeout, so added it back.


> Restart of a TwillRunnable does not wait for the runnable to stop
> -----------------------------------------------------------------
>
>                 Key: TWILL-190
>                 URL: https://issues.apache.org/jira/browse/TWILL-190
>             Project: Apache Twill
>          Issue Type: Bug
>          Components: core, yarn
>    Affects Versions: 0.6.0-incubating, 0.7.0-incubating
>            Reporter: Poorna Chandra
>            Assignee: Poorna Chandra
>             Fix For: 0.8.0
>
>
> Today when a TwillRunnable is restarted, the call sends a stop message to the
> TwillRunnable, and then starts new TwillRunnable without waiting for the
> stopping runnable to finish stopping.
> This can leave a non-responding TwillRunnable container running, and can lead
> to issues like two TwillRunnables with same instance id running at the same
> time.
> We should kill the containers that don't respond to stop message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
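
For readers following the thread, here is a rough sketch of the stop-and-wait pattern discussed in the review comment: wait for the completion callback without holding the container lock, and fall back to cleaning up state if the wait times out. This is not the Twill implementation. The class `StopAndWaitSketch`, the `running` map, the latch-based signalling, and the `timeoutMillis` parameter are all hypothetical stand-ins chosen for illustration; only the names `stopInstanceAndWait`, `handleCompleted`, and `containerLock` echo the diff.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the app master's container bookkeeping; not Twill's actual class.
public class StopAndWaitSketch {
  private final ReentrantLock containerLock = new ReentrantLock();
  // One latch per running instance; released when the instance reports completion.
  private final Map<String, CountDownLatch> running = new HashMap<>();

  public void start(String runnableName) {
    containerLock.lock();
    try {
      running.put(runnableName, new CountDownLatch(1));
    } finally {
      containerLock.unlock();
    }
  }

  public boolean isRunning(String runnableName) {
    containerLock.lock();
    try {
      return running.containsKey(runnableName);
    } finally {
      containerLock.unlock();
    }
  }

  // Completion callback: cleans up state under the lock and releases any waiter.
  public void handleCompleted(String runnableName) {
    containerLock.lock();
    try {
      CountDownLatch done = running.remove(runnableName);
      if (done != null) {
        done.countDown();
      }
    } finally {
      containerLock.unlock();
    }
  }

  // Returns true if completion was observed (or nothing was running),
  // false if the wait timed out or was interrupted and state was cleaned up here instead.
  public boolean stopInstanceAndWait(String runnableName, long timeoutMillis) {
    // Waiting while holding the lock would block handleCompleted() and always time out.
    if (containerLock.isHeldByCurrentThread()) {
      throw new IllegalStateException("must not be called with containerLock taken");
    }
    CountDownLatch done;
    containerLock.lock();
    try {
      done = running.get(runnableName);
    } finally {
      containerLock.unlock();
    }
    if (done == null) {
      return true;
    }
    // A real implementation would send the stop message to the container at this point.
    try {
      if (done.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
        return true;
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt, then fall through to cleanup
    }
    // Timeout: the callback never ran, so remove the stale entry ourselves.
    containerLock.lock();
    try {
      running.remove(runnableName, done);
    } finally {
      containerLock.unlock();
    }
    return false;
  }
}
```

In this sketch a `false` return is the point at which a caller could go on to kill the unresponsive container before starting a replacement, which is the behaviour the issue description asks for.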