[ https://issues.apache.org/jira/browse/TWILL-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428529#comment-15428529 ]
ASF GitHub Bot commented on TWILL-190:
--------------------------------------

Github user poornachandra commented on a diff in the pull request:

    https://github.com/apache/twill/pull/2#discussion_r75523615

    --- Diff: twill-core/src/main/java/org/apache/twill/internal/TwillContainerLauncher.java ---
    @@ -177,7 +187,22 @@ protected void doStartUp() {

         @Override
         protected void doShutDown() {
    -      // No-op
    +      // Wait for sometime for the container to stop
    +      // TODO: Use configurable value for stop time
    +      int maxWaitSecs = Constants.APPLICATION_MAX_STOP_SECONDS;
    +      try {
    +        if (shutdownLatch.await(maxWaitSecs, TimeUnit.SECONDS)) {
    +          return;
    +        }
    +      } catch (InterruptedException e) {
    +        LOG.error("Got exception while waiting for runnable {}, instance {} to stop", runnable, instanceId);
    +        // TODO: how do we handle the InterruptedException? Should we restore the interrupted status?
    +        return;
    +      }
    +      // Container has not shutdown even after maxWaitSecs after sending stop message,
    +      // we'll need to kill the container
    +      LOG.warn("Killing runnable {}, instance {} after waiting {} secs", runnable, instanceId, maxWaitSecs);
    +      kill();
    --- End diff --

    In case of exception during a kill, I'll add code to retry the kill a few times.


> Restart of a TwillRunnable does not wait for the runnable to stop
> -----------------------------------------------------------------
>
>                 Key: TWILL-190
>                 URL: https://issues.apache.org/jira/browse/TWILL-190
>             Project: Apache Twill
>          Issue Type: Bug
>          Components: core, yarn
>    Affects Versions: 0.6.0-incubating, 0.7.0-incubating
>            Reporter: Poorna Chandra
>            Assignee: Poorna Chandra
>             Fix For: 0.8.0
>
>
> Today when a TwillRunnable is restarted, the call sends a stop message to the
> TwillRunnable, and then starts new TwillRunnable without waiting for the
> stopping runnable to finish stopping.
> This can leave a non-responding TwillRunnable container running, and can lead
> to issues like two TwillRunnables with same instance id running at the same
> time.
> We should kill the containers that don't respond to stop message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
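
For reference, the wait-then-kill flow from the diff, combined with the kill retry mentioned in the comment, could look roughly like the sketch below. This is not the code from the pull request: the `StopThenKill` class, the `Killer` interface, the `Outcome` enum and the `maxKillAttempts` parameter are all hypothetical names introduced for illustration. Restoring the interrupted status is shown as one common convention for the open TODO, not as the decision taken in the PR.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class StopThenKill {

  /** Hypothetical stand-in for whatever actually kills the container. */
  public interface Killer {
    void kill() throws Exception;
  }

  /** Hypothetical result type, used here only to make the outcome observable. */
  public enum Outcome { STOPPED, KILLED, KILL_FAILED, INTERRUPTED }

  /**
   * Waits up to maxWait for the latch to be released (container stopped on its own).
   * If the wait times out, tries to kill the container, retrying up to maxKillAttempts times.
   */
  public static Outcome awaitStopOrKill(CountDownLatch shutdownLatch, long maxWait, TimeUnit unit,
                                        Killer killer, int maxKillAttempts) {
    try {
      if (shutdownLatch.await(maxWait, unit)) {
        return Outcome.STOPPED;
      }
    } catch (InterruptedException e) {
      // Assumption: restore the interrupted status so callers can still observe it.
      Thread.currentThread().interrupt();
      return Outcome.INTERRUPTED;
    }

    // The container did not stop within the timeout, so fall back to killing it.
    for (int attempt = 1; attempt <= maxKillAttempts; attempt++) {
      try {
        killer.kill();
        return Outcome.KILLED;
      } catch (Exception e) {
        System.err.println("Kill attempt " + attempt + " of " + maxKillAttempts + " failed: " + e);
      }
    }
    return Outcome.KILL_FAILED;
  }
}
```

A caller would pass the latch that is counted down when the container reports completion, plus a `Killer` wrapping the real kill call. The sketch retries immediately; a delay between attempts may be preferable in practice.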