[ https://issues.apache.org/jira/browse/TWILL-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437595#comment-15437595 ]
ASF GitHub Bot commented on TWILL-190:
--------------------------------------
Github user chtyim commented on a diff in the pull request:
https://github.com/apache/twill/pull/4#discussion_r76317898
--- Diff: twill-core/src/main/java/org/apache/twill/internal/TwillContainerLauncher.java ---
@@ -220,5 +250,31 @@ public ContainerLiveNodeData getLiveNodeData() {
     public void kill() {
       processController.cancel();
     }
+
+    private void killAndWait(int maxWaitSecs) {
+      Stopwatch watch = new Stopwatch();
+      watch.start();
+      int tries = 0;
+      while (watch.elapsedTime(TimeUnit.SECONDS) < maxWaitSecs) {
+        // Kill the application
+        try {
+          ++tries;
+          kill();
+        } catch (Exception e) {
+          LOG.error("Exception while killing runnable {}, instance {}", runnable, instanceId, e);
+        }
+
+        // Wait on the shutdownLatch,
+        // if the runnable has stopped then the latch will be count down by completed() method
+        if (Uninterruptibles.awaitUninterruptibly(shutdownLatch, 10, TimeUnit.SECONDS)) {
+          // Runnable has stopped now
+          return;
+        }
+      }
+
+      // Timeout reached, runnable has not stopped
+      LOG.error("Failed to kill runnable {}, instance {} after {} tries", runnable, instanceId, tries);
--- End diff --
Showing the number of tries is quite artificial since the retry is based on
time. I think it's better to just say failed to kill after n seconds.
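
A minimal sketch of that suggestion, using only the JDK so it can be run in isolation. It is not the Twill implementation: the class KillAndWaitSketch, the timeoutMessage helper, and the millisecond parameters are hypothetical names introduced for illustration, and a plain CountDownLatch.await stands in for Guava's Uninterruptibles.awaitUninterruptibly. The point is only that the failure message reports the measured elapsed seconds rather than the number of attempts.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the retry loop in the diff above; not Twill code.
final class KillAndWaitSketch {

  // Hypothetical helper: phrases the failure in terms of elapsed time, not tries.
  static String timeoutMessage(String runnable, int instanceId, long elapsedSecs) {
    return String.format("Failed to kill runnable %s, instance %d after %d seconds",
                         runnable, instanceId, elapsedSecs);
  }

  // Calls kill repeatedly until the latch is released or maxWaitMillis has passed.
  // Returns null if the runnable stopped, otherwise the message that would be logged.
  static String killAndWait(Runnable kill, CountDownLatch shutdownLatch,
                            String runnable, int instanceId,
                            long maxWaitMillis, long pollMillis) {
    long start = System.nanoTime();
    long deadline = start + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
    while (System.nanoTime() - deadline < 0) {
      try {
        kill.run();
      } catch (RuntimeException e) {
        // The original logs the exception and retries; this sketch just retries.
      }
      try {
        // The latch is assumed to be counted down once the runnable has stopped.
        if (shutdownLatch.await(pollMillis, TimeUnit.MILLISECONDS)) {
          return null;
        }
      } catch (InterruptedException e) {
        // Unlike awaitUninterruptibly, give up on interrupt and restore the flag.
        Thread.currentThread().interrupt();
        break;
      }
    }
    // Report the time actually spent, which can exceed the limit by up to one poll.
    long elapsedSecs = TimeUnit.NANOSECONDS.toSeconds(System.nanoTime() - start);
    return timeoutMessage(runnable, instanceId, elapsedSecs);
  }

  public static void main(String[] args) {
    // A latch that is never released simulates a runnable that ignores kill().
    CountDownLatch neverReleased = new CountDownLatch(1);
    System.out.println(killAndWait(() -> { }, neverReleased, "worker", 0, 2000, 500));
  }
}
```

Reporting the measured time instead of maxWaitSecs keeps the message accurate when the last wait overshoots the limit, which can happen in the diff above because each attempt waits up to 10 seconds.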
> Restart of a TwillRunnable does not wait for the runnable to stop
> -----------------------------------------------------------------
>
> Key: TWILL-190
> URL: https://issues.apache.org/jira/browse/TWILL-190
> Project: Apache Twill
> Issue Type: Bug
> Components: core, yarn
> Affects Versions: 0.6.0-incubating, 0.7.0-incubating
> Reporter: Poorna Chandra
> Assignee: Poorna Chandra
> Fix For: 0.8.0
>
>
> Today, when a TwillRunnable is restarted, the call sends a stop message to the
> TwillRunnable and then starts a new TwillRunnable without waiting for the
> stopping runnable to finish stopping.
> This can leave a non-responding TwillRunnable container running, and can lead
> to issues such as two TwillRunnables with the same instance id running at the
> same time.
> We should kill the containers that don't respond to the stop message.