[
https://issues.apache.org/jira/browse/TWILL-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648030#comment-14648030
]
ASF GitHub Bot commented on TWILL-145:
--------------------------------------
Github user chtyim commented on a diff in the pull request:
https://github.com/apache/incubator-twill/pull/58#discussion_r35899172
--- Diff:
twill-yarn/src/main/java/org/apache/twill/internal/appmaster/ApplicationMasterService.java
---
@@ -835,35 +850,57 @@ private boolean handleRestartRunnablesInstances(final
Message message, final Run
new
TypeToken<Set<Integer>>() {}.getType());
LOG.debug("Start restarting runnable {} instances {}",
runnableName, restartedInstanceIds);
- restartRunnableInstances(runnableName, restartedInstanceIds);
+ restartRunnableInstances(runnableName, restartedInstanceIds,
completion);
}
}
- completion.run();
return true;
}
/**
* Helper method to restart instances of runnables.
*/
- private void restartRunnableInstances(String runnableName, @Nullable
Set<Integer> instanceIds) {
- LOG.debug("Begin restart runnable {} instances.", runnableName);
+ private void restartRunnableInstances(String runnableName, @Nullable
Set<Integer> instanceIds,
+ final Runnable completion) {
+ Runnable restartInstancesRunnable =
createRestartInstancesRunnable(runnableName, instanceIds, completion);
+ instanceChangeExecutor.execute(restartInstancesRunnable);
+ }
- Set<Integer> instancesToRemove = instanceIds;
- if (instancesToRemove == null) {
- instancesToRemove = Ranges.closedOpen(0,
runningContainers.count(runnableName)).asSet(DiscreteDomains.integers());
- }
+ /**
+ * Creates a Runnable for execution of restart instances request.
+ */
+ private Runnable createRestartInstancesRunnable(final String
runnableName, @Nullable final Set<Integer> instanceIds,
+ final Runnable
completion) {
+ return new Runnable() {
+ @Override
+ public void run() {
+ LOG.debug("Begin restart runnable {} instances.", runnableName);
- for (int instanceId : instancesToRemove) {
- LOG.debug("Remove instance {} for runnable {}", instanceId,
runnableName);
- try {
- runningContainers.removeById(runnableName, instanceId);
- } catch (Exception ex) {
- // could be thrown if the container already stopped.
- LOG.info("Exception thrown when stopping instance {} probably
already stopped.", instanceId);
+ Set<Integer> instancesToRemove = instanceIds;
+ if (instancesToRemove == null) {
+ instancesToRemove =
+ Ranges.closedOpen(0,
runningContainers.count(runnableName)).asSet(DiscreteDomains.integers());
+ }
+
+ LOG.info("Restarting instances {} for runnable {}",
instancesToRemove, runnableName);
+ RunnableContainerRequest containerRequest =
+ createRunnableContainerRequest(runnableName,
instancesToRemove.size(), false);
+ runnableContainerRequests.add(containerRequest);
+
+ for (int instanceId : instancesToRemove) {
+ LOG.debug("Remove instance {} for runnable {}", instanceId,
runnableName);
+ try {
+ runningContainers.removeById(runnableName, instanceId);
+ } catch (Exception ex) {
+ // could be thrown if the container already stopped.
+ LOG.info("Exception thrown when stopping instance {} probably
already stopped.", instanceId);
+ } finally {
--- End diff --
Should this finally be outside of the for loop? There is only one instance
of `constainerRequest` and it seems unnecessary to set the ready to true for
every loop but rather do it just once when the looping finished.
> Potential race condition when restart all is called for a Twill runnable
> ------------------------------------------------------------------------
>
> Key: TWILL-145
> URL: https://issues.apache.org/jira/browse/TWILL-145
> Project: Apache Twill
> Issue Type: Bug
> Components: yarn
> Affects Versions: 0.6.0-incubating
> Reporter: Henry Saputra
> Assignee: Henry Saputra
>
> Found this issue from careful eyes of [~chtyim]
> When sending restart instance to all for a particular TwillRunnable, it could
> have race condition where the heartbeat thread run right after all containers
> have been released which make the check:
> {code}
> // Looks for containers requests.
> if (provisioning.isEmpty() && runnableContainerRequests.isEmpty() &&
> runningContainers.isEmpty()) {
> LOG.info("All containers completed. Shutting down application
> master.");
> break;
> }
> {code}
> This could happen when all running containers are empty and new
> runnableContainerRequests has not been added.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)