[ https://issues.apache.org/jira/browse/TWILL-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964197#comment-14964197 ]
ASF GitHub Bot commented on TWILL-131:
--------------------------------------
Github user chtyim commented on a diff in the pull request:
https://github.com/apache/incubator-twill/pull/70#discussion_r42435827
--- Diff: twill-yarn/src/main/java/org/apache/twill/internal/appmaster/ApplicationMasterMain.java ---
@@ -229,4 +234,82 @@ protected void shutDown() throws Exception {
}
}
}
+
+ private static final class AppMasterTwillZKPathService extends TwillZKPathService {
+
+ private static final Logger LOG = LoggerFactory.getLogger(AppMasterTwillZKPathService.class);
+ private final ZKClient zkClient;
+
+ public AppMasterTwillZKPathService(ZKClient zkClient, RunId runId) {
+ super(zkClient, runId);
+ this.zkClient = zkClient;
+ }
+
+ @Override
+ protected void shutDown() throws Exception {
+ super.shutDown();
+
+ // Deletes ZK nodes created for the application execution
+ // We don't have to worry about race condition when another instance of the same app starts at the same time
+ // when removal is performed because we always create node with "createParent == true", which will take care of
+ // the the parent node recreation if it is getting removed from here.
+
+ // Try to delete the /instances path. It may throws NotEmptyException if there are other instances of the
+ // same app running, which can safely ignore and return.
+ if (!delete(Constants.INSTANCES_PATH_PREFIX)) {
+ return;
+ }
+
+ // Try to delete children under /discovery. It may fail with NotEmptyException if there are other instances
+ // of the same app running that has discovery services running.
+ List<String> children = zkClient.getChildren(Constants.DISCOVERY_PATH_PREFIX)
+ .get(TIMEOUT_SECONDS, TimeUnit.SECONDS).getChildren();
+ List<OperationFuture<?>> deleteFutures = new ArrayList<>();
+ for (String child : children) {
+ String path = Constants.DISCOVERY_PATH_PREFIX + "/" + child;
+ LOG.info("Removing ZK path: {}{}", zkClient.getConnectString(), path);
+ deleteFutures.add(zkClient.delete(path));
+ }
+ Futures.successfulAsList(deleteFutures).get(TIMEOUT_SECONDS, TimeUnit.SECONDS);
+ for (OperationFuture<?> future : deleteFutures) {
+ try {
+ future.get();
+ } catch (ExecutionException e) {
+ if (e.getCause() instanceof KeeperException.NotEmptyException) {
+ return;
+ }
+ throw e;
+ }
+ }
+
+ // Delete the /discovery. It may fail with NotEmptyException (due to race between apps),
+ // which can safely ignore and return.
+ if (!delete(Constants.DISCOVERY_PATH_PREFIX)) {
+ return;
+ }
+
+ // Delete the ZK path for the app namespace.
+ delete("/");
+ }
+
+ /**
+ * Deletes the given ZK path.
+ *
+ * @param path path to delete
+ * @return true if the path is delete, false if failed to delete due to {@link KeeperException.NotEmptyException}.
+ * @throws Exception if failed to delete
+ */
+ private boolean delete(String path) throws Exception {
+ try {
+ LOG.info("Removing ZK path: {}{}", zkClient.getConnectString(), path);
--- End diff --
All ZK node operations are logged at info level.
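
For anyone following the cleanup order in the diff above, here is a minimal sketch of the same bottom-up deletion, using an in-memory set of paths instead of a real ZKClient. The class ZkCleanupSketch and its methods are hypothetical and not part of Twill. The literal paths /instances and /discovery are taken from the comments in the diff. Treating a missing node as already deleted is a simplification made here, not something the diff shows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for one app's ZK namespace; illustration only. */
public class ZkCleanupSketch {

  // Every existing node path, e.g. "/", "/instances", "/discovery/svc".
  private final TreeSet<String> nodes = new TreeSet<>();

  public ZkCleanupSketch(String... paths) {
    nodes.add("/");
    for (String p : paths) {
      nodes.add(p);
    }
  }

  public boolean exists(String path) {
    return nodes.contains(path);
  }

  /** Returns the names of the direct children of the given path. */
  public List<String> children(String path) {
    String prefix = path.equals("/") ? "/" : path + "/";
    List<String> result = new ArrayList<>();
    for (String n : nodes) {
      boolean below = n.startsWith(prefix) && n.length() > prefix.length();
      if (below && n.indexOf('/', prefix.length()) < 0) {
        result.add(n.substring(prefix.length()));
      }
    }
    return result;
  }

  /** Like delete() in the diff: true if removed, false if the node still has children. */
  public boolean delete(String path) {
    if (!children(path).isEmpty()) {
      return false; // stands in for KeeperException.NotEmptyException
    }
    nodes.remove(path); // assumption: a missing node counts as deleted
    return true;
  }

  /** Removes /instances, the children of /discovery, /discovery, then the root, stopping early. */
  public boolean cleanup() {
    if (!delete("/instances")) {
      return false; // another run of the same app is still registered
    }
    boolean allDeleted = true;
    for (String child : children("/discovery")) {
      // Attempt every child, as the diff does, before checking for failures.
      allDeleted &= delete("/discovery/" + child);
    }
    if (!allDeleted || !delete("/discovery")) {
      return false;
    }
    return delete("/");
  }

  public static void main(String[] args) {
    ZkCleanupSketch solo = new ZkCleanupSketch("/instances", "/discovery", "/discovery/svc");
    System.out.println(solo.cleanup()); // true: nothing else uses the namespace

    ZkCleanupSketch shared = new ZkCleanupSketch("/instances", "/instances/other-run", "/discovery");
    System.out.println(shared.cleanup()); // false: /instances is not empty
  }
}
```

The early returns are what make the cleanup safe when several runs share a namespace: a parent node is removed only once nothing is left under it.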
> Zookeepers nodes are not removed
> --------------------------------
>
> Key: TWILL-131
> URL: https://issues.apache.org/jira/browse/TWILL-131
> Project: Apache Twill
> Issue Type: Bug
> Components: discovery, zookeeper
> Affects Versions: 0.5.0-incubating
> Reporter: Colin B.
> Assignee: Alvin Wang
> Fix For: 0.7.0-incubating
>
>
> When a TwillRunnable is run with the YarnTwillRunnerService, a zookeeper node
> is created and never removed.
> For example run the example HelloWorld application:
> {code}
> java -cp $CP org.apache.twill.example.yarn.HelloWorld localhost:2181/hello
> {code}
> After the application ran to completion, I looked at ZooKeeper and found:
> {code}
> > ./zkCli.sh ls /hello
> ...
> [HelloWorldRunnable]
> {code}
> However, I expected:
> {code}
> > ./zkCli.sh ls /hello
> ...
> []
> {code}
> This becomes an issue when a service creates a large number of
> TwillApplications with unique names.