umustafi commented on code in PR #3602: URL: https://github.com/apache/gobblin/pull/3602#discussion_r1026979650
########## gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java: ##########
@@ -544,23 +544,25 @@ void connectHelixManagerWithRetry() {
   private void addInstanceTags() {
     HelixManager receiverManager = getReceiverManager();
     if (receiverManager.isConnected()) {
-      // The helix instance associated with this container should be consistent on helix tag
-      List<String> existedTags = receiverManager.getClusterManagmentTool()
-          .getInstanceConfig(this.clusterName, this.helixInstanceName).getTags();
-      Set<String> desiredTags = new HashSet<>(
-          ConfigUtils.getStringList(this.clusterConfig, GobblinClusterConfigurationKeys.HELIX_INSTANCE_TAGS_KEY));
-      if (!desiredTags.isEmpty()) {
-        // Remove tag assignments for the current Helix instance from a previous run
-        for (String tag : existedTags) {
-          if (!desiredTags.contains(tag))
-            receiverManager.getClusterManagmentTool().removeInstanceTag(this.clusterName, this.helixInstanceName, tag);
-            logger.info("Removed unrelated helix tag {} for instance {}", tag, this.helixInstanceName);
+      try {
+        Set<String> desiredTags = new HashSet<>(ConfigUtils.getStringList(this.clusterConfig, GobblinClusterConfigurationKeys.HELIX_INSTANCE_TAGS_KEY));
+        if (!desiredTags.isEmpty()) {
+          // The helix instance associated with this container should be consistent on helix tag
+          List<String> existedTags =
+              receiverManager.getClusterManagmentTool().getInstanceConfig(this.clusterName, this.helixInstanceName).getTags();

Review Comment:
I believe there are some Yarn-related assumptions here (for example, that the instance config is registered on container start, or something similar), and we are not setting that up for all use cases. I saw a `setInstanceConfig` method in `ZKHelixAdmin` that should perhaps be called explicitly in the non-Yarn case, if Helix does not automatically create an instance config when the task runner joins. From testing the code: previously we never entered or tested this code block, since `desiredTags` is empty for our use case.
This shouldn't change the streaming logic because the `existedTags` definition is only used inside the code block so I've moved the declaration to where it's used and caught the exception that can be thrown.
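
To illustrate the pattern described above (skip the lookup when no tags are desired, fetch the existing tags only inside the block that uses them, and catch the failure when no instance config exists), here is a minimal self-contained sketch. It is not the code in this PR: `TagAdmin`, `TagReconciler`, and `removeStaleTags` are hypothetical names, and `TagAdmin` is only a stand-in for the Helix admin calls, assumed to throw a `RuntimeException` when the instance has no config.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the admin calls used in addInstanceTags(); NOT the Helix API.
interface TagAdmin {
  // Assumption: throws a RuntimeException when the instance has no config registered.
  List<String> getTags(String instance);

  void removeTag(String instance, String tag);
}

final class TagReconciler {
  private TagReconciler() {}

  // Removes tags that are not in desiredTags and returns the tags that were removed.
  static List<String> removeStaleTags(TagAdmin admin, String instance, Set<String> desiredTags) {
    List<String> removed = new ArrayList<>();
    if (desiredTags.isEmpty()) {
      // Nothing configured: never look up the instance config at all.
      return removed;
    }
    try {
      // Declared where it is used; copied so that removals cannot disturb the iteration.
      List<String> existedTags = new ArrayList<>(admin.getTags(instance));
      for (String tag : existedTags) {
        if (!desiredTags.contains(tag)) {
          admin.removeTag(instance, tag);
          removed.add(tag);
        }
      }
    } catch (RuntimeException e) {
      // A missing instance config is reported instead of failing startup.
      System.err.println("Could not reconcile tags for " + instance + ": " + e.getMessage());
    }
    return removed;
  }

  public static void main(String[] args) {
    // In-memory admin used only for this demonstration.
    Map<String, List<String>> store = new HashMap<>();
    store.put("worker-1", new ArrayList<>(Arrays.asList("old", "keep")));
    TagAdmin admin = new TagAdmin() {
      @Override
      public List<String> getTags(String instance) {
        List<String> tags = store.get(instance);
        if (tags == null) {
          throw new IllegalStateException("no instance config for " + instance);
        }
        return tags;
      }

      @Override
      public void removeTag(String instance, String tag) {
        store.get(instance).remove(tag);
      }
    };
    Set<String> desired = new HashSet<>(Arrays.asList("keep"));
    System.out.println(removeStaleTags(admin, "worker-1", desired)); // config exists
    System.out.println(removeStaleTags(admin, "worker-2", desired)); // no config registered
  }
}
```

In the sketch, the empty-`desiredTags` path never calls `getTags`, which mirrors why this code path was not exercised in our use case before.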