[GitHub] [hudi] nsivabalan commented on a diff in pull request #7035: [HUDI-5075] Adding support to rollback residual clustering after disabling clustering
nsivabalan commented on code in PR #7035: URL: https://github.com/apache/hudi/pull/7035#discussion_r1046543459 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java: ## @@ -588,6 +588,19 @@ protected void runTableServicesInline(HoodieTable table, HoodieCommitMetadata me metadata.addMetadata(HoodieClusteringConfig.SCHEDULE_INLINE_CLUSTERING.key(), "true"); inlineScheduleClustering(extraMetadata); } + + // if clustering is disabled, but we might need to rollback any inflight clustering when clustering was enabled previously. + if (!config.inlineClusteringEnabled() && !config.isAsyncClusteringEnabled() && !config.scheduleInlineClustering() Review Comment: this is already the case. The issue we are trying to solve here is, if the replace commit requested is left in the data timeline, then metadata table compaction is stopped. thats why. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a diff in pull request #7035: [HUDI-5075] Adding support to rollback residual clustering after disabling clustering
nsivabalan commented on code in PR #7035: URL: https://github.com/apache/hudi/pull/7035#discussion_r1044536141 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java: ## @@ -588,6 +588,19 @@ protected void runTableServicesInline(HoodieTable table, HoodieCommitMetadata me metadata.addMetadata(HoodieClusteringConfig.SCHEDULE_INLINE_CLUSTERING.key(), "true"); inlineScheduleClustering(extraMetadata); } + + // if clustering is disabled, but we might need to rollback any inflight clustering when clustering was enabled previously. + if (!config.inlineClusteringEnabled() && !config.isAsyncClusteringEnabled() && !config.scheduleInlineClustering() + && config.isRollbackPendingClustering()) { +// rollback any pending clustering + table.getActiveTimeline().filterPendingReplaceTimeline().getInstants().forEach(instant -> { + Option> instantPlan = ClusteringUtils.getClusteringPlan(table.getMetaClient(), instant); + if (instantPlan.isPresent()) { +LOG.info("Rolling back pending clustering at instant " + instantPlan.get().getLeft()); +rollback(instant.getTimestamp()); Review Comment: actually, rollback() takes care of ensureing if there is already a rollback targetting the commit of interest, it will re-use the same. we can sync up directly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a diff in pull request #7035: [HUDI-5075] Adding support to rollback residual clustering after disabling clustering
nsivabalan commented on code in PR #7035: URL: https://github.com/apache/hudi/pull/7035#discussion_r1015087999 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ## @@ -1335,10 +1335,14 @@ public boolean isClusteringEnabled() { return inlineClusteringEnabled() || isAsyncClusteringEnabled(); } - public boolean isRollbackPendingClustering() { + public boolean isRollbackPendingClusteringOnConflict() { return getBoolean(HoodieClusteringConfig.ROLLBACK_PENDING_CLUSTERING_ON_CONFLICT); } + public boolean isRollbackPendingClustering() { +return getBoolean(HoodieClusteringConfig.ROLLBACK_PENDING_CLUSTERING); + } + Review Comment: the older getter did not have the right name and so took the liberty to fix it. These are writeConfig apis. we are not changing the config key as such, just the getters in the writeConfig. So, I guess we should be good. ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java: ## @@ -588,6 +588,19 @@ protected void runTableServicesInline(HoodieTable table, HoodieCommitMetadata me metadata.addMetadata(HoodieClusteringConfig.SCHEDULE_INLINE_CLUSTERING.key(), "true"); inlineScheduleClustering(extraMetadata); } + + // if clustering is disabled, but we might need to rollback any inflight clustering when clustering was enabled previously. + if (!config.inlineClusteringEnabled() && !config.isAsyncClusteringEnabled() && !config.scheduleInlineClustering() + && config.isRollbackPendingClustering()) { +// rollback any pending clustering + table.getActiveTimeline().filterPendingReplaceTimeline().getInstants().forEach(instant -> { + Option> instantPlan = ClusteringUtils.getClusteringPlan(table.getMetaClient(), instant); + if (instantPlan.isPresent()) { +LOG.info("Rolling back pending clustering at instant " + instantPlan.get().getLeft()); +rollback(instant.getTimestamp()); Review Comment: don't we have this issue even today. may be worth fixing it separately for all. ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java: ## @@ -588,6 +588,19 @@ protected void runTableServicesInline(HoodieTable table, HoodieCommitMetadata me metadata.addMetadata(HoodieClusteringConfig.SCHEDULE_INLINE_CLUSTERING.key(), "true"); inlineScheduleClustering(extraMetadata); } + + // if clustering is disabled, but we might need to rollback any inflight clustering when clustering was enabled previously. + if (!config.inlineClusteringEnabled() && !config.isAsyncClusteringEnabled() && !config.scheduleInlineClustering() + && config.isRollbackPendingClustering()) { +// rollback any pending clustering + table.getActiveTimeline().filterPendingReplaceTimeline().getInstants().forEach(instant -> { + Option> instantPlan = ClusteringUtils.getClusteringPlan(table.getMetaClient(), instant); + if (instantPlan.isPresent()) { +LOG.info("Rolling back pending clustering at instant " + instantPlan.get().getLeft()); +rollback(instant.getTimestamp()); Review Comment: https://issues.apache.org/jira/browse/HUDI-5169 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ## @@ -1335,10 +1335,14 @@ public boolean isClusteringEnabled() { return inlineClusteringEnabled() || isAsyncClusteringEnabled(); } - public boolean isRollbackPendingClustering() { + public boolean isRollbackPendingClusteringOnConflict() { return getBoolean(HoodieClusteringConfig.ROLLBACK_PENDING_CLUSTERING_ON_CONFLICT); } + public boolean isRollbackPendingClustering() { +return getBoolean(HoodieClusteringConfig.ROLLBACK_PENDING_CLUSTERING); + } + Review Comment: infact, the older config key is `hoodie.clustering.rollback.pending.replacecommit.on.conflict`, just the getter wasn't properly named. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a diff in pull request #7035: [HUDI-5075] Adding support to rollback residual clustering after disabling clustering
nsivabalan commented on code in PR #7035: URL: https://github.com/apache/hudi/pull/7035#discussion_r1010757391 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java: ## @@ -588,6 +588,19 @@ protected void runTableServicesInline(HoodieTable table, HoodieCommitMetadata me metadata.addMetadata(HoodieClusteringConfig.SCHEDULE_INLINE_CLUSTERING.key(), "true"); inlineScheduleClustering(extraMetadata); } + + // if clustering is disabled, but we might need to rollback any inflight clustering when clustering was enabled previously. + if (!config.inlineClusteringEnabled() && !config.isAsyncClusteringEnabled() && !config.scheduleInlineClustering() Review Comment: nope. if clustering is enabled, we don't want to rollback pending clustering. could be async clustering running. thats why. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org