[GitHub] [hudi] nsivabalan commented on a diff in pull request #7035: [HUDI-5075] Adding support to rollback residual clustering after disabling clustering

2022-12-12 Thread GitBox


nsivabalan commented on code in PR #7035:
URL: https://github.com/apache/hudi/pull/7035#discussion_r1046543459


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##
@@ -588,6 +588,19 @@ protected void runTableServicesInline(HoodieTable table, 
HoodieCommitMetadata me
 
metadata.addMetadata(HoodieClusteringConfig.SCHEDULE_INLINE_CLUSTERING.key(), 
"true");
 inlineScheduleClustering(extraMetadata);
   }
+
+  // if clustering is disabled, but we might need to rollback any inflight 
clustering when clustering was enabled previously.
+  if (!config.inlineClusteringEnabled() && 
!config.isAsyncClusteringEnabled() && !config.scheduleInlineClustering()

Review Comment:
   this is already the case. The issue we are trying to solve here is, 
   if the replace commit requested is left in the data timeline, then metadata 
table compaction is stopped. 
   thats why.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a diff in pull request #7035: [HUDI-5075] Adding support to rollback residual clustering after disabling clustering

2022-12-09 Thread GitBox


nsivabalan commented on code in PR #7035:
URL: https://github.com/apache/hudi/pull/7035#discussion_r1044536141


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##
@@ -588,6 +588,19 @@ protected void runTableServicesInline(HoodieTable table, 
HoodieCommitMetadata me
 
metadata.addMetadata(HoodieClusteringConfig.SCHEDULE_INLINE_CLUSTERING.key(), 
"true");
 inlineScheduleClustering(extraMetadata);
   }
+
+  // if clustering is disabled, but we might need to rollback any inflight 
clustering when clustering was enabled previously.
+  if (!config.inlineClusteringEnabled() && 
!config.isAsyncClusteringEnabled() && !config.scheduleInlineClustering()
+  && config.isRollbackPendingClustering()) {
+// rollback any pending clustering
+
table.getActiveTimeline().filterPendingReplaceTimeline().getInstants().forEach(instant
 -> {
+  Option> instantPlan = 
ClusteringUtils.getClusteringPlan(table.getMetaClient(), instant);
+  if (instantPlan.isPresent()) {
+LOG.info("Rolling back pending clustering at instant " + 
instantPlan.get().getLeft());
+rollback(instant.getTimestamp());

Review Comment:
   actually, rollback() takes care of ensureing if there is already a rollback 
targetting the commit of interest, it will re-use the same. we can sync up 
directly. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a diff in pull request #7035: [HUDI-5075] Adding support to rollback residual clustering after disabling clustering

2022-11-06 Thread GitBox


nsivabalan commented on code in PR #7035:
URL: https://github.com/apache/hudi/pull/7035#discussion_r1015087999


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##
@@ -1335,10 +1335,14 @@ public boolean isClusteringEnabled() {
 return inlineClusteringEnabled() || isAsyncClusteringEnabled();
   }
 
-  public boolean isRollbackPendingClustering() {
+  public boolean isRollbackPendingClusteringOnConflict() {
 return 
getBoolean(HoodieClusteringConfig.ROLLBACK_PENDING_CLUSTERING_ON_CONFLICT);
   }
 
+  public boolean isRollbackPendingClustering() {
+return getBoolean(HoodieClusteringConfig.ROLLBACK_PENDING_CLUSTERING);
+  }
+

Review Comment:
   the older getter did not have the right name and so took the liberty to fix 
it. These are writeConfig apis. we are not changing the config key as such, 
just the getters in the writeConfig. So, I guess we should be good. 
   



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##
@@ -588,6 +588,19 @@ protected void runTableServicesInline(HoodieTable table, 
HoodieCommitMetadata me
 
metadata.addMetadata(HoodieClusteringConfig.SCHEDULE_INLINE_CLUSTERING.key(), 
"true");
 inlineScheduleClustering(extraMetadata);
   }
+
+  // if clustering is disabled, but we might need to rollback any inflight 
clustering when clustering was enabled previously.
+  if (!config.inlineClusteringEnabled() && 
!config.isAsyncClusteringEnabled() && !config.scheduleInlineClustering()
+  && config.isRollbackPendingClustering()) {
+// rollback any pending clustering
+
table.getActiveTimeline().filterPendingReplaceTimeline().getInstants().forEach(instant
 -> {
+  Option> instantPlan = 
ClusteringUtils.getClusteringPlan(table.getMetaClient(), instant);
+  if (instantPlan.isPresent()) {
+LOG.info("Rolling back pending clustering at instant " + 
instantPlan.get().getLeft());
+rollback(instant.getTimestamp());

Review Comment:
   don't we have this issue even today. may be worth fixing it separately for 
all.
   



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##
@@ -588,6 +588,19 @@ protected void runTableServicesInline(HoodieTable table, 
HoodieCommitMetadata me
 
metadata.addMetadata(HoodieClusteringConfig.SCHEDULE_INLINE_CLUSTERING.key(), 
"true");
 inlineScheduleClustering(extraMetadata);
   }
+
+  // if clustering is disabled, but we might need to rollback any inflight 
clustering when clustering was enabled previously.
+  if (!config.inlineClusteringEnabled() && 
!config.isAsyncClusteringEnabled() && !config.scheduleInlineClustering()
+  && config.isRollbackPendingClustering()) {
+// rollback any pending clustering
+
table.getActiveTimeline().filterPendingReplaceTimeline().getInstants().forEach(instant
 -> {
+  Option> instantPlan = 
ClusteringUtils.getClusteringPlan(table.getMetaClient(), instant);
+  if (instantPlan.isPresent()) {
+LOG.info("Rolling back pending clustering at instant " + 
instantPlan.get().getLeft());
+rollback(instant.getTimestamp());

Review Comment:
   https://issues.apache.org/jira/browse/HUDI-5169



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##
@@ -1335,10 +1335,14 @@ public boolean isClusteringEnabled() {
 return inlineClusteringEnabled() || isAsyncClusteringEnabled();
   }
 
-  public boolean isRollbackPendingClustering() {
+  public boolean isRollbackPendingClusteringOnConflict() {
 return 
getBoolean(HoodieClusteringConfig.ROLLBACK_PENDING_CLUSTERING_ON_CONFLICT);
   }
 
+  public boolean isRollbackPendingClustering() {
+return getBoolean(HoodieClusteringConfig.ROLLBACK_PENDING_CLUSTERING);
+  }
+

Review Comment:
   infact, the older config key is 
`hoodie.clustering.rollback.pending.replacecommit.on.conflict`, just the getter 
wasn't properly named.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a diff in pull request #7035: [HUDI-5075] Adding support to rollback residual clustering after disabling clustering

2022-11-01 Thread GitBox


nsivabalan commented on code in PR #7035:
URL: https://github.com/apache/hudi/pull/7035#discussion_r1010757391


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##
@@ -588,6 +588,19 @@ protected void runTableServicesInline(HoodieTable table, 
HoodieCommitMetadata me
 
metadata.addMetadata(HoodieClusteringConfig.SCHEDULE_INLINE_CLUSTERING.key(), 
"true");
 inlineScheduleClustering(extraMetadata);
   }
+
+  // if clustering is disabled, but we might need to rollback any inflight 
clustering when clustering was enabled previously.
+  if (!config.inlineClusteringEnabled() && 
!config.isAsyncClusteringEnabled() && !config.scheduleInlineClustering()

Review Comment:
   nope. if clustering is enabled, we don't want to rollback pending 
clustering. could be async clustering running. thats why. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org