zhuanshenbsj1 commented on code in PR #8505:
URL: https://github.com/apache/hudi/pull/8505#discussion_r1185642315


##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java:
##########
@@ -256,7 +279,16 @@ private int doScheduleAndCluster(JavaSparkContext jsc) throws Exception {
       LOG.info("The schedule instant time is " + instantTime.get());
       LOG.info("Step 2: Do cluster");
       Option<HoodieCommitMetadata> metadata = client.cluster(instantTime.get()).getCommitMetadata();
+      cleanAfterCluster(client);
       return UtilHelpers.handleErrors(metadata.get(), instantTime.get());
     }
   }
+
+  private void cleanAfterCluster(SparkRDDWriteClient client) {
+    client.waitForAsyncServiceCompletion();
+    if (client.getConfig().isAutoClean() && !client.getConfig().isAsyncClean()) {

Review Comment:
   > I think we need to trigge a sync clean if it is enabled.
   
   If isAsyncClean is enabled, the Spark offline job starts an async cleaning in preWrite, like the Flink job does. So when isAsyncClean is disabled, we add a synchronous cleanup here instead.
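   To make the condition concrete, here is a minimal, self-contained sketch of the decision `cleanAfterCluster` encodes. The `CleanMode` enum and `decide` method are illustrative helpers, not part of Hudi's API; Hudi itself just checks the write config and either triggers a clean or relies on the already-running async service:

   ```java
   public class CleanAfterClusterSketch {
       // Illustrative outcomes of the post-clustering clean decision.
       enum CleanMode { SYNC_CLEAN, ALREADY_ASYNC, NO_CLEAN }

       // Mirrors isAutoClean() && !isAsyncClean(): a synchronous clean runs only
       // when auto-clean is on and async-clean is off. With async-clean on, the
       // cleaning was already kicked off in preWrite, as in the Flink job.
       static CleanMode decide(boolean autoClean, boolean asyncClean) {
           if (!autoClean) {
               return CleanMode.NO_CLEAN;
           }
           return asyncClean ? CleanMode.ALREADY_ASYNC : CleanMode.SYNC_CLEAN;
       }

       public static void main(String[] args) {
           System.out.println(decide(true, false));  // SYNC_CLEAN
           System.out.println(decide(true, true));   // ALREADY_ASYNC
           System.out.println(decide(false, false)); // NO_CLEAN
       }
   }
   ```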



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org