garyli1019 commented on a change in pull request #3226:
URL: https://github.com/apache/hudi/pull/3226#discussion_r664203627



##########
File path: 
hudi-flink/src/main/java/org/apache/hudi/sink/compact/FlinkCompactionConfig.java
##########
@@ -89,6 +89,9 @@
   @Parameter(names = {"--compaction-tasks"}, description = "Parallelism of 
tasks that do actual compaction, default is -1", required = false)
   public Integer compactionTasks = -1;
 
+  @Parameter(names = {"--schedule", "-sc"}, description = "Schedule 
compaction", required = false)

Review comment:
       Should we add some comments that async scheduling is not recommended and 
the possible risk? Most users won't dig this deep

##########
File path: 
hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java
##########
@@ -75,15 +75,28 @@ public static void main(String[] args) throws Exception {
 
     // judge whether have operation
     // to compute the compaction instant time and do compaction.
-    String compactionInstantTime = 
CompactionUtil.getCompactionInstantTime(metaClient);
-    boolean scheduled = 
writeClient.scheduleCompactionAtInstant(compactionInstantTime, Option.empty());
-    if (!scheduled) {
+    if (cfg.schedule) {
+      String compactionInstantTime = 
CompactionUtil.getCompactionInstantTime(metaClient);
+      boolean scheduled = 
writeClient.scheduleCompactionAtInstant(compactionInstantTime, Option.empty());
+      if (!scheduled) {
+        // do nothing.
+        LOG.info("No compaction plan for this job ");
+        return;
+      }
+    }
+
+    table.getMetaClient().reloadActiveTimeline();
+
+    // the last instant takes the highest priority
+    Option<HoodieInstant> lastRequested = 
table.getActiveTimeline().filterPendingCompactionTimeline()

Review comment:
       should we also make this configurable? First in first out or last in 
first out.

##########
File path: 
hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java
##########
@@ -75,15 +75,29 @@ public static void main(String[] args) throws Exception {
 
     // judge whether have operation
     // to compute the compaction instant time and do compaction.
-    String compactionInstantTime = 
CompactionUtil.getCompactionInstantTime(metaClient);
-    boolean scheduled = 
writeClient.scheduleCompactionAtInstant(compactionInstantTime, Option.empty());
-    if (!scheduled) {
+    if (cfg.schedule) {
+      String compactionInstantTime = 
CompactionUtil.getCompactionInstantTime(metaClient);
+      boolean scheduled = 
writeClient.scheduleCompactionAtInstant(compactionInstantTime, Option.empty());
+      if (!scheduled) {
+        // do nothing.
+        LOG.info("No compaction plan for this job ");
+        return;
+      }
+    }
+
+    table.getMetaClient().reloadActiveTimeline();
+
+    // the last instant takes the highest priority

Review comment:
       outdated comment

##########
File path: 
hudi-flink/src/main/java/org/apache/hudi/sink/compact/FlinkCompactionConfig.java
##########
@@ -89,12 +89,23 @@
   @Parameter(names = {"--compaction-tasks"}, description = "Parallelism of 
tasks that do actual compaction, default is -1", required = false)
   public Integer compactionTasks = -1;
 
+  @Parameter(names = {"--schedule", "-sc"}, description = "Schedule the 
compaction plan, there is risk to lost data when this option turns on,\n"

Review comment:
       Not recommended. Schedule the compaction plan in this job. There is a 
risk of losing data when scheduling compaction outside the writer job. 
Scheduling compaction in the writer job and only let this job do the compaction 
execution is recommended. Default is false.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to