danny0405 commented on code in PR #8684:
URL: https://github.com/apache/hudi/pull/8684#discussion_r1223719557


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1052,51 +1091,81 @@ protected HoodieData<HoodieRecord> 
prepRecords(Map<MetadataPartitionType,
   }
 
   /**
-   *  Perform a compaction on the Metadata Table.
-   *
-   * Cases to be handled:
-   *   1. We cannot perform compaction if there are previous inflight 
operations on the dataset. This is because
-   *      a compacted metadata base file at time Tx should represent all the 
actions on the dataset till time Tx.
-   *
-   *   2. In multi-writer scenario, a parallel operation with a greater 
instantTime may have completed creating a
-   *      deltacommit.
+   * Optimize the metadata table by running compaction, clean and archive as 
required.
+   * <p>
+   * Don't perform optimization if there are inflight operations on the 
dataset. This is for two reasons:
+   * - The compaction will contain the correct data as all failed operations 
have been rolled back.
+   * - Clean/compaction etc. will have the highest timestamp on the MDT and we 
won't be adding new operations
+   * with smaller timestamps to metadata table (makes for easier debugging)
+   * <p>
+   * This adds the limitations that long-running async operations (clustering, 
etc.) may cause delay in such MDT
+   * optimizations. We will relax this after MDT code has been hardened.
    */
-  protected void compactIfNecessary(BaseHoodieWriteClient writeClient, String 
instantTime) {
-    // finish off any pending compactions if any from previous attempt.
-    writeClient.runAnyPendingCompactions();
-
-    String latestDeltaCommitTimeInMetadataTable = 
metadataMetaClient.reloadActiveTimeline()
-        .getDeltaCommitTimeline()
-        .filterCompletedInstants()
-        .lastInstant().orElseThrow(() -> new HoodieMetadataException("No 
completed deltacommit in metadata table"))
-        .getTimestamp();
-    // we need to find if there are any inflights in data table timeline 
before or equal to the latest delta commit in metadata table.
-    // Whenever you want to change this logic, please ensure all below 
scenarios are considered.
-    // a. There could be a chance that latest delta commit in MDT is committed 
in MDT, but failed in DT. And so findInstantsBeforeOrEquals() should be employed
-    // b. There could be DT inflights after latest delta commit in MDT and we 
are ok with it. bcoz, the contract is, latest compaction instant time in MDT 
represents
-    // any instants before that is already synced with metadata table.
-    // c. Do consider out of order commits. For eg, c4 from DT could complete 
before c3. and we can't trigger compaction in MDT with c4 as base instant time, 
until every
-    // instant before c4 is synced with metadata table.
-    List<HoodieInstant> pendingInstants = 
dataMetaClient.reloadActiveTimeline().filterInflightsAndRequested()
-        
.findInstantsBeforeOrEquals(latestDeltaCommitTimeInMetadataTable).getInstants();
+  @Override
+  public void performTableServices(Option<String> inFlightInstantTimestamp) {
+    HoodieTimer metadataTableServicesTimer = HoodieTimer.start();
+    boolean allTableServicesExecutedSuccessfullyOrSkipped = true;
+    try {
+      BaseHoodieWriteClient writeClient = getWriteClient();
+      // Run any pending table services operations.
+      runPendingTableServicesOperations(writeClient);
+
+      // Check and run clean operations.
+      String latestDeltacommitTime = 
metadataMetaClient.reloadActiveTimeline().getDeltaCommitTimeline()
+              .filterCompletedInstants()
+              .lastInstant().get()
+              .getTimestamp();
+      LOG.info("Latest deltacommit time found is " + latestDeltacommitTime + 
", running clean operations.");
+      cleanIfNecessary(writeClient, latestDeltacommitTime);
+
+      // Do timeline validation before scheduling compaction/logcompaction 
operations.
+      if 
(!validateTimelineBeforeSchedulingCompaction(inFlightInstantTimestamp, 
latestDeltacommitTime)) {
+        return;

Review Comment:
   > unless compaction in MDT kicks in, archival might not have anything to do 
after last time it was able to archive something.
   
   Then archiving will always be blocked by the compaction.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to