prashantwason commented on a change in pull request #4821:
URL: https://github.com/apache/hudi/pull/4821#discussion_r812666713



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java
##########
@@ -429,25 +428,21 @@ public void mergeArchiveFiles(List<FileStatus> compactCandidate) throws IOExcept
         .collect(Collectors.groupingBy(i -> Pair.of(i.getTimestamp(),
             HoodieInstant.getComparableAction(i.getAction()))));
 
-    // If metadata table is enabled, do not archive instants which are more recent than the last compaction on the
-    // metadata table.
-    if (config.isMetadataTableEnabled()) {
-      try (HoodieTableMetadata tableMetadata = HoodieTableMetadata.create(table.getContext(), config.getMetadataConfig(),
-          config.getBasePath(), FileSystemViewStorageConfig.SPILLABLE_DIR.defaultValue())) {
-        Option<String> latestCompactionTime = tableMetadata.getLatestCompactionTime();

Review comment:
       There can be a case where the deltacommit on the metadata table succeeds but the commit on the dataset fails (the inflight -> complete transition fails).
   
   In this case, we should not be using that deltacommit on the metadata table, and to detect this we need the corresponding commits on the dataset to not have been archived. If the commits on the dataset are archived independently, we cannot differentiate between the error case above and the case where the commit is missing from the dataset timeline simply because it was archived.
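   For illustration, a rough sketch of the kind of validation that relies on the dataset commit still being on the active timeline (the class and method names here are hypothetical, not code from this PR):

```java
// Hypothetical sketch: a metadata-table deltacommit should only be trusted if the
// dataset has a completed commit at the same instant time.
import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.common.table.timeline.HoodieTimeline;

public class MetadataDeltaCommitValidationSketch {

  /**
   * Returns true only if the dataset has a completed commit at the given instant time.
   * A metadata-table deltacommit at this time without a completed dataset commit may
   * belong to a failed write. The check is only meaningful while the instant can still
   * be found on the dataset's active timeline, i.e. before the dataset commit is archived.
   */
  static boolean isDeltaCommitBackedByDatasetCommit(HoodieTableMetaClient datasetMetaClient,
                                                    String instantTime) {
    HoodieTimeline completedDatasetCommits =
        datasetMetaClient.getActiveTimeline().getCommitsTimeline().filterCompletedInstants();
    // If the dataset instant had already been archived, this lookup could not tell a
    // failed write apart from an archived commit -- which is the concern above.
    return completedDatasetCommits.getInstants()
        .anyMatch(instant -> instant.getTimestamp().equals(instantTime));
  }
}
```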
   
   
   

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java
##########
@@ -429,25 +428,21 @@ public void mergeArchiveFiles(List<FileStatus> compactCandidate) throws IOExcept
         .collect(Collectors.groupingBy(i -> Pair.of(i.getTimestamp(),
             HoodieInstant.getComparableAction(i.getAction()))));
 
-    // If metadata table is enabled, do not archive instants which are more recent than the last compaction on the
-    // metadata table.
-    if (config.isMetadataTableEnabled()) {
-      try (HoodieTableMetadata tableMetadata = HoodieTableMetadata.create(table.getContext(), config.getMetadataConfig(),
-          config.getBasePath(), FileSystemViewStorageConfig.SPILLABLE_DIR.defaultValue())) {
-        Option<String> latestCompactionTime = tableMetadata.getLatestCompactionTime();
-        if (!latestCompactionTime.isPresent()) {
-          LOG.info("Not archiving as there is no compaction yet on the metadata table");
-          instants = Stream.empty();
-        } else {
-          LOG.info("Limiting archiving of instants to latest compaction on metadata table at " + latestCompactionTime.get());
-          instants = instants.filter(instant -> HoodieTimeline.compareTimestamps(instant.getTimestamp(), HoodieTimeline.LESSER_THAN,
-              latestCompactionTime.get()));
-        }
-      } catch (Exception e) {
-        throw new HoodieException("Error limiting instant archival based on metadata table", e);
+    // If this is a metadata table, do not archive the commits that live in data set
+    // active timeline. This is required by metadata table,
+    // see HoodieTableMetadataUtil#processRollbackMetadata for details.
+    if (HoodieTableMetadata.isMetadataTable(config.getBasePath())) {

Review comment:
       +1
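   For reference, a hedged sketch (not the exact code in this PR) of how archival on the metadata table could skip instants whose timestamps are still present on the dataset's active timeline; the helper name and parameters are illustrative:

```java
// Illustrative sketch only: metadata-table deltacommits share timestamps with the
// dataset commits they index, so anything still on the dataset's active timeline
// is kept out of the archive.
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.common.table.timeline.HoodieInstant;

public class MetadataTableArchivalFilterSketch {

  /** Keep only metadata-table instants whose timestamps no longer appear on the dataset's active timeline. */
  static Stream<HoodieInstant> filterArchivableInstants(Stream<HoodieInstant> metadataTableInstants,
                                                        HoodieTableMetaClient datasetMetaClient) {
    Set<String> datasetActiveTimestamps = datasetMetaClient.getActiveTimeline().getInstants()
        .map(HoodieInstant::getTimestamp)
        .collect(Collectors.toSet());
    return metadataTableInstants.filter(instant -> !datasetActiveTimestamps.contains(instant.getTimestamp()));
  }
}
```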



