satishkotha commented on a change in pull request #2048:
URL: https://github.com/apache/hudi/pull/2048#discussion_r488207278



##########
File path: hudi-client/src/main/java/org/apache/hudi/table/HoodieTimelineArchiveLog.java
##########
@@ -301,6 +304,61 @@ private void deleteAnyLeftOverMarkerFiles(JavaSparkContext jsc, HoodieInstant in
     }
   }
 
+  private void deleteReplacedFiles(HoodieInstant instant) {
+    if (!instant.isCompleted()) {
+      // only delete files for completed instants
+      return;
+    }
+
+    TableFileSystemView fileSystemView = this.table.getFileSystemView();
+    ensureReplacedPartitionsLoadedCorrectly(instant, fileSystemView);
+
+    Stream<HoodieFileGroup> fileGroupsToDelete = fileSystemView

Review comment:
       Yes, compaction is the primary reason I only recorded the fileId in the 
replace metadata. When deleting, we can get all file paths that share the same 
fileId (through the view, or by listing using consolidated metadata) and delete 
those files.
   
   There is a potential race condition: compaction might create a new file with 
a replaced fileId after we have queried for existing files. But because the 
FileSystemView#get methods do not include replaced file groups, I think this is 
unlikely to happen. I'm not sure whether there are edge cases with long-running 
compactions.
   
   Please suggest any other improvements.
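
   To illustrate the idea, here is a standalone sketch (hypothetical names and 
types, not Hudi's actual FileSystemView API) of the fileId-based cleanup 
described above: given all files grouped by fileId and the set of fileIds 
recorded in replace metadata, collect every path sharing a replaced fileId, 
including any newer versions compaction may have written under that fileId.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical sketch of fileId-based deletion of replaced file groups.
// Not Hudi's actual API; it only models the lookup-by-fileId step.
public class ReplacedFileCleanupSketch {

  // Returns every file path whose fileId appears in the replace metadata.
  static List<String> pathsToDelete(Map<String, List<String>> filesByFileId,
                                    Set<String> replacedFileIds) {
    return filesByFileId.entrySet().stream()
        .filter(e -> replacedFileIds.contains(e.getKey()))
        .flatMap(e -> e.getValue().stream())
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Map<String, List<String>> filesByFileId = new HashMap<>();
    // Two versions of file group fg-1 (e.g. base file plus a compacted rewrite).
    filesByFileId.put("fg-1", Arrays.asList("fg-1_001.parquet", "fg-1_002.parquet"));
    filesByFileId.put("fg-2", Arrays.asList("fg-2_001.parquet"));

    // Only fg-1 was replaced; all of its versions are deletion candidates.
    List<String> doomed = pathsToDelete(filesByFileId, Collections.singleton("fg-1"));
    System.out.println(doomed); // [fg-1_001.parquet, fg-1_002.parquet]
  }
}
```

   Note the race the comment describes: if compaction writes a third fg-1 
version after `pathsToDelete` runs but before the deletes execute, that new 
file would be missed by this snapshot-based lookup.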




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

