satishkotha commented on a change in pull request #2048:
URL: https://github.com/apache/hudi/pull/2048#discussion_r490491745



##########
File path: 
hudi-client/src/main/java/org/apache/hudi/table/HoodieTimelineArchiveLog.java
##########
@@ -301,6 +304,61 @@ private void deleteAnyLeftOverMarkerFiles(JavaSparkContext 
jsc, HoodieInstant in
     }
   }
 
+  private void deleteReplacedFiles(HoodieInstant instant) {
+    if (!instant.isCompleted()) {
+      // only delete files for completed instants
+      return;
+    }
+
+    TableFileSystemView fileSystemView = this.table.getFileSystemView();
+    ensureReplacedPartitionsLoadedCorrectly(instant, fileSystemView);
+
+    Stream<HoodieFileGroup> fileGroupsToDelete = fileSystemView

Review comment:
       So, this is tricky to explain. In FileSystemView, only metadata seems to 
be eagerly loaded. file groups are not eagerly loaded. i.e., 
fetchAllStoredFileGroups() returns empty.  For replace instants, we need to get 
List<FileSlice> for all replaced fileId. Because fetchAllStoredFileGroups() is 
empty, its also returning empty list of FileSlices. So we dont delete replaced 
files.
   
   I think instead of creating new HoodieTable in constructor. passing that 
from callers would help workaround this problem. But that is somewhat involved 
change because of test dependencies. Also, it might be better to refresh 
partition content in case new files are created by compaction or other process 
and somehow that is not reflected in table views. This might be safer option.
   
   Let me know if you want me to work on passing in HoodieTable to 
HoodieTimelineArchiveLog constructor.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to