satishkotha commented on a change in pull request #2048: URL: https://github.com/apache/hudi/pull/2048#discussion_r490491745
########## File path: hudi-client/src/main/java/org/apache/hudi/table/HoodieTimelineArchiveLog.java ########## @@ -301,6 +304,61 @@ private void deleteAnyLeftOverMarkerFiles(JavaSparkContext jsc, HoodieInstant in } } + private void deleteReplacedFiles(HoodieInstant instant) { + if (!instant.isCompleted()) { + // only delete files for completed instants + return; + } + + TableFileSystemView fileSystemView = this.table.getFileSystemView(); + ensureReplacedPartitionsLoadedCorrectly(instant, fileSystemView); + + Stream<HoodieFileGroup> fileGroupsToDelete = fileSystemView Review comment: So, this is tricky to explain. In FileSystemView, only metadata seems to be eagerly loaded. file groups are not eagerly loaded. i.e., fetchAllStoredFileGroups() returns empty. For replace instants, we need to get List<FileSlice> for all replaced fileId. Because fetchAllStoredFileGroups() is empty, its also returning empty list of FileSlices. So we dont delete replaced files. I think instead of creating new HoodieTable in constructor. passing that from callers would help workaround this problem. But that is somewhat involved change because of test dependencies. Also, it might be better to refresh partition content in case new files are created by compaction or other process and somehow that is not reflected in table views. This might be safer option. Let me know if you want me to work on passing in HoodieTable to HoodieTimelineArchiveLog constructor. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org