rdblue commented on code in PR #15006:
URL: https://github.com/apache/iceberg/pull/15006#discussion_r2850209981


##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -1076,12 +1067,24 @@ private List<ManifestFile> newDeleteFilesAsManifests() {
       // this triggers a rewrite of all delete manifests even if there is only one new delete file
       // if there is a relevant use case in the future, the behavior can be optimized
       cachedNewDeleteManifests.clear();
+      // On cache invalidation of delete files, clear the whole summary.
+      // Since the summary contained both data files and DVs, add back the data files.
+      addedFilesSummary.clear();
+      newDataFilesBySpec.forEach(

Review Comment:
   I think it would be better not to handle data files in
   `newDeleteFilesAsManifests`. It isn't obvious that this method clears the
   data file summary, and no one would look here for that. Why not keep a
   separate summary for data files and merge it with the delete summary to
   produce the final summary? The data file summary doesn't need to change
   when the delete manifests are invalidated.
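
A minimal sketch of the separation suggested above, assuming plain maps in place of Iceberg's summary builder. The class `SplitSummarySketch`, its methods, and the counter keys are hypothetical and chosen for illustration only; this is not the code in `MergingSnapshotProducer`. The point it shows is that invalidating the delete manifests clears only delete state, while the data file counters are left alone and the two are combined only when the final summary is built.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a snapshot producer; not Iceberg's actual classes.
class SplitSummarySketch {
  // Data file counters: only ever added to, never cleared by delete handling.
  private final Map<String, Long> dataSummary = new LinkedHashMap<>();
  // Delete file counters: rebuilt whenever the delete manifests are invalidated.
  private final Map<String, Long> deleteSummary = new LinkedHashMap<>();

  void addDataFile(long recordCount) {
    dataSummary.merge("added-data-files", 1L, Long::sum);
    dataSummary.merge("added-records", recordCount, Long::sum);
  }

  void addDeleteFile(long deletedRecords) {
    deleteSummary.merge("added-delete-files", 1L, Long::sum);
    deleteSummary.merge("added-position-deletes", deletedRecords, Long::sum);
  }

  // Cache invalidation touches delete state only; the caller re-adds delete files.
  void invalidateDeleteManifests() {
    deleteSummary.clear();
  }

  // The two summaries are merged only at the point the final summary is needed.
  Map<String, Long> finalSummary() {
    Map<String, Long> merged = new LinkedHashMap<>(dataSummary);
    deleteSummary.forEach((key, value) -> merged.merge(key, value, Long::sum));
    return merged;
  }

  public static void main(String[] args) {
    SplitSummarySketch producer = new SplitSummarySketch();
    producer.addDataFile(100L);
    producer.addDeleteFile(5L);

    // A second delete file arrives: invalidate, then re-add all delete files.
    producer.invalidateDeleteManifests();
    producer.addDeleteFile(5L);
    producer.addDeleteFile(3L);

    System.out.println(producer.finalSummary());
  }
}
```

With this shape, the delete-manifest method never needs to know about data files, so there is nothing to add back after a clear.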



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

