alliasgher opened a new pull request, #874:
URL: https://github.com/apache/iceberg-go/pull/874

   ## Summary
   
   `removeSnapshotsUpdate.PostCommit()` collects manifest lists, manifests, and 
data files for deletion from object storage when snapshots are expired. It did 
not collect `StatisticsFile` or `PartitionStatisticsFile` paths, so those files 
were leaked indefinitely after expiration.
   
   Fixes #837
   
   ## Changes (`table/updates.go`)
   
   After the existing snapshot/manifest/data-file collection in `PostCommit`:
   
   1. Build a set of removed snapshot IDs.
   2. Iterate `preTable.Metadata().Statistics()` and `PartitionStatistics()`; 
for any entry whose `SnapshotID` is in the removed set, queue its 
`StatisticsPath` for deletion.
   3. Mirroring the existing manifest/data-file logic, remove from the 
deletion set any `StatisticsPath` that is still present in 
`postTable.Metadata().Statistics()` / `PartitionStatistics()`, so that files 
still referenced after the commit are not deleted.
   
   ## Dependency
   
   This depends on the prune-on-`RemoveSnapshots` fix in #873. Without that fix, 
`postTable.Metadata().Statistics()` would still contain the removed snapshot's 
entries, so the post-commit loop would treat those paths as still referenced 
and skip the deletion. The two PRs can be merged in either order, but 
statistics files are only cleaned up once both have landed.
   
   ## Verification
   
   - `go build ./table/` passes
   - `go vet ./table/` passes
   - `go test ./table/` passes (existing `TestRemoveSnapshotsPostCommitSkipped` 
still passes)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

