laskoviymishka opened a new issue, #831:
URL: https://github.com/apache/iceberg-go/issues/831
### Feature Request / Improvement
Parent: #829 (v2 spec completion)
`ExpireSnapshots` (#677, #679, #783) and `DeleteOrphanFiles` exist but have
gaps that affect production correctness and performance.
### Current gaps
**Statistics file tracking:** `PostCommit` in `removeSnapshotsUpdate` and
`getReferencedFiles` in orphan cleanup both ignore `StatisticsFile` and
`PartitionStatisticsFile` entries. There's a TODO at `orphan_cleanup.go:247`
for this. The metadata support was added in #577, but GC never picked it up.
`MetadataBuilder.RemoveSnapshots` also doesn't prune stats entries for removed
snapshots.
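A minimal sketch of the missing bookkeeping, to make the gap concrete. `statsFile`, `pruneStats`, and `referencedStatsPaths` are hypothetical names using simplified stand-in types, not the actual iceberg-go metadata types or builder API:

```go
package main

// statsFile is a simplified, hypothetical stand-in for a statistics or
// partition-statistics entry in table metadata; it is not the iceberg-go type.
type statsFile struct {
	SnapshotID int64
	Path       string
}

// pruneStats splits stats entries into those that stay in metadata and the
// paths that become deletable because their snapshot was removed.
func pruneStats(stats []statsFile, removed map[int64]bool) (kept []statsFile, deletable []string) {
	for _, s := range stats {
		if removed[s.SnapshotID] {
			deletable = append(deletable, s.Path)
			continue
		}
		kept = append(kept, s)
	}
	return kept, deletable
}

// referencedStatsPaths collects the paths that orphan cleanup would need to
// treat as referenced, across both statistics and partition statistics.
func referencedStatsPaths(groups ...[]statsFile) map[string]struct{} {
	refs := make(map[string]struct{})
	for _, group := range groups {
		for _, s := range group {
			refs[s.Path] = struct{}{}
		}
	}
	return refs
}
```

In this sketch, the `deletable` paths would feed the post-commit file cleanup, and the set from `referencedStatsPaths` would be merged into whatever `getReferencedFiles` returns.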
**ListableIO interface:** `orphan_cleanup.go` uses
`reflect.ValueOf(fsys).Elem().FieldByName("Bucket")` to extract the cloud
bucket for directory walking. This is fragile and breaks with custom IO
implementations, which the IO registry pattern from #709 makes more likely.
We need a proper `ListableIO` interface with `WalkDir`.
**Bulk delete:** Deleting thousands of orphaned files one by one via
`IO.Remove()` takes O(N) API calls. S3 `DeleteObjects` accepts up to 1000 keys
per request. A `BulkRemovableIO` interface would allow optimized batch deletion.
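A sketch of how the batching could look, assuming a hypothetical `RemoveAll(paths []string) error` method on `BulkRemovableIO`. The method name, the `RemovableIO` interface, `removeFiles`, and the in-memory fakes are illustrative and not part of iceberg-go today:

```go
package main

import "errors"

// maxKeysPerRequest mirrors the S3 DeleteObjects limit of 1000 keys per call.
const maxKeysPerRequest = 1000

// RemovableIO is a simplified stand-in for the single-file delete path.
type RemovableIO interface {
	Remove(path string) error
}

// BulkRemovableIO is a hypothetical optional interface for batch deletion.
type BulkRemovableIO interface {
	RemoveAll(paths []string) error
}

// removeFiles deletes in batches when the IO supports it and falls back to
// one Remove call per path otherwise. Errors are collected, not short-circuited.
func removeFiles(fsys RemovableIO, paths []string, batchSize int) error {
	if batchSize <= 0 {
		batchSize = maxKeysPerRequest
	}
	var errs []error
	bulk, ok := fsys.(BulkRemovableIO)
	if !ok {
		for _, p := range paths {
			if err := fsys.Remove(p); err != nil {
				errs = append(errs, err)
			}
		}
		return errors.Join(errs...)
	}
	for start := 0; start < len(paths); start += batchSize {
		end := start + batchSize
		if end > len(paths) {
			end = len(paths)
		}
		if err := bulk.RemoveAll(paths[start:end]); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

// fakeSingleIO and fakeBulkIO are in-memory fakes that count API calls.
type fakeSingleIO struct{ calls int }

func (f *fakeSingleIO) Remove(string) error { f.calls++; return nil }

type fakeBulkIO struct{ calls, removed int }

func (f *fakeBulkIO) Remove(string) error { f.calls++; f.removed++; return nil }

func (f *fakeBulkIO) RemoveAll(paths []string) error {
	f.calls++
	f.removed += len(paths)
	return nil
}
```

With 2500 paths and a batch size of 1000, the bulk path issues three `RemoveAll` calls, while the fallback issues 2500 `Remove` calls. Keeping `BulkRemovableIO` optional means existing IO implementations keep working unchanged.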
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.