alliasgher commented on code in PR #873:
URL: https://github.com/apache/iceberg-go/pull/873#discussion_r3074793047
##########
table/metadata.go:
##########
@@ -871,6 +884,8 @@ func (b *MetadataBuilder) buildCommonMetadata()
(*commonMetadata, error) {
SortOrderList: b.sortOrderList,
DefaultSortOrderID: b.defaultSortOrderID,
SnapshotRefs: b.refs,
+ StatisticsList: b.statisticsList,
+ PartitionStatsList: b.partitionStatsList,
Review Comment:
Good call — updated the PR description to lead with the builder bug and
frame RemoveSnapshots pruning as the secondary fix.
##########
table/metadata.go:
##########
@@ -505,6 +509,15 @@ func (b *MetadataBuilder) RemoveSnapshots(snapshotIds
[]int64, postCommit bool)
}
b.refs = newRefs
+ // Prune statistics entries whose snapshot was removed so that table
+ // metadata does not retain stale statistics references.
+ b.statisticsList = slices.DeleteFunc(b.statisticsList, func(e
StatisticsFile) bool {
+ return slices.Contains(snapshotIds, e.SnapshotID)
+ })
+ b.partitionStatsList = slices.DeleteFunc(b.partitionStatsList, func(e
PartitionStatisticsFile) bool {
+ return slices.Contains(snapshotIds, e.SnapshotID)
+ })
Review Comment:
Noted.
##########
table/metadata_builder_internal_test.go:
##########
@@ -622,6 +624,76 @@ func TestRemoveSnapshotRemovesBranch(t *testing.T) {
}
}
+// TestRemoveSnapshotsPrunesStatistics verifies that RemoveSnapshots also
+// prunes StatisticsList and PartitionStatsList entries whose SnapshotID
+// matches a removed snapshot. See iceberg-go#836.
+func TestRemoveSnapshotsPrunesStatistics(t *testing.T) {
Review Comment:
Added `TestBuildPreservesStatistics` in ed1b085 — plain
`MetadataBuilderFromBase → Build()` with no mutations, asserts both lists
survive.
##########
table/metadata.go:
##########
@@ -261,6 +263,8 @@ func MetadataBuilderFromBase(metadata Metadata,
currentFileLocation string) (*Me
b.refs = maps.Collect(metadata.Refs())
b.snapshotLog = slices.Collect(metadata.SnapshotLogs())
b.metadataLog = slices.Collect(metadata.PreviousFiles())
+ b.statisticsList = slices.Collect(metadata.Statistics())
+ b.partitionStatsList = slices.Collect(metadata.PartitionStatistics())
Review Comment:
Exactly.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]