Jeffail opened a new issue, #935: URL: https://github.com/apache/iceberg-go/issues/935
### Apache Iceberg version

main (development)

### Please describe the bug 🐞

### Problem

`NewUpdateSchema` seeds its fresh-id counter from the current schema's `HighestFieldID()` rather than the table metadata's `last-column-id`. When the current schema's highest id is lower than the lifetime max (after a previous evolution added columns that were later dropped, or after a schema swap), `AddColumn` allocates ids that are already referenced by fields in historical schemas.

The Iceberg spec reserves `last-column-id` on table metadata as the monotonic counter for exactly this purpose; Java's [`SchemaUpdate`](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SchemaUpdate.java) seeds from `metadata.lastColumnId()`.

### Symptom

Catalogs that index the full schema history reject the commit. AWS Glue's `schemasToGlueColumns` (`catalog/glue/schema.go`) surfaces it as:

```
InvalidInputException: Multiple entries with same key: 12=<top-level-field> and 12=<deeply.nested.field>
```

with the two fields living in different schemas that happen to share an id.

### Root cause

`table/update_schema.go` in `NewUpdateSchema`:

```go
lastColumnID: txn.meta.CurrentSchema().HighestFieldID(),
```

`MetadataBuilder.AddSchema` already keeps `last-column-id` monotonic via `max(b.lastColumnId, schema.HighestFieldID())` at `table/metadata.go:350`, and the `Metadata` interface exposes the value at `table/metadata.go:82`. `UpdateSchema` simply doesn't consult it.

### Fix

Seed `lastColumnID` from `txn.meta.LastColumnID()` in `NewUpdateSchema`. Requires exposing `LastColumnID()` as a public getter on `*MetadataBuilder`: the `Metadata` interface already has it, but `txn.meta` is the builder rather than the interface.

### Relation to existing issues

- #538 (`last-column-id` payload field regression on deleting the highest-id column): server-side payload shape, not the client-side counter seed.
- #593 / #658 (schema-level rejection of duplicate field ids): complementary defensive check at a different layer.
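
### Illustrative sketch

The id collision described under Problem can be reproduced in a small self-contained model. This is only a sketch under stated assumptions: `toySchema`, `toyMetadata`, and the two `nextIDFrom...` helpers are hypothetical stand-ins invented for illustration, not iceberg-go types or functions. The model keeps a schema history plus a monotonic `lastColumnID`, drops the highest-id column, and then compares the two ways of seeding the next id.

```go
package main

import "fmt"

// toySchema is a hypothetical stand-in for a schema: just its field ids.
type toySchema struct {
	fieldIDs []int
}

// highestFieldID returns the largest field id present in this one schema.
func (s toySchema) highestFieldID() int {
	highest := 0
	for _, id := range s.fieldIDs {
		if id > highest {
			highest = id
		}
	}
	return highest
}

// toyMetadata is a hypothetical stand-in for table metadata: the full schema
// history, the index of the current schema, and a monotonic last column id.
type toyMetadata struct {
	schemas      []toySchema
	currentIdx   int
	lastColumnID int
}

// addSchema appends a schema, makes it current, and only ever raises
// lastColumnID, mirroring the max(...) behaviour described in the issue.
func (m *toyMetadata) addSchema(s toySchema) {
	m.schemas = append(m.schemas, s)
	m.currentIdx = len(m.schemas) - 1
	if h := s.highestFieldID(); h > m.lastColumnID {
		m.lastColumnID = h
	}
}

// usedInHistory reports whether any schema in the history references id.
func (m *toyMetadata) usedInHistory(id int) bool {
	for _, s := range m.schemas {
		for _, existing := range s.fieldIDs {
			if existing == id {
				return true
			}
		}
	}
	return false
}

// nextIDFromCurrentSchema models the seeding reported as buggy.
func nextIDFromCurrentSchema(m *toyMetadata) int {
	return m.schemas[m.currentIdx].highestFieldID() + 1
}

// nextIDFromLastColumnID models the seeding proposed as the fix.
func nextIDFromLastColumnID(m *toyMetadata) int {
	return m.lastColumnID + 1
}

func main() {
	m := &toyMetadata{}
	m.addSchema(toySchema{fieldIDs: []int{1, 2, 3}}) // original schema
	m.addSchema(toySchema{fieldIDs: []int{1, 2}})    // column 3 dropped

	buggy := nextIDFromCurrentSchema(m)
	fixed := nextIDFromLastColumnID(m)
	fmt.Printf("current-schema seed: id=%d reused=%v\n", buggy, m.usedInHistory(buggy))
	fmt.Printf("last-column-id seed: id=%d reused=%v\n", fixed, m.usedInHistory(fixed))
}
```

In this model the current-schema seed hands out id 3, which the first schema in the history still references, while the last-column-id seed hands out id 4, which no schema has used. That is the same shape of clash that a catalog indexing the full schema history would reject.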
