Jeffail opened a new issue, #935:
URL: https://github.com/apache/iceberg-go/issues/935

   ### Apache Iceberg version
   
   main (development)
   
   ### Please describe the bug 🐞
   
   ### Problem
   
   `NewUpdateSchema` seeds its fresh-id counter from the current schema's 
`HighestFieldID()` rather than the table metadata's `last-column-id`. When the 
current schema's highest id is lower than the lifetime max — after a previous 
evolution added columns that were later dropped, or after a schema swap — 
`AddColumn` allocates ids that are already referenced by fields in historical 
schemas.
   
   The Iceberg spec reserves `last-column-id` on table metadata as the 
monotonic counter for exactly this purpose; Java's 
[`SchemaUpdate`](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SchemaUpdate.java)
 seeds from `metadata.lastColumnId()`.
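
To make the collision concrete, here is a minimal, self-contained sketch. It is a toy model, not iceberg-go code: the `schema` type and the `highestFieldID` and `allocate` helpers are hypothetical names introduced only for illustration. It assumes a table whose history once used ids up to 12 while the current schema tops out at 10, and compares the next id produced by each seed.

```go
package main

import "fmt"

// schema is a hypothetical stand-in for an Iceberg schema: only its field ids.
type schema struct {
	fieldIDs []int
}

// highestFieldID returns the largest field id in the schema (0 if empty).
func (s schema) highestFieldID() int {
	highest := 0
	for _, id := range s.fieldIDs {
		if id > highest {
			highest = id
		}
	}
	return highest
}

// allocate returns the next fresh id for a counter seeded at seed.
func allocate(seed int) int {
	return seed + 1
}

func main() {
	// Historical schema that once held fields 11 and 12, which were later dropped.
	historical := schema{fieldIDs: []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}}
	current := schema{fieldIDs: []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}}

	// Assumed value of last-column-id in table metadata: the lifetime maximum.
	lastColumnID := historical.highestFieldID()

	fmt.Println("seeded from current schema:", allocate(current.highestFieldID()))
	fmt.Println("seeded from last-column-id:", allocate(lastColumnID))
}
```

In this model the first seed yields 11, an id that a historical schema still references, while the second yields 13.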
   
   ### Symptom
   
   Catalogs that index the full schema history reject the commit. AWS Glue's 
`schemasToGlueColumns` (`catalog/glue/schema.go`) surfaces it as:
   
   ```
   InvalidInputException: Multiple entries with same key: 12=<top-level-field> and 12=<deeply.nested.field>
   ```
   
   with the two fields living in different schemas that happen to share an id.
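
The shape of that error can be reproduced with a small sketch of an index keyed by field id across the whole schema history. This is an illustration only and does not claim to mirror how Glue or `schemasToGlueColumns` is implemented; the `field` type, the `findIDConflicts` helper, and the field paths are hypothetical.

```go
package main

import "fmt"

// field is a hypothetical (id, dotted path) pair taken from one schema.
type field struct {
	ID   int
	Path string
}

// findIDConflicts indexes the fields of every schema in the history by id and
// returns one message for each id claimed by two different paths.
func findIDConflicts(history [][]field) []string {
	seen := make(map[int]string)
	var conflicts []string
	for _, fields := range history {
		for _, f := range fields {
			prev, ok := seen[f.ID]
			if !ok {
				seen[f.ID] = f.Path
				continue
			}
			if prev != f.Path {
				conflicts = append(conflicts,
					fmt.Sprintf("id %d: %s and %s", f.ID, prev, f.Path))
			}
		}
	}
	return conflicts
}

func main() {
	history := [][]field{
		// Older schema: id 12 belongs to a nested field.
		{{ID: 1, Path: "id"}, {ID: 12, Path: "payload.meta.source"}},
		// Current schema: id 12 was handed out again to a top-level field.
		{{ID: 1, Path: "id"}, {ID: 12, Path: "created_at"}},
	}
	for _, c := range findIDConflicts(history) {
		fmt.Println(c)
	}
}
```

A real check would also have to allow for renames, which legitimately keep an id while changing the name; this sketch treats any path mismatch as a conflict.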
   
   ### Root cause
   
   `table/update_schema.go` in `NewUpdateSchema`:
   
   ```go
   lastColumnID: txn.meta.CurrentSchema().HighestFieldID(),
   ```
   
   `MetadataBuilder.AddSchema` already keeps `last-column-id` monotonic via 
`max(b.lastColumnId, schema.HighestFieldID())` at `table/metadata.go:350`, and 
the `Metadata` interface exposes the value at `table/metadata.go:82`. 
`UpdateSchema` simply doesn't consult it.
   
   ### Fix
   
   Seed `lastColumnID` from `txn.meta.LastColumnID()` in `NewUpdateSchema`. 
This requires exposing `LastColumnID()` as a public getter on `*MetadataBuilder`: 
the `Metadata` interface already has it, but `txn.meta` is the builder rather 
than the interface.
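
A minimal sketch of the proposed seeding, under stated assumptions: `metadataBuilder`, `addSchema`, `updateSchema`, `newUpdateSchema`, and `assignNewColumnID` are stripped-down, hypothetical stand-ins, and the real iceberg-go types carry far more state and different signatures. Only the `LastColumnID()` getter name comes from the proposal above.

```go
package main

import "fmt"

// metadataBuilder is a hypothetical stand-in for the builder held by a
// transaction; it tracks only the lifetime-maximum field id.
type metadataBuilder struct {
	lastColumnID int
}

// addSchema keeps the counter monotonic: it never moves backwards, even when
// the new schema's highest id is lower than one seen earlier.
func (b *metadataBuilder) addSchema(highestFieldID int) {
	if highestFieldID > b.lastColumnID {
		b.lastColumnID = highestFieldID
	}
}

// LastColumnID is the public getter the fix proposes to expose on the builder.
func (b *metadataBuilder) LastColumnID() int {
	return b.lastColumnID
}

// updateSchema is a hypothetical stand-in for the schema-update state.
type updateSchema struct {
	lastColumnID int
}

// newUpdateSchema seeds the fresh-id counter from the builder's lifetime
// maximum rather than from the current schema's highest id.
func newUpdateSchema(b *metadataBuilder) *updateSchema {
	return &updateSchema{lastColumnID: b.LastColumnID()}
}

// assignNewColumnID hands out the next unused id.
func (u *updateSchema) assignNewColumnID() int {
	u.lastColumnID++
	return u.lastColumnID
}

func main() {
	b := &metadataBuilder{}
	b.addSchema(12) // an earlier schema used ids up to 12
	b.addSchema(10) // the current schema only reaches 10

	u := newUpdateSchema(b)
	fmt.Println("next column id:", u.assignNewColumnID())
}
```

With this seeding the first new column in the model receives id 13, so ids 11 and 12 from the older schema are never reissued.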
   
   ### Relation to existing issues
   
   - #538 (`last-column-id` payload field regression on deleting the highest-id 
column) — server-side payload shape, not the client-side counter seed.
   - #593 / #658 (schema-level rejection of duplicate field ids) — 
complementary defensive check at a different layer.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

