Hi folks,

I’ve noticed some interesting differences across Iceberg clients when
assigning new field IDs during schema conversion.

Specifically:

   1. *Iceberg Java* assigns field IDs using *ordinal order for the root
   struct*, followed by a *post-order traversal* for nested structs. For
   example:

   struct<
     0: id: required long,
     1: info: optional struct<
       4: name: optional string,
       5: attrs: optional struct<
         2: age: optional int,
         3: score: optional double
       >
     >
   >

   Here, nested fields are numbered in post-order, following the IDs above
   (age → score → name → attrs).
   2. *Iceberg Python* appears to use a *pre-order traversal* when assigning
   fresh field IDs:
   https://github.com/apache/iceberg-python/blob/950fc7131b8e597f73647c6ff2bd78d0b24102ad/pyiceberg/schema.py#L1295
   3. *Iceberg Rust* does not currently have a helper for schema
   conversion + field ID assignment, but some existing logic appears to
   follow a *level-order traversal*:
   https://github.com/apache/iceberg-rust/blob/main/crates/iceberg/src/spec/schema/id_reassigner.rs#L27
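
To make the comparison concrete, here is a rough Python sketch of the three orders as I understand them. It is a toy model, not code from any of the clients: the tuple-based schema representation, the function names, the dotted paths, the extra `ts` field, and the choice to start every counter at 0 are all my own assumptions for illustration.

```python
from collections import deque
from itertools import count

# A field is (name, type); a struct type is a list of fields, a primitive is a str.
SCHEMA = [
    ("id", "long"),
    ("info", [
        ("name", "string"),
        ("attrs", [("age", "int"), ("score", "double")]),
    ]),
]
# Same schema plus a trailing root field (hypothetical), to separate the orders.
EXTENDED = SCHEMA + [("ts", "timestamp")]


def conversion_style(fields):
    """Root fields get their ordinal; nested structs are numbered children-first."""
    ids = {name: ordinal for ordinal, (name, _) in enumerate(fields)}
    counter = count(len(fields))

    def visit(path, struct):
        # Descend first so deeper structs receive lower IDs ...
        for name, typ in struct:
            if isinstance(typ, list):
                visit(f"{path}.{name}", typ)
        # ... then number this struct's own fields.
        for name, _ in struct:
            ids[f"{path}.{name}"] = next(counter)

    for name, typ in fields:
        if isinstance(typ, list):
            visit(name, typ)
    return ids


def pre_order(fields, start=0):
    """Number each field when it is first encountered, then descend into it."""
    ids, counter = {}, count(start)

    def visit(path, struct):
        for name, typ in struct:
            full = f"{path}.{name}" if path else name
            ids[full] = next(counter)
            if isinstance(typ, list):
                visit(full, typ)

    visit("", fields)
    return ids


def level_order(fields, start=0):
    """Number all fields of one struct before descending, using a FIFO queue."""
    ids, counter = {}, count(start)
    queue = deque([("", fields)])
    while queue:
        path, struct = queue.popleft()
        for name, typ in struct:
            full = f"{path}.{name}" if path else name
            ids[full] = next(counter)
            if isinstance(typ, list):
                queue.append((full, typ))
    return ids


for assign in (conversion_style, pre_order, level_order):
    print(assign.__name__, assign(EXTENDED))
```

On the example schema above, `conversion_style` reproduces the IDs I listed for Java, while `pre_order` and `level_order` happen to coincide. They only diverge once a field follows a nested struct, which is why the sketch also runs on the extended schema.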

This leads to two questions:

   1. *Does the assignment order of fresh field IDs actually matter?*
   My intuition is that it should not, as long as the field-ID → field
   mapping is consistent and the highest field ID is tracked correctly, but I
   would love to be corrected.

   2. *If the order does matter, is there a recommended or canonical traversal
   order that clients should follow?*
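
For what it's worth, this is the kind of check I have in mind for the first question. It is only a sketch of my intuition, not a statement about what the spec requires: the two dictionaries are hypothetical assignments for the example schema, and `highest_field_id` and `resolve` are helper names I made up.

```python
# Two hypothetical assignments for the same schema (field path -> field ID).
post_order_ids = {
    "id": 0, "info": 1,
    "info.attrs.age": 2, "info.attrs.score": 3,
    "info.name": 4, "info.attrs": 5,
}
pre_order_ids = {
    "id": 0, "info": 1,
    "info.name": 2, "info.attrs": 3,
    "info.attrs.age": 4, "info.attrs.score": 5,
}


def highest_field_id(ids):
    """Check that IDs are unique and return the highest one."""
    values = list(ids.values())
    if len(values) != len(set(values)):
        raise ValueError("duplicate field ID")
    return max(values)


def resolve(ids, field_id):
    """Look a field up by ID using the table's own mapping."""
    by_id = {fid: path for path, fid in ids.items()}
    return by_id[field_id]


for ids in (post_order_ids, pre_order_ids):
    last = highest_field_id(ids)
    # Round-trip: path -> ID -> path works under either assignment.
    print(last, resolve(ids, ids["info.name"]), "next new column gets", last + 1)
```

Both assignments pass the uniqueness check, resolve fields consistently, and agree on the next free ID. Only the individual numbers differ.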

Any guidance or historical context would be appreciated. Thanks!

Best,
Shawn
