cassio-paesleme opened a new pull request, #896:
URL: https://github.com/apache/iceberg-go/pull/896

   ## Problem
   
   By default, arrow-go writes the Parquet root schema element with repetition 
`Repeated`. Snowflake (and some other readers) interpret `Repeated` at the root 
as one-level list encoding and reject files that contain list columns. As a 
result, writes that target Snowflake-managed Iceberg tables fail.
   
   ## Fix
   
   Add a `write.parquet.root-repetition` table property (values: `required` / 
`optional` / `repeated`, default: `required`). The default `required` aligns 
with the Parquet spec and matches the behaviour of arrow-rs, pyarrow, and 
parquet-java.
   
   The property is applied in `parquetFormat.GetWriteProperties` via 
`parquet.WithRootRepetition`.
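   
   As a rough illustration of how the property value could be resolved before it 
is handed to `parquet.WithRootRepetition`, here is a minimal standard-library 
sketch. The `Repetition` type is a local stand-in for arrow-go's repetition 
enum, and `rootRepetitionFromProps` is a hypothetical helper, not code from 
this PR. Case-insensitive matching and returning an error on unknown values 
are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// Repetition is a local stand-in for arrow-go's parquet repetition enum so
// that this sketch compiles with the standard library only.
type Repetition string

const (
	Required Repetition = "required"
	Optional Repetition = "optional"
	Repeated Repetition = "repeated"
)

// rootRepetitionKey is the table property introduced by this PR.
const rootRepetitionKey = "write.parquet.root-repetition"

// rootRepetitionFromProps is a hypothetical helper: it resolves the table
// property and falls back to Required when the property is unset.
func rootRepetitionFromProps(props map[string]string) (Repetition, error) {
	raw, ok := props[rootRepetitionKey]
	if !ok {
		return Required, nil
	}
	// Assumption: values are matched case-insensitively, ignoring
	// surrounding whitespace.
	switch v := Repetition(strings.ToLower(strings.TrimSpace(raw))); v {
	case Required, Optional, Repeated:
		return v, nil
	default:
		// Assumption: unknown values are reported rather than silently
		// replaced by the default.
		return "", fmt.Errorf("invalid %s value %q", rootRepetitionKey, raw)
	}
}

func main() {
	rep, err := rootRepetitionFromProps(map[string]string{
		rootRepetitionKey: "optional",
	})
	fmt.Println(rep, err)
}
```

   In the real writer path the resolved value would be converted to the 
corresponding arrow-go repetition constant and passed to 
`parquet.WithRootRepetition` when the writer properties are built.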
   
   ## Why this default is correct
   
   The Parquet spec defines the root message element as a container, not a 
repeated field. Defaulting to `Required` is the most interoperable choice and 
is what the other major Parquet writers already do. arrow-go's `Repeated` 
default is the outlier.
   
   ## Testing
   
   The existing `./table/...` test suite passes. The property is exercised 
end-to-end in the Docker data platform against Snowflake-managed Iceberg tables 
(docker/data-platform#406).
   
   ---
   
   *Original implementation by @hcrosse.*


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

