cassio-paesleme opened a new pull request, #896: URL: https://github.com/apache/iceberg-go/pull/896
## Problem

arrow-go defaults the Parquet root schema element's repetition to `Repeated`. Snowflake (and some other readers) interpret `Repeated` at the root as one-level list encoding and reject files that contain list columns. This causes write failures when targeting Snowflake-managed Iceberg tables.

## Fix

Add a `write.parquet.root-repetition` table property (values: `required` / `optional` / `repeated`, default: `required`). The default `required` aligns with the Parquet spec and matches the behaviour of arrow-rs, pyarrow, and parquet-java.

The property is applied in `parquetFormat.GetWriteProperties` via `parquet.WithRootRepetition`.

## Why this default is correct

The Parquet spec defines the root message element as a container, not a repeated field. Defaulting to `Required` is the most interoperable choice and is what every major Parquet writer already does. arrow-go's `Repeated` default is an outlier.

## Testing

Existing `./table/...` suite passes. The property is exercised end-to-end in the Docker data platform against Snowflake-managed Iceberg tables (docker/data-platform#406).

---

*Original implementation by @hcrosse.*
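For illustration, here is a minimal, standard-library-only sketch of how the `write.parquet.root-repetition` property could be resolved to a repetition value with `required` as the default. It is not the code in this PR: the `Repetition` type is a local stand-in for arrow-go's Parquet repetition enum, and `parseRootRepetition`, `rootRepetitionKey`, the case-insensitive matching, and the error on unknown values are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Repetition is a hypothetical local stand-in for the Parquet repetition
// enum; the real change passes an arrow-go value to parquet.WithRootRepetition.
type Repetition int

const (
	Required Repetition = iota
	Optional
	Repeated
)

// String returns the lowercase property spelling of the repetition.
func (r Repetition) String() string {
	switch r {
	case Required:
		return "required"
	case Optional:
		return "optional"
	case Repeated:
		return "repeated"
	}
	return "unknown"
}

// rootRepetitionKey is the table property name described in this PR.
const rootRepetitionKey = "write.parquet.root-repetition"

// parseRootRepetition (hypothetical helper) reads the property from a table
// property map. A missing key yields Required, the default stated in the PR.
// Trimming, case-insensitive matching, and returning an error for unknown
// values are assumptions of this sketch, not confirmed behaviour.
func parseRootRepetition(props map[string]string) (Repetition, error) {
	raw, ok := props[rootRepetitionKey]
	if !ok {
		return Required, nil
	}
	switch strings.ToLower(strings.TrimSpace(raw)) {
	case "required":
		return Required, nil
	case "optional":
		return Optional, nil
	case "repeated":
		return Repeated, nil
	}
	return Required, fmt.Errorf("invalid %s value %q", rootRepetitionKey, raw)
}

func main() {
	// No property set: falls back to the default.
	def, _ := parseRootRepetition(map[string]string{})
	fmt.Println("default:", def)

	// Explicit opt-in to the previous arrow-go behaviour.
	legacy, _ := parseRootRepetition(map[string]string{rootRepetitionKey: "repeated"})
	fmt.Println("explicit:", legacy)

	// Unknown values are rejected in this sketch.
	if _, err := parseRootRepetition(map[string]string{rootRepetitionKey: "bogus"}); err != nil {
		fmt.Println("error:", err)
	}
}
```

In the actual change, the resolved value would be handed to `parquet.WithRootRepetition` inside `parquetFormat.GetWriteProperties`. Setting the property to `repeated` would keep the previous arrow-go behaviour for readers that depend on it.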
