malinjawi opened a new pull request, #12047:
URL: https://github.com/apache/gluten/pull/12047
What changes are proposed in this pull request?

This PR adds the first slice of Velox Delta write-side support for UniForm Iceberg.

The change keeps Delta as the source of truth for the transaction and for the UniForm metadata-generation flow. When `IcebergCompatV2` is enabled, Gluten now passes Delta column-mapping field IDs through the Velox native Parquet writer path, so that newly written Delta files can satisfy the Parquet field-ID part of the UniForm Iceberg contract.

This PR also:
- adds `delta-iceberg` to the Velox Delta build/test profile
- serializes Delta column-mapping field IDs into native Parquet writer options
- parses that option on the Velox side and populates `WriterOptions.parquetFieldIds`
- preserves the existing Delta async UniForm metadata-generation path after commit
- adds a focused end-to-end UniForm Iceberg suite with Hive Metastore-backed Iceberg readback
- adds negative coverage for active deletion vectors
- updates `docs/get-started/VeloxDelta.md` from `Not tested` to `Partial`

Current scope / behavior:
- Spark 3.5 Velox native Delta write is covered for new UniForm Iceberg tables using `IcebergCompatV2`
- the test verifies the generated Iceberg metadata and reads the table back through Iceberg
- active deletion vectors are rejected by Delta for UniForm Iceberg
- native Velox Iceberg scan for UniForm column-mapping/name-mapping reads is still a follow-up gap, so the docs do not yet claim full native read support

How was this patch tested?
Build / validation:
- `./build/mvn -Pbackends-velox,delta,spark-3.5 -pl backends-velox -am -DskipTests test-compile`
- `ninja -j 6 gluten`
- `ninja -j 6 libvelox.dylib`
- `git diff --check`

Focused local validation:
- Spark 3.5 `org.apache.spark.sql.delta.DeltaUniFormIcebergSuite` with `-Pbackends-velox,delta,iceberg,spark-3.5`
- Spark 3.5 `org.apache.spark.sql.delta.DeltaUniFormIcebergSuite` with `-Pbackends-velox,delta,spark-3.5`, to verify that the suite is cancelled cleanly when the Iceberg test classes are absent
- Spark 3.5 `org.apache.spark.sql.delta.DeltaInsertIntoSQLNameColumnMappingSuite`

Functional validation:
- verified that a UniForm Iceberg-enabled Delta table can be written through the Velox native Delta write path
- verified that Parquet files are written with IcebergCompatV2-compatible field IDs
- verified that Iceberg metadata JSON is generated after commit
- verified that the generated metadata contains Delta version/timestamp information
- verified that the table can be read back through Iceberg with the expected rows
- verified that active deletion vectors are rejected for UniForm Iceberg

Was this patch authored or co-authored using generative AI tooling?

Generated-by: OpenAI Codex

issue: #12039

Follow-up work

This PR intentionally limits its scope to the supported Delta write and Iceberg readback path. Reasonable follow-ups are:
- add native Velox Iceberg read support for UniForm column-mapping/name-mapping tables by plumbing Iceberg/Parquet field IDs through the scan path
- add Spark 4.0 UniForm coverage if the same Delta/Iceberg dependency set is enabled there
- broaden negative coverage for upgrade/rewrite paths that require `REORG TABLE ... APPLY (UPGRADE UNIFORM(...))`
- tighten the docs once native Iceberg scan support is proven
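The PR does not spell out how the field IDs are encoded when they travel from the JVM side into the Velox Parquet writer options. As a minimal sketch only, the C++ below assumes a hypothetical positional encoding: each column's Delta column-mapping ID (which Delta keeps in the `delta.columnMapping.id` field metadata) is written in schema order, and the IDs of nested struct fields appear in braces after their parent. `FieldIdNode`, `serializeFieldIds`, and `parseFieldIds` are hypothetical names, not Gluten or Velox APIs; the real option key, string format, and `WriterOptions.parquetFieldIds` type may differ.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical field-ID tree: one node per column, in schema order.
// Illustrative only; this is NOT Velox's WriterOptions::parquetFieldIds type.
struct FieldIdNode {
  std::int32_t fieldId;               // e.g. the column's delta.columnMapping.id
  std::vector<FieldIdNode> children;  // non-empty only for struct columns
};

// Hypothetical encoding: "1,2{3,4},5" means column 2 is a struct whose two
// children carry field IDs 3 and 4.
std::string serializeFieldIds(const std::vector<FieldIdNode>& nodes) {
  std::string out;
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    if (i > 0) out += ',';
    out += std::to_string(nodes[i].fieldId);
    if (!nodes[i].children.empty()) {
      out += '{' + serializeFieldIds(nodes[i].children) + '}';
    }
  }
  return out;
}

namespace detail {

std::vector<FieldIdNode> parseList(const std::string& s, std::size_t& pos);

// Parses one "<id>" or "<id>{<children>}" starting at `pos`.
FieldIdNode parseNode(const std::string& s, std::size_t& pos) {
  const std::size_t start = pos;
  while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) ++pos;
  if (pos == start) {
    throw std::invalid_argument("expected a field id at offset " + std::to_string(start));
  }
  const long long value = std::stoll(s.substr(start, pos - start));
  if (value > std::numeric_limits<std::int32_t>::max()) {
    throw std::out_of_range("field id does not fit in int32");
  }
  FieldIdNode node{static_cast<std::int32_t>(value), {}};
  if (pos < s.size() && s[pos] == '{') {
    ++pos;  // consume '{'
    node.children = parseList(s, pos);
    if (pos >= s.size() || s[pos] != '}') {
      throw std::invalid_argument("missing '}' at offset " + std::to_string(pos));
    }
    ++pos;  // consume '}'
  }
  return node;
}

// Parses a comma-separated list of nodes; stops at '}' or end of input.
std::vector<FieldIdNode> parseList(const std::string& s, std::size_t& pos) {
  std::vector<FieldIdNode> nodes;
  if (pos == s.size() || s[pos] == '}') return nodes;  // empty list
  nodes.push_back(parseNode(s, pos));
  while (pos < s.size() && s[pos] == ',') {
    ++pos;  // consume ','
    nodes.push_back(parseNode(s, pos));
  }
  return nodes;
}

}  // namespace detail

// Entry point: the whole option string must be consumed.
std::vector<FieldIdNode> parseFieldIds(const std::string& s) {
  std::size_t pos = 0;
  std::vector<FieldIdNode> nodes = detail::parseList(s, pos);
  if (pos != s.size()) {
    throw std::invalid_argument("unexpected character at offset " + std::to_string(pos));
  }
  return nodes;
}
```

For example, `parseFieldIds("1,2{3,4},5")` yields three top-level nodes, the second of which has children with IDs 3 and 4, and `serializeFieldIds` turns that tree back into the same string. A positional tree is used here because it avoids escaping arbitrary column names; whether the actual option is keyed by position, by physical column name, or by something else is not stated in the PR.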
