kevinjqliu opened a new pull request, #16308: URL: https://github.com/apache/iceberg/pull/16308
Backport of #15832 to `spark/v3.4`.

Adds the `output-sort-order-id` write option (and `SparkWriteConf.outputSortOrderId`) and threads the resolved sort-order id through `SparkWrite` and `SparkPositionDeltaWrite` so written data files record the sort order in their manifest entry. `SparkShufflingFileRewriteRunner` resolves the matching table sort order via `SortOrderUtil.findTableSortOrder` and logs a warning when no match exists.

### Adaptation note

v3.4 `SparkWrite` still names the parameter `partitionedFanoutEnabled` (v3.5 renamed it to `useFanoutWriter`). I kept the v3.4 name and added the new `sortOrderId` parameter alongside it. All other files match the v3.5 patch.

### Validation

- `./gradlew -DsparkVersions=3.4 :iceberg-spark:iceberg-spark-3.4_2.12:test --tests "*TestSparkWriteConf.testSortOrder*"` (new tests pass)
- `./gradlew -DsparkVersions=3.4 :iceberg-spark:iceberg-spark-3.4_2.12:test --tests "org.apache.iceberg.spark.source.TestSparkDataWrite"` (passes)
- spark-extensions tests compile
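
For readers who have not seen the option before, here is a rough, self-contained sketch of how a write option like `output-sort-order-id` could be resolved to a sort-order id before it is handed to the writer. It is not the code in this patch: the class `OutputSortOrderIdSketch`, the method `resolve`, the fallback to the unsorted id when the option is absent, and the rejection of unknown ids are all assumptions made for illustration. Only the option name and the fact that Iceberg uses id 0 for the unsorted order come from Iceberg itself.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the SparkWriteConf implementation.
final class OutputSortOrderIdSketch {
  // Option key as named in the PR description.
  static final String OUTPUT_SORT_ORDER_ID = "output-sort-order-id";
  // Iceberg uses sort order id 0 for the unsorted order.
  static final int UNSORTED_ID = 0;

  // Resolves the sort-order id to record on written data files.
  // Assumption: fall back to the unsorted id when the option is not set.
  // Assumption: reject ids that the table does not define.
  static int resolve(Map<String, String> writeOptions, Set<Integer> tableSortOrderIds) {
    String raw = writeOptions.get(OUTPUT_SORT_ORDER_ID);
    if (raw == null) {
      return UNSORTED_ID;
    }
    int id = Integer.parseInt(raw.trim());
    if (id != UNSORTED_ID && !tableSortOrderIds.contains(id)) {
      throw new IllegalArgumentException(
          "Sort order id " + id + " is not defined on the table: " + tableSortOrderIds);
    }
    return id;
  }

  public static void main(String[] args) {
    Set<Integer> tableSortOrderIds = Set.of(0, 1, 2);
    // Option set to an id the table knows about.
    System.out.println(resolve(Map.of(OUTPUT_SORT_ORDER_ID, "2"), tableSortOrderIds));
    // Option absent: the sketch falls back to the unsorted id.
    System.out.println(resolve(Map.of(), tableSortOrderIds));
  }
}
```

In a real Spark job the option would presumably be passed on the write itself, for example through `.option("output-sort-order-id", "1")` on the DataFrame writer. The exact fallback and validation behaviour should be checked against `SparkWriteConf.outputSortOrderId` in the patch.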
