jbewing commented on code in PR #15150:
URL: https://github.com/apache/iceberg/pull/15150#discussion_r2850809699
##########
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/SparkWriteOptions.java:
##########
@@ -54,6 +54,7 @@ private SparkWriteOptions() {}
public static final String REWRITTEN_FILE_SCAN_TASK_SET_ID =
"rewritten-file-scan-task-set-id";
public static final String OUTPUT_SPEC_ID = "output-spec-id";
+ public static final String OUTPUT_SORT_ORDER_ID = "output-sort-order-id";
Review Comment:
Missed this one a few weeks ago, sorry about that. But I did totally start
here. I can offer a bug fix that makes your cursor impl pass a few more test
cases: walk the actual Spark sort order and the expected Iceberg sort order
starting from the end of the actual Spark sort order, rather than from the
front. The difference is subtle, but it accommodates cases like this:
https://github.com/apache/iceberg/blob/9534c9b3adc29d127ecc541ce131f49fd72f1980/spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/SparkWriteUtil.java#L210-L216
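A minimal sketch of the suffix-matching idea, using plain strings to stand in for sort expressions (the names and helper here are hypothetical, not the actual PR code). Matching from the end tolerates Spark prepending extra ordering expressions, such as partition clustering, ahead of the table's sort keys:

```java
import java.util.List;

public class SuffixMatch {
  // Returns true if `expected` matches the tail of `actual`.
  // Walking from the end accommodates Spark prepending extra ordering
  // expressions (e.g. partition clustering) before the table sort keys.
  static boolean matchesSuffix(List<String> actual, List<String> expected) {
    if (expected.size() > actual.size()) {
      return false;
    }
    int offset = actual.size() - expected.size();
    for (int i = expected.size() - 1; i >= 0; i--) {
      if (!actual.get(offset + i).equals(expected.get(i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Spark's actual ordering may carry partition expressions up front
    List<String> actual = List.of("days(ts)", "bucket(8, id)", "category", "id");
    System.out.println(matchesSuffix(actual, List.of("category", "id")));  // true
    System.out.println(matchesSuffix(actual, List.of("id", "category")));  // false
  }
}
```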
You also want to be quite careful about the order in which you iterate the
table's sort orders. I found that an explicit iteration order works best:
1. Prefer the active sort order first (this may or may not be the sort order
with the highest id)
2. Then walk the inactive sort orders in descending id order
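That iteration order could be sketched like this (the method name and the id-to-order map are illustrative assumptions, not the PR's actual API):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class SortOrderIteration {
  // Hypothetical stand-in: a table's sort orders keyed by id, plus the
  // currently active sort order id. Returns the ids in the order they
  // should be checked against the actual Spark sort order.
  static List<Integer> iterationOrder(Map<Integer, ?> sortOrdersById, int activeId) {
    List<Integer> ids = new ArrayList<>();
    ids.add(activeId); // 1. the active order first (not necessarily the highest id)
    sortOrdersById.keySet().stream()
        .filter(id -> id != activeId)
        .sorted(Comparator.reverseOrder()) // 2. inactive orders by descending id
        .forEach(ids::add);
    return ids;
  }

  public static void main(String[] args) {
    // The active order (id 2) is neither the oldest nor the newest
    Map<Integer, String> orders = Map.of(1, "old", 2, "active", 3, "newer-inactive");
    System.out.println(iterationOrder(orders, 2)); // [2, 3, 1]
  }
}
```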
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]