jbewing commented on code in PR #15150:
URL: https://github.com/apache/iceberg/pull/15150#discussion_r2850809699
##########
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/SparkWriteOptions.java:
##########
@@ -54,6 +54,7 @@ private SparkWriteOptions() {}
public static final String REWRITTEN_FILE_SCAN_TASK_SET_ID =
"rewritten-file-scan-task-set-id";
public static final String OUTPUT_SPEC_ID = "output-spec-id";
+ public static final String OUTPUT_SORT_ORDER_ID = "output-sort-order-id";
Review Comment:
Missed this one a few weeks ago, sorry about that. But I did totally start
here. I can offer a bug fix that makes your cursor impl pass a few more test
cases: walk the actual Spark sort order and the expected Iceberg sort order
starting from the end of the actual Spark sort order, rather than from the
front. The difference is subtle, but it accommodates cases like this:
https://github.com/apache/iceberg/blob/9534c9b3adc29d127ecc541ce131f49fd72f1980/spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/SparkWriteUtil.java#L210-L216
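A minimal sketch of the suffix-matching idea, using plain strings to stand in for sort expressions (the names and helper here are hypothetical, not the actual PR code). Matching from the end tolerates Spark prepending extra ordering expressions, such as partition clustering, ahead of the table's sort keys:

```java
import java.util.List;

public class SuffixMatch {
  // Returns true if `expected` matches the tail of `actual`.
  // Walking from the end accommodates Spark prepending extra ordering
  // expressions (e.g. partition clustering) before the table sort keys.
  static boolean matchesSuffix(List<String> actual, List<String> expected) {
    if (expected.size() > actual.size()) {
      return false;
    }
    int offset = actual.size() - expected.size();
    for (int i = expected.size() - 1; i >= 0; i--) {
      if (!actual.get(offset + i).equals(expected.get(i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Spark's actual ordering may carry partition expressions up front
    List<String> actual = List.of("days(ts)", "bucket(8, id)", "category", "id");
    System.out.println(matchesSuffix(actual, List.of("category", "id")));  // true
    System.out.println(matchesSuffix(actual, List.of("id", "category")));  // false
  }
}
```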
You also want to be quite careful about the order in which you iterate the
table's sort orders. I found that an explicit iteration order works best:
1. Prefer the active sort order first (this may or may not be the sort order
with the highest id)
2. Then walk the inactive sort orders in descending id order
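That iteration order could be sketched like this (the method name and the id-to-order map are illustrative assumptions, not the PR's actual API):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class SortOrderIteration {
  // Hypothetical stand-in: a table's sort orders keyed by id, plus the
  // currently active sort order id. Returns the ids in the order they
  // should be checked against the actual Spark sort order.
  static List<Integer> iterationOrder(Map<Integer, ?> sortOrdersById, int activeId) {
    List<Integer> ids = new ArrayList<>();
    ids.add(activeId); // 1. the active order first (not necessarily the highest id)
    sortOrdersById.keySet().stream()
        .filter(id -> id != activeId)
        .sorted(Comparator.reverseOrder()) // 2. inactive orders by descending id
        .forEach(ids::add);
    return ids;
  }

  public static void main(String[] args) {
    // The active order (id 2) is neither the oldest nor the newest
    Map<Integer, String> orders = Map.of(1, "old", 2, "active", 3, "newer-inactive");
    System.out.println(iterationOrder(orders, 2)); // [2, 3, 1]
  }
}
```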
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]