anuragmantri commented on PR #15150:
URL: https://github.com/apache/iceberg/pull/15150#issuecomment-3837498559
> I could definitely be convinced otherwise, but it seems odd to me that we
> are carrying around two different sort objects in the writer.
I think @jbewing did it this way because Iceberg may ask Spark to order by
fields beyond those in the table's sort order (for example, row positions).
We need the transform to convert that Spark ordering into Iceberg sort order
id metadata.
But I agree that carrying both the Spark and Iceberg sort orders is odd.
Would it be too restrictive to require that we only sort by the table's sort
order? Would it be simpler to always materialize the table's sort order id in
the metadata?
(We don't do this validation currently. For example, we can pass any
arbitrary field to RewriteDataFiles, and with this PR, it will be persisted in
the metadata.)
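
To make the question concrete, here is a rough, language-neutral sketch of the
kind of validation I have in mind. It is not Iceberg code: `SortField`,
`SortOrder`, `satisfies`, and `sort_order_id_to_persist` are hypothetical
stand-ins, and the prefix rule is my assumption about what "matches the table
sort order" should mean. The only thing taken from the spec is that order id 0
is reserved for the unsorted order.

```python
from dataclasses import dataclass
from typing import Tuple

# Order id 0 is reserved for the unsorted order in the Iceberg table spec.
UNSORTED_ORDER_ID = 0


@dataclass(frozen=True)
class SortField:
    """Hypothetical stand-in for one sort field (column plus direction)."""
    column: str
    ascending: bool = True


@dataclass(frozen=True)
class SortOrder:
    """Hypothetical stand-in for a table sort order with its id."""
    order_id: int
    fields: Tuple[SortField, ...]


def satisfies(requested: Tuple[SortField, ...], table_order: SortOrder) -> bool:
    """Assumed rule: the requested ordering satisfies the table order when
    the table's sort fields are a prefix of the requested fields."""
    n = len(table_order.fields)
    return requested[:n] == table_order.fields


def sort_order_id_to_persist(
    requested: Tuple[SortField, ...], table_order: SortOrder
) -> int:
    """Persist the table's sort order id only when the write ordering
    satisfies it; otherwise fall back to the unsorted order id."""
    if satisfies(requested, table_order):
        return table_order.order_id
    return UNSORTED_ORDER_ID
```

With a check like this, an arbitrary field passed to RewriteDataFiles would
still be used for the physical ordering, but would not be recorded as a sort
order id unless it lines up with the table's own sort order.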
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]