Re: [PR] Spark 4.1: Set data file sort_order_id in manifest for writes from Spark [iceberg]

via GitHub Mon, 02 Feb 2026 13:30:06 -0800


anuragmantri commented on PR #15150:
URL: https://github.com/apache/iceberg/pull/15150#issuecomment-3837498559


   
   > I could definitely be convinced otherwise, but it seems odd to me that we are carrying around two different sort objects in the writer.
   
   I think @jbewing did it this way because Iceberg may ask Spark to order by fields beyond those in the table's sort order (for example, row positions). The transform is needed to convert that Spark ordering back into Iceberg sort order ID metadata.
   
   But I agree that carrying both the Spark and Iceberg sort orders is odd. Would it be too restrictive to require that we only sort by the table's sort order? Would it be simpler to always materialize the table's sort order ID in the metadata?
   
   (We don't do this validation currently. For example, we can pass an arbitrary field to RewriteDataFiles, and with this PR, that ordering will be persisted in the metadata.)
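   The stricter option could look roughly like this minimal sketch, again using hypothetical stand-in types (`Key`, `TableSortOrder`, `SortOrderValidation`) rather than Iceberg's real `SortOrder` class. It returns a sort order ID only when the requested keys exactly equal the table's current sort order and throws otherwise, so an arbitrary ordering passed to a rewrite would be rejected instead of being recorded in the manifest.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical sort key: source column plus direction (stand-in for Iceberg's SortField). */
record Key(String column, boolean ascending) {}

/** Hypothetical stand-in for a table sort order together with its ID. */
record TableSortOrder(int orderId, List<Key> keys) {}

final class SortOrderValidation {
  private SortOrderValidation() {}

  /**
   * Returns the table's sort order ID only if the requested order is exactly
   * the table's current sort order; any other ordering is rejected.
   */
  static int validatedOrderId(List<Key> requested, TableSortOrder current) {
    Objects.requireNonNull(requested, "requested");
    Objects.requireNonNull(current, "current");
    if (!requested.equals(current.keys())) {
      throw new IllegalArgumentException(
          "Requested order " + requested + " does not match table sort order " + current.orderId());
    }
    return current.orderId();
  }
}
```

   Falling back to the unsorted ID (0) instead of throwing would be a softer variant of the same check.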
   

