pvary commented on PR #14297: URL: https://github.com/apache/iceberg/pull/14297#issuecomment-4057345530
> 1. I managed to wire the shredding writer through the `WriterFunction` API: added a `writeProperties`-aware overload to `WriterFunction` in `BaseFormatModel`, forwarded the collected properties in `ParquetFormatModel`, and introduced `SparkParquetWriterFunction` in `SparkFormatModels` (v4.1 only) to route to the shredding writer when enabled.

I haven't had time to review the full PR yet, but I had the same discussion with @Guosmilesmile on Slack: I don't like the change to `WriterFunction`. You should only create the BufferedWriter first. Based on the properties provided to the writer, you collect and buffer the data, and then create the real writer once the buffer is full. At that point the writer has decided on the fileSchema (the Parquet schema), and that should be enough to create the real writer. I started with a similar example, but based on @rdblue's comments we removed the properties and opted for the schemas-only solution.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
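The buffering approach described in the comment can be sketched as follows. This is a minimal illustration, not Iceberg code: `DelayedSchemaWriter`, `RealWriter`, and the string-based "schema" are all hypothetical stand-ins. The point is only the shape of the technique: rows are buffered, the file schema is decided from the buffered rows once the buffer is full (or on close, for small files), and only then is the real writer created and handed the buffered rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: buffer rows until the file schema can be decided,
// then create the real writer from that schema alone (no extra properties).
class DelayedSchemaWriter<T> {
  private final int bufferLimit;
  // Decides the file schema by inspecting the buffered rows (illustrative).
  private final Function<List<T>, String> schemaDecider;
  private final List<T> buffer = new ArrayList<>();
  private RealWriter<T> delegate; // created lazily once the schema is known

  DelayedSchemaWriter(int bufferLimit, Function<List<T>, String> schemaDecider) {
    this.bufferLimit = bufferLimit;
    this.schemaDecider = schemaDecider;
  }

  void write(T row) {
    if (delegate != null) {
      delegate.write(row); // schema already fixed; write through
      return;
    }
    buffer.add(row);
    if (buffer.size() >= bufferLimit) {
      flushBuffer(); // buffer full: decide schema, create real writer
    }
  }

  void close() {
    if (delegate == null) {
      flushBuffer(); // small files: decide schema from whatever was buffered
    }
    // real resource cleanup (delegate.close(), metrics, ...) would go here
  }

  String fileSchema() {
    return delegate == null ? null : delegate.schema;
  }

  private void flushBuffer() {
    String fileSchema = schemaDecider.apply(buffer);
    delegate = new RealWriter<>(fileSchema);
    buffer.forEach(delegate::write); // replay buffered rows into the real writer
    buffer.clear();
  }

  // Stand-in for the actual file writer, parameterized only by the schema.
  static class RealWriter<T> {
    final String schema;
    final List<T> written = new ArrayList<>();

    RealWriter(String schema) {
      this.schema = schema;
    }

    void write(T row) {
      written.add(row);
    }
  }
}
```

The design choice this illustrates is the one argued for above: because the real writer is constructed from the decided file schema only, the `WriterFunction` signature does not need a properties-aware overload.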
