pvary commented on PR #14297: URL: https://github.com/apache/iceberg/pull/14297#issuecomment-4057345530
> 1. I managed to wire the shredding writer through the `WriterFunction` API: added a `writeProperties`-aware overload to `WriterFunction` in `BaseFormatModel`, forwarded the collected properties in `ParquetFormatModel`, and introduced `SparkParquetWriterFunction` in `SparkFormatModels` (v4.1 only) to route to the shredding writer when enabled.

I haven't had time to review the full PR yet, but I had the same discussion with @Guosmilesmile on Slack: I don't like the change to `WriterFunction`. You should only create the BufferedWriter first. Based on the properties provided to the writer, you collect and buffer the data, and then create the real writer once the buffer is full. At that point the writer has decided on the fileSchema (the Parquet schema), and that should be enough to create the real writer. I started with a similar example, but based on @rdblue's comments we removed the properties and opted for the schemas-only solution.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
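The buffering approach described in the comment can be sketched as follows. This is a minimal illustration, not Iceberg code: `DelayedSchemaWriter`, `RealWriter`, and the string-based "schema" are all hypothetical stand-ins. The point is only the shape of the technique: rows are buffered, the file schema is decided from the buffered rows once the buffer is full (or on close, for small files), and only then is the real writer created and handed the buffered rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: buffer rows until the file schema can be decided,
// then create the real writer from that schema alone (no extra properties).
class DelayedSchemaWriter<T> {
  private final int bufferLimit;
  // Decides the file schema by inspecting the buffered rows (illustrative).
  private final Function<List<T>, String> schemaDecider;
  private final List<T> buffer = new ArrayList<>();
  private RealWriter<T> delegate; // created lazily once the schema is known

  DelayedSchemaWriter(int bufferLimit, Function<List<T>, String> schemaDecider) {
    this.bufferLimit = bufferLimit;
    this.schemaDecider = schemaDecider;
  }

  void write(T row) {
    if (delegate != null) {
      delegate.write(row); // schema already fixed; write through
      return;
    }
    buffer.add(row);
    if (buffer.size() >= bufferLimit) {
      flushBuffer(); // buffer full: decide schema, create real writer
    }
  }

  void close() {
    if (delegate == null) {
      flushBuffer(); // small files: decide schema from whatever was buffered
    }
    // real resource cleanup (delegate.close(), metrics, ...) would go here
  }

  String fileSchema() {
    return delegate == null ? null : delegate.schema;
  }

  private void flushBuffer() {
    String fileSchema = schemaDecider.apply(buffer);
    delegate = new RealWriter<>(fileSchema);
    buffer.forEach(delegate::write); // replay buffered rows into the real writer
    buffer.clear();
  }

  // Stand-in for the actual file writer, parameterized only by the schema.
  static class RealWriter<T> {
    final String schema;
    final List<T> written = new ArrayList<>();

    RealWriter(String schema) {
      this.schema = schema;
    }

    void write(T row) {
      written.add(row);
    }
  }
}
```

The design choice this illustrates is the one argued for above: because the real writer is constructed from the decided file schema only, the `WriterFunction` signature does not need a properties-aware overload.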
