nssalian commented on PR #14297: URL: https://github.com/apache/iceberg/pull/14297#issuecomment-4056925508
Hi folks, thank you for your patience. Thanks @aihuaxu for doing all the work on this. The feedback and comments from everyone here were really helpful in making the necessary fixes. I made the following changes after merging the main branch:

1. Wired the shredding writer through the `WriterFunction` API: added a `writeProperties`-aware overload to `WriterFunction` in `BaseFormatModel`, forwarded the collected properties in `ParquetFormatModel`, and introduced `SparkParquetWriterFunction` in `SparkFormatModels` (v4.1 only) to route to the shredding writer when enabled.
2. Fixes: corrected decimal precision, added null handling, and applied limits to the heuristics. Implemented field pruning (10% threshold, 300 cap, per https://issues.apache.org/jira/browse/SPARK-53659) and deterministic tie-breaking via explicit priority maps.
3. Added more tests to `TestVariantShredding.java` to check various behaviors.

Happy to discuss any of the changes. Please have a look.

CC: @pvary @huaxingao @aihuaxu
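
For anyone skimming, here is a rough sketch of how the field pruning in item 2 could work. It is not the code from this PR. The class name, method signature, and counting scheme are hypothetical, and it assumes the 10% threshold means "the field appears in at least 10% of the sampled rows" and that the 300 cap limits the number of shredded fields. Ties are broken by field name here as a simple stand-in for the explicit priority maps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the class used in the PR.
class ShreddingFieldPruner {
  // Values quoted in the comment; their exact meaning here is an assumption.
  static final int MIN_FREQUENCY_PERCENT = 10;
  static final int MAX_SHREDDED_FIELDS = 300;

  static List<String> selectFields(Map<String, Integer> fieldCounts, int rowCount) {
    return selectFields(fieldCounts, rowCount, MIN_FREQUENCY_PERCENT, MAX_SHREDDED_FIELDS);
  }

  /** Returns the fields to shred: most frequent first, ties broken by name. */
  static List<String> selectFields(
      Map<String, Integer> fieldCounts, int rowCount, int minPercent, int maxFields) {
    List<Map.Entry<String, Integer>> kept = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : fieldCounts.entrySet()) {
      // Integer arithmetic avoids floating-point surprises at the threshold.
      boolean frequentEnough = (long) entry.getValue() * 100 >= (long) rowCount * minPercent;
      if (rowCount > 0 && frequentEnough) {
        kept.add(entry);
      }
    }

    // Sort by descending count, then by name, so the result does not depend
    // on the iteration order of the input map.
    kept.sort((a, b) -> {
      int byCount = Integer.compare(b.getValue(), a.getValue());
      return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
    });

    // Apply the cap after sorting so the least frequent fields are dropped first.
    List<String> result = new ArrayList<>();
    for (int i = 0; i < kept.size() && i < maxFields; i++) {
      result.add(kept.get(i).getKey());
    }
    return result;
  }
}
```

Sorting before applying the cap means the cap always drops the least frequent fields, and the secondary sort key keeps the inferred schema stable across runs.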
