nssalian commented on PR #14297:
URL: https://github.com/apache/iceberg/pull/14297#issuecomment-4056925508

   Hi folks, thank you for your patience. Thanks @aihuaxu for doing all the 
work for this. The feedback and comments from everyone here were really helpful 
in making the necessary fixes. I made the following changes after merging the 
main branch.
   
   1. I wired the shredding writer through the `WriterFunction` API: added a 
`writeProperties`-aware overload to `WriterFunction` in `BaseFormatModel`, 
forwarded the collected properties in `ParquetFormatModel`, and introduced 
`SparkParquetWriterFunction` in `SparkFormatModels` (v4.1 only) to route to 
the shredding writer when enabled.
   2. Some fixes were needed: I fixed decimal precision, added null handling, 
and applied limits to the heuristics. I also implemented field pruning 
(10% threshold, 300 cap per https://issues.apache.org/jira/browse/SPARK-53659) 
and deterministic tie-breaking via explicit priority maps.
   3. I also added more tests to `TestVariantShredding.java` to cover 
various behaviors.
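
To make the pruning rule in item 2 easier to discuss, here is a rough, illustrative sketch. It is not the code in this PR. The class `ShreddedFieldPruner`, the method `selectFields`, and the constant names are hypothetical. It also assumes that the "10% threshold" means a field must appear in at least 10% of the sampled rows, that the "300 cap" limits the number of kept fields, and that ties are broken by field name; the PR itself uses explicit priority maps for tie-breaking.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of Iceberg or Spark.
final class ShreddedFieldPruner {
  // Assumption: a field must appear in at least 10% of the sampled rows.
  static final int MIN_PERCENT = 10;
  // Assumption: at most 300 fields are kept for shredding.
  static final int MAX_FIELDS = 300;

  private ShreddedFieldPruner() {}

  /**
   * Returns the field names to keep, most frequent first.
   * Ties on frequency are broken by field name so the result is deterministic
   * regardless of the map's iteration order.
   */
  static List<String> selectFields(Map<String, Long> fieldCounts, long totalRows) {
    if (totalRows <= 0) {
      return List.of();
    }
    return fieldCounts.entrySet().stream()
        // Integer arithmetic avoids floating-point rounding at the threshold.
        .filter(e -> e.getValue() * 100 >= MIN_PERCENT * totalRows)
        .sorted(
            Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Long>comparingByKey()))
        .limit(MAX_FIELDS)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }
}
```

With 100 sampled rows and counts `{b=50, a=50, rare=5, c=10}`, this sketch keeps `a`, `b`, and `c` in that order and drops `rare`.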
   
   Happy to discuss any of the changes. Please have a look.
   CC: @pvary @huaxingao @aihuaxu 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

