aihuaxu commented on PR #14297: URL: https://github.com/apache/iceberg/pull/14297#issuecomment-3730387702
> This PR is super exciting! Does this rely on variant shredding support in Spark? Is it supported in Spark 4.1 already, or planned for future releases? > > Regarding the heuristics - I'd like to propose adding table properties as hints for variant shredding. Similarly to properties used for bloom filters, it could be good to introduce something like `write.parquet.variant-shredding-enabled.column.col1`, which will hint to the writer that this column is important for shredding. Many variants have important fields for which shredding should be enforced, and other fields which are less central and can be managed with simpler heuristics. Would love to hear your thoughts! Yeah. I'm also thinking of that too. Will address that separately. Basically based on read pattern, the user can specify the shredding schema. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
