steveloughran opened a new issue, #16172: URL: https://github.com/apache/iceberg/issues/16172
### Feature Request / Improvement

Issue to group together everything needed for queries over Variant data to work well.

1. Auto generation of shredded fields.
2. Unmarshalling performance.
3. Rowgroup and file skipping based on shredded field stats.
4. Benchmarks to evaluate this.

Iceberg query performance relies on Spark passing variant_get() calls down to the rowgroup filter, so the changes are interrelated. This work will have to target Spark 4.2 only.

## Iceberg

#14297 #15628 #15510 #15385

## Spark

* [54598](https://github.com/apache/spark/pull/54598) Enable Parquet rowgroup skipping for variant filters
* [54394](https://github.com/apache/spark/pull/54394) Support variant_get predicate for DSv2 filter pushdown

## Parquet: better unmarshalling

* [3452](https://github.com/apache/parquet-java/pull/3452)
* [3481](https://github.com/apache/parquet-java/pull/3481)

### Query engine

Spark

### Willingness to contribute

- [ ] I can contribute this improvement/feature independently
- [x] I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- [ ] I cannot contribute this improvement/feature at this time
