IgorBerman opened a new issue, #11217: URL: https://github.com/apache/iceberg/issues/11217
### Feature Request / Improvement

Hi,

Currently Iceberg does not permit reading from an Iceberg table with a user-specified schema. However, in some scenarios this would be highly beneficial.

The current column pruning support in Spark doesn't handle highly nested schemas well (think of an array of structs that contain arrays of structs, with yet another level or a few more :). For example, think of a pageview object that has an array of widgets, where each widget has an array of items, and each pageview, widget, and item has a lot of fields. When working with this object model, one usually needs to explode to some level, and here is the catch: since explode is not handled well from a column pruning perspective, we effectively read all columns of widget/item (so we basically lose the columnar benefit of the Parquet format and read almost the full schema).

The Spark community has made progress on this issue, but we are still not "there" yet; see e.g. the rather trivial examples described in https://issues.apache.org/jira/browse/SPARK-47230 or https://issues.apache.org/jira/browse/SPARK-34956.

When working with files on HDFS, we can provide a partial schema with spark.read.schema(partial-schema).parquet(...), thus manually working around this inefficiency. However, Iceberg and Spark itself prohibit providing a user-defined schema when working with catalog tables.

I believe permitting this would let users with highly nested schemas benefit from Iceberg. One option would be to provide this sub-schema through a custom option (e.g.).
Of course, one can validate that the sub-schema really is a subset of the table's schema (to remove any chance of wrong expectations or errors).

From our internal tests, this might bring up to 30% performance gains in such environments (once again, because it permits reading fewer columns).

Thanks in advance

### Query engine

Spark

### Willingness to contribute

- [ ] I can contribute this improvement/feature independently
- [X] I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- [ ] I cannot contribute this improvement/feature at this time

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
