adragomir opened a new issue, #11745: URL: https://github.com/apache/datafusion/issues/11745
### Is your feature request related to a problem or challenge?

At the moment, DataFusion supports top-level column pruning: a `projection: [usize]` mechanism identifies the set of top-level columns that a query needs from a schema. The columns are inferred from the input and passed through all the layers (logical -> optimize -> physical). Some implementations can also use them to minimize the data read from storage at the lowest level.

However, for **deeply nested schemas** (a small number of huge, deeply nested top-level columns, lists of structs with maps, etc.), this optimization is not very useful, because the top-level columns themselves are very large.

### Describe the solution you'd like

We should have a way to represent the "deep" projection of the actual leaves that the query needs, and to push it through all the layers. The schema and data returned after applying the deep schema pruning should reflect the changes (for example, selecting one field from a struct in a list should return a list of structs with that single field).

The feature needs to be applicable only to physical layouts that actually support it (for example Parquet and Arrow).

### Describe alternatives you've considered

_No response_

### Additional context

_No response_

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
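
To make the requested behavior concrete, here is a minimal sketch of what deep schema pruning could look like on a toy schema model. Everything in it is an assumption made for illustration: the `DataType` and `Field` types, the `field` and `example_schema` helpers, and the `prune_type` function are hypothetical and are not DataFusion or Arrow APIs. Paths are lists of struct field names, lists are treated as transparent, and maps are not modeled.

```rust
// Toy model of a nested schema. These types are illustrative only
// and are NOT the DataFusion or Arrow types of the same name.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    List(Box<DataType>),
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

// Hypothetical helper to keep the schema literals short.
fn field(name: &str, data_type: DataType) -> Field {
    Field { name: name.to_string(), data_type }
}

/// Keep only the parts of `dt` that are reachable through one of `paths`.
/// Each path is a sequence of struct field names relative to `dt`.
/// Lists are transparent: a path steps straight into the element type.
/// An empty path means "keep the whole subtree".
fn prune_type(dt: &DataType, paths: &[Vec<&str>]) -> DataType {
    if paths.iter().any(|p| p.is_empty()) {
        return dt.clone();
    }
    match dt {
        DataType::List(inner) => DataType::List(Box::new(prune_type(inner, paths))),
        DataType::Struct(fields) => {
            let kept = fields
                .iter()
                .filter_map(|f| {
                    // Paths that go through this field, with the first step removed.
                    let sub: Vec<Vec<&str>> = paths
                        .iter()
                        .filter(|p| p[0] == f.name)
                        .map(|p| p[1..].to_vec())
                        .collect();
                    if sub.is_empty() {
                        None // no path needs this field, so drop it
                    } else {
                        Some(Field {
                            name: f.name.clone(),
                            data_type: prune_type(&f.data_type, &sub),
                        })
                    }
                })
                .collect();
            DataType::Struct(kept)
        }
        // A primitive leaf has nothing below it; keep it unchanged.
        other => other.clone(),
    }
}

// Hypothetical schema: two top-level columns, both nested.
fn example_schema() -> DataType {
    DataType::Struct(vec![
        field(
            "events",
            DataType::List(Box::new(DataType::Struct(vec![
                field("id", DataType::Int64),
                field("name", DataType::Utf8),
                field("tags", DataType::List(Box::new(DataType::Utf8))),
            ]))),
        ),
        field(
            "user",
            DataType::Struct(vec![
                field("id", DataType::Int64),
                field("email", DataType::Utf8),
            ]),
        ),
    ])
}
```

Calling `prune_type(&example_schema(), &[vec!["events", "name"]])` drops the `user` column entirely and reduces `events` to a list of structs with the single field `name`, which is the shape the proposal asks for. A top-level `projection: [usize]` could only have selected `events` as a whole.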