[I] Support deep schema pruning and projection [datafusion]

via GitHub Wed, 31 Jul 2024 04:41:26 -0700


adragomir opened a new issue, #11745:
URL: https://github.com/apache/datafusion/issues/11745


   ### Is your feature request related to a problem or challenge?
   
   At the moment, DataFusion supports only top-level column pruning: a 
`projection: [usize]` mechanism identifies the set of top-level columns a 
query needs from a schema. The columns are inferred from the input and 
passed through all the layers (logical -> optimizer -> physical). Some 
implementations can also use the projection to minimize the data read from 
storage at the lowest level. 
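
For illustration, index-based top-level projection can be sketched as below. This is a minimal, self-contained model and not DataFusion's or Arrow's actual API: the `Field` struct, its `leaves` list, and `project_top_level` are hypothetical names introduced only for this example.

```rust
/// Hypothetical, simplified stand-in for a schema field (not an Arrow type).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    /// Dotted names of the nested leaves under this column, if any.
    leaves: Vec<String>,
}

/// Keep only the top-level columns whose indices appear in `projection`.
/// Indices that are out of range are skipped in this sketch.
fn project_top_level(schema: &[Field], projection: &[usize]) -> Vec<Field> {
    projection
        .iter()
        .filter_map(|&i| schema.get(i).cloned())
        .collect()
}

fn main() {
    let schema = vec![
        Field { name: "id".into(), leaves: vec![] },
        Field {
            name: "events".into(),
            leaves: vec!["ts".into(), "payload.a".into(), "payload.b".into()],
        },
    ];

    // Selecting column 1 keeps the whole `events` column, including every
    // nested leaf, even if the query only touches one of them.
    let projected = project_top_level(&schema, &[1]);
    println!("{:?}", projected);
}
```

The projection can drop `id`, but it has no way to say "only `payload.a` inside `events`", which is the limitation this issue describes.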
   
   However, for **deeply nested schemas** (a small number of huge, deeply 
nested top-level columns, such as lists of structs containing maps), this 
optimization is of limited use, because the top-level columns themselves are 
very large. 
   
   
   ### Describe the solution you'd like
   
   We should have a way to represent the "deep" projection of the leaves that 
the query actually needs, and to push it through all the layers. 
   The schema and data returned after deep schema pruning should reflect the 
change: for example, selecting one field from a struct in a list should 
return a list of structs with a single field. 
   The feature should apply only to physical layouts that actually support it 
(for example, Parquet and Arrow). 
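
One possible shape for this, sketched under stated assumptions: the `DataType` enum and `prune` function below are hypothetical and self-contained, not DataFusion or Arrow types. The sketch represents a deep projection as a set of leaf paths made of struct field names, treats lists as transparent, and leaves maps out for brevity.

```rust
/// Hypothetical, simplified nested type model (not Arrow's `DataType`).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    List(Box<DataType>),
    Struct(Vec<(String, DataType)>),
}

/// Prune `dt` down to the leaves named by `paths`.
/// Each path is a sequence of struct field names; lists are passed through.
/// Returns `None` when nothing under `dt` is selected.
fn prune(dt: &DataType, paths: &[Vec<&str>]) -> Option<DataType> {
    // An empty path means "keep everything from here down".
    if paths.iter().any(|p| p.is_empty()) {
        return Some(dt.clone());
    }
    match dt {
        DataType::List(item) => prune(item, paths).map(|t| DataType::List(Box::new(t))),
        DataType::Struct(fields) => {
            let kept: Vec<(String, DataType)> = fields
                .iter()
                .filter_map(|(name, child)| {
                    // Paths that go through this field, with its name stripped.
                    let rest: Vec<Vec<&str>> = paths
                        .iter()
                        .filter(|p| p[0] == name.as_str())
                        .map(|p| p[1..].to_vec())
                        .collect();
                    if rest.is_empty() {
                        None
                    } else {
                        prune(child, &rest).map(|t| (name.clone(), t))
                    }
                })
                .collect();
            if kept.is_empty() { None } else { Some(DataType::Struct(kept)) }
        }
        // A primitive cannot be descended into with a non-empty path.
        _ => None,
    }
}

fn main() {
    // events: List<Struct<ts: Int64, payload: Struct<a: Utf8, b: Utf8>>>
    let events = DataType::List(Box::new(DataType::Struct(vec![
        ("ts".to_string(), DataType::Int64),
        (
            "payload".to_string(),
            DataType::Struct(vec![
                ("a".to_string(), DataType::Utf8),
                ("b".to_string(), DataType::Utf8),
            ]),
        ),
    ])));

    // Ask only for the leaf `payload.a` inside each list element.
    let pruned = prune(&events, &[vec!["payload", "a"]]);
    println!("{:?}", pruned);
}
```

In this sketch the pruned type keeps the list and struct nesting but carries only the selected leaf, matching the "list with a struct with a single field" behaviour described above.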
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


