Chao Sun created SPARK-57176:
--------------------------------

             Summary: Extend nested column pruning through array-returning functions
                 Key: SPARK-57176
                 URL: https://issues.apache.org/jira/browse/SPARK-57176
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 4.2.0
            Reporter: Chao Sun


SPARK-57022 added nested column pruning for transform over array<struct> 
inputs, and SPARK-57175 extends the same optimization to exists and forall. 
However, array-returning functions still retain the full element struct even 
when downstream expressions and lambdas require only a subset of its nested 
fields.

For example:

{code:sql}
SELECT filter(friends, friend -> friend.last = 'Smith').first
FROM contacts
{code}

If friends is an array of structs containing first, middle, and last, Spark 
currently reads the complete struct even though only first (extracted from the 
filter result) and last (referenced by the lambda) are needed.
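
To make the example concrete, the table could be declared roughly as follows. 
This is only a sketch: the STRING field types and the Parquet format are 
assumptions, not details given in this issue.

{code:sql}
-- Hypothetical definition of the contacts table used in the example above.
-- Field types and storage format are assumed for illustration.
CREATE TABLE contacts (
  friends ARRAY<STRUCT<first: STRING, middle: STRING, last: STRING>>
) USING parquet;

-- Element schema read today:          struct<first, middle, last>
-- Element schema the query requires:  struct<first, last>
{code}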

Extend nested schema pruning through array-returning functions where narrowing 
is semantics-preserving:

* Merge downstream result-field requirements with lambda requirements for 
filter and comparator-based array_sort.
* Propagate projected element schemas through reverse, shuffle, slice, and 
array_compact.
* Rewrite bound lambda variable types and nested field ordinals after pruning.
* Retain the full element schema when the whole result is used, when a lambda 
consumes the whole element, or when default array_sort natural ordering 
requires the full struct.
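
The cases in the list above can be illustrated with queries against the 
contacts table from the example. These sketch the behavior this issue 
describes and are not verified optimizer output. The comparator and the 
to_json predicate are illustrative choices that do not come from the issue.

{code:sql}
-- filter: the lambda needs last and the downstream access needs first,
-- so middle would be the only field that could be pruned.
SELECT filter(friends, f -> f.last = 'Smith').first FROM contacts;

-- Comparator-based array_sort: the comparator needs last, downstream needs first.
SELECT array_sort(friends, (l, r) ->
         CASE WHEN l.last < r.last THEN -1
              WHEN l.last > r.last THEN 1
              ELSE 0 END).first
FROM contacts;

-- Pass-through functions: only the downstream field (first) is required.
SELECT reverse(friends).first FROM contacts;
SELECT shuffle(friends).first FROM contacts;
SELECT slice(friends, 1, 2).first FROM contacts;
SELECT array_compact(friends).first FROM contacts;

-- Cases where the full element schema is retained:
-- 1. The whole result is used.
SELECT filter(friends, f -> f.last = 'Smith') FROM contacts;
-- 2. The lambda consumes the whole element.
SELECT filter(friends, f -> to_json(f) LIKE '%Smith%').first FROM contacts;
-- 3. Default array_sort natural ordering compares every field of the struct.
SELECT array_sort(friends).first FROM contacts;
{code}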

Functions that inspect full element equality or natural ordering remain out of 
scope because dropping nested fields could change results.
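
As one illustration of why such functions are excluded, consider 
array_distinct, which this issue does not name and which is used here only as 
an example of a function that compares whole elements. The literal values are 
made up for the sketch.

{code:sql}
-- The two structs differ only in middle. Under full-element equality they are
-- distinct, so both survive array_distinct. If middle were pruned before the
-- comparison, they would compare equal and one would be dropped, which would
-- change the number of values returned for first.
SELECT array_distinct(array(
         named_struct('first', 'Ann', 'middle', 'B', 'last', 'Lee'),
         named_struct('first', 'Ann', 'middle', 'C', 'last', 'Lee')
       )).first;
{code}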


