Chao Sun created SPARK-57176:
--------------------------------
Summary: Extend nested column pruning through array-returning
functions
Key: SPARK-57176
URL: https://issues.apache.org/jira/browse/SPARK-57176
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.2.0
Reporter: Chao Sun
SPARK-57022 added nested column pruning for transform over array<struct>
inputs, and SPARK-57175 extends the same optimization to exists and forall.
Other array-returning functions, such as filter, still retain the full element
struct even when downstream expressions and lambdas require only a subset of
nested fields.
For example:
{code:sql}
SELECT filter(friends, friend -> friend.last = 'Smith').first
FROM contacts
{code}
If friends is an array of structs containing first, middle, and last, Spark
currently reads the complete struct even though only first (for the result)
and last (for the lambda predicate) are needed.
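As a rough illustration of why the narrower read is safe for this query, the
plain-Python sketch below models it on made-up data. The helpers query and
prune are hypothetical names for this sketch, not Spark APIs, and the rows are
invented.

```python
# Plain-Python model of:
#   SELECT filter(friends, friend -> friend.last = 'Smith').first FROM contacts
# The rows below are invented for illustration only.
friends = [
    {"first": "Ann", "middle": "B", "last": "Smith"},
    {"first": "Raj", "middle": "K", "last": "Patel"},
    {"first": "Lee", "middle": "M", "last": "Smith"},
]


def query(rows):
    # filter(...) keeps the matching elements; .first extracts one field of each.
    kept = [friend for friend in rows if friend["last"] == "Smith"]
    return [friend["first"] for friend in kept]


def prune(rows, fields):
    # Model a scan that reads only the requested nested fields.
    return [{name: row[name] for name in fields} for row in rows]


full_result = query(friends)
pruned_result = query(prune(friends, ["first", "last"]))

# The query never touches "middle", so both inputs give the same answer.
assert full_result == pruned_result
```

Dropping middle before evaluation leaves the result unchanged because neither
the lambda nor the downstream field access refers to it.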
Extend nested schema pruning through array-returning functions where narrowing
is semantics-preserving:
* Merge downstream result-field requirements with lambda requirements for
filter and comparator-based array_sort.
* Propagate projected element schemas through reverse, shuffle, slice, and
array_compact.
* Rewrite bound lambda variable types and nested field ordinals after pruning.
* Retain the full element schema when the whole result is used, when a lambda
consumes the whole element, or when default array_sort natural ordering
requires the full struct.
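The rules above can be sketched roughly as follows. This is a simplified
Python model, not the Catalyst implementation: required_fields and
remap_ordinals are hypothetical helpers, field sets stand in for schemas, and
None is an assumed encoding for "the whole element is consumed".

```python
from typing import Dict, List, Optional, Set, Tuple


def required_fields(
    schema: List[str],
    lambda_fields: Optional[Set[str]],
    downstream_fields: Optional[Set[str]],
) -> List[str]:
    """Return the element fields to keep, in original schema order.

    None on either side models a consumer of the whole element: the whole
    result being used, a lambda that takes the full element, or default
    array_sort natural ordering. In those cases nothing is pruned.
    """
    if lambda_fields is None or downstream_fields is None:
        return list(schema)
    needed = lambda_fields | downstream_fields  # merge both requirement sets
    return [name for name in schema if name in needed]


def remap_ordinals(schema: List[str], kept: List[str]) -> Dict[str, Tuple[int, int]]:
    """Map each kept field to its (old ordinal, new ordinal) pair."""
    return {name: (schema.index(name), kept.index(name)) for name in kept}


schema = ["first", "middle", "last"]

# filter(friends, friend -> friend.last = 'Smith').first
kept = required_fields(schema, lambda_fields={"last"}, downstream_fields={"first"})
ordinals = remap_ordinals(schema, kept)

# Whole result used downstream: fall back to the full element schema.
unpruned = required_fields(schema, lambda_fields={"last"}, downstream_fields=None)
```

In this model last moves from ordinal 2 to ordinal 1 once middle is dropped,
which is why field accesses inside the bound lambda need their ordinals
rewritten after pruning.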
Functions that inspect full element equality or natural ordering remain out of
scope because dropping nested fields could change results.
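To see why such functions cannot be narrowed safely, consider a
deduplication over whole elements. The issue does not name specific functions
here; array_distinct is used below only as an assumed example, modeled in
plain Python on invented data.

```python
# Two friends that differ only in the middle field (invented data).
friends = [
    ("Ann", "B", "Smith"),
    ("Ann", "C", "Smith"),
]


def array_distinct(elements):
    # Plain-Python stand-in for deduplication by full element equality.
    seen = []
    for element in elements:
        if element not in seen:
            seen.append(element)
    return seen


full = array_distinct(friends)
# Pruning middle first makes the two elements compare equal.
pruned = array_distinct([(first, last) for first, _middle, last in friends])

# Full structs keep two elements; pruned structs collapse to one.
assert len(full) != len(pruned)
```

The element count changes once middle is dropped, so pruning before an
equality-based function would alter the query result.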
--
This message was sent by Atlassian Jira
(v8.20.10#820010)