[ https://issues.apache.org/jira/browse/SPARK-57176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chao Sun reassigned SPARK-57176:
--------------------------------
Assignee: Chao Sun
> Extend nested column pruning through array-returning functions
> --------------------------------------------------------------
>
> Key: SPARK-57176
> URL: https://issues.apache.org/jira/browse/SPARK-57176
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.2.0
> Reporter: Chao Sun
> Assignee: Chao Sun
> Priority: Major
> Labels: pull-request-available
>
> SPARK-57022 added nested column pruning for transform over array<struct>
> inputs, and SPARK-57175 extends the same optimization to exists and forall.
> Other array-returning functions, such as filter, still retain the full element
> struct even when downstream expressions and lambdas require only a subset of
> nested fields.
> For example:
> {code:sql}
> SELECT filter(friends, friend -> friend.last = 'Smith').first
> FROM contacts
> {code}
> If friends is an array of structs containing first, middle, and last, Spark
> currently reads the complete struct even though the query needs only first
> (the projected result field) and last (the lambda predicate); middle is never
> referenced.
> Extend nested schema pruning through array-returning functions where
> narrowing is semantics-preserving:
> * Merge downstream result-field requirements with lambda requirements for
> filter and comparator-based array_sort.
> * Propagate projected element schemas through reverse, shuffle, slice, and
> array_compact.
> * Rewrite bound lambda variable types and nested field ordinals after pruning.
> * Retain the full element schema when the whole result is used, when a lambda
> consumes the whole element, or when default array_sort natural ordering
> requires the full struct.
> Functions that inspect full element equality or natural ordering remain out
> of scope because dropping nested fields could change results.
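
To illustrate the rules in the description above, here is a minimal Python sketch of requirement merging, schema pruning, and ordinal rewriting. It is a toy model under stated assumptions, not Spark's optimizer code: the helper names `merge_requirements`, `prune_schema`, and `remap_ordinals` are hypothetical, the element struct is modeled as an ordered list of field names, and `None` stands for "the whole element is required".

```python
from typing import Dict, List, Optional, Set

# Assumption: None is a sentinel meaning "the whole element struct is needed".
Requirement = Optional[Set[str]]


def merge_requirements(downstream: Requirement, lambda_fields: Requirement) -> Requirement:
    """Union the fields needed downstream with those the lambda reads.

    If either side consumes the whole element, the full schema is retained.
    """
    if downstream is None or lambda_fields is None:
        return None
    return set(downstream) | set(lambda_fields)


def prune_schema(schema: List[str], required: Requirement) -> List[str]:
    """Keep only the required fields, preserving their original order."""
    if required is None:
        return list(schema)
    return [name for name in schema if name in required]


def remap_ordinals(schema: List[str], pruned: List[str]) -> Dict[int, int]:
    """Map each surviving field's old ordinal to its ordinal after pruning."""
    new_position = {name: i for i, name in enumerate(pruned)}
    return {
        old: new_position[name]
        for old, name in enumerate(schema)
        if name in new_position
    }


# The filter example from the description: the result projects .first and
# the lambda predicate reads friend.last.
element_schema = ["first", "middle", "last"]
required = merge_requirements({"first"}, {"last"})
pruned = prune_schema(element_schema, required)
ordinals = remap_ordinals(element_schema, pruned)

print(pruned)    # ['first', 'last']
print(ordinals)  # {0: 0, 2: 1}
```

In this model, `last` moves from ordinal 2 to ordinal 1 once `middle` is dropped, which is why field accesses inside the lambda have to be rewritten after pruning. Passing `None` for either requirement (the whole result is used, or the lambda consumes the whole element) leaves the schema untouched.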