[
https://issues.apache.org/jira/browse/SPARK-55014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-55014:
-----------------------------------
Labels: pull-request-available (was: )
> When a filter has multiple conditions with expensive references we should try
> and split it
> ------------------------------------------------------------------------------------------
>
> Key: SPARK-55014
> URL: https://issues.apache.org/jira/browse/SPARK-55014
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.2.0
> Reporter: Holden Karau
> Priority: Major
> Labels: pull-request-available
>
> This is a follow-on to SPARK-47672. For example, if a filter references two
> different columns added in a projection using regexes (or arbitrary function
> calls, etc.), we should split the projection into two so that the second
> regex need only be evaluated on the smaller data set.
>
> The logic for doing this gets fairly complex, and it can increase the size
> of the query plan, but it only increases the plan size where it would likely
> reduce the amount of data evaluated. A working implementation was proposed as
> part of SPARK-47672, but it was judged too complex to include in a
> regression fix.
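
To make the expected saving concrete, here is a minimal sketch in plain Python. It is not Spark code and not the implementation proposed in SPARK-47672. It only counts regex evaluations for an unsplit projection-plus-filter versus a split one. The sample rows, the two patterns, and the names `CountingRegex`, `unsplit`, and `split` are all hypothetical, chosen for illustration.

```python
import re

# Hypothetical sample rows; not taken from the issue.
ROWS = ["alpha-1", "beta-22", "alpha-333", "gamma-4", "alpha-x"]


class CountingRegex:
    """Wraps a compiled pattern and counts how often it is evaluated."""

    def __init__(self, pattern):
        self.pattern = re.compile(pattern)
        self.calls = 0

    def matches(self, value):
        self.calls += 1
        return self.pattern.search(value) is not None


def unsplit(rows, first, second):
    # One projection computes both derived columns for every row,
    # then a single filter checks both of them.
    projected = [(r, first.matches(r), second.matches(r)) for r in rows]
    return [r for r, a, b in projected if a and b]


def split(rows, first, second):
    # First projection + filter: only the first derived column.
    survivors = [r for r in rows if first.matches(r)]
    # Second projection + filter: the second regex runs on survivors only.
    return [r for r in survivors if second.matches(r)]


# Unsplit form: both regexes run once per input row.
u_first, u_second = CountingRegex(r"^alpha"), CountingRegex(r"\d+$")
unsplit_result = unsplit(ROWS, u_first, u_second)

# Split form: the second regex runs once per row that passed the first.
s_first, s_second = CountingRegex(r"^alpha"), CountingRegex(r"\d+$")
split_result = split(ROWS, s_first, s_second)

print(unsplit_result == split_result)   # same rows either way
print(u_second.calls, s_second.calls)   # second-regex evaluation counts
```

Both forms return the same rows, but in the split form the second regex is evaluated only on rows that survive the first condition. The cost is an extra projection and filter stage, which corresponds to the larger query plan mentioned in the description.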