[ 
https://issues.apache.org/jira/browse/SPARK-55014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-55014:
-----------------------------------
    Labels: pull-request-available  (was: )

> When a filter has multiple conditions with expensive references we should try 
> and split it
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-55014
>                 URL: https://issues.apache.org/jira/browse/SPARK-55014
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: Holden Karau
>            Priority: Major
>              Labels: pull-request-available
>
> This is a follow-up to SPARK-47672. For example, if a filter references two
> different columns that a projection computes with regexes (or arbitrary
> function calls, etc.), we should split the projection into two so that the
> second regex only needs to be evaluated on the smaller data set.
>  
> The logic for doing this gets fairly complex, and it can increase the size
> of the query plan, but it only increases the plan size where doing so would
> likely reduce the amount of data evaluated. A working implementation was
> proposed as part of SPARK-47672, but it was judged too complex to include in
> a regression fix.
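
To illustrate the idea in the description, here is a minimal sketch in plain Python. It is not Spark code and not the implementation proposed in SPARK-47672. The rows, the two regexes, and the helper names (counting, single_projection, split_projection) are all hypothetical. It only counts how often each "expensive" expression is evaluated when both columns are computed up front versus when the second is computed after the first condition has already filtered the data.

```python
import re

# Hypothetical data and patterns, chosen only for illustration.
rows = ["alpha-1", "beta-2", "alpha-x", "gamma-3", "alpha-7"]
starts_with_alpha = re.compile(r"^alpha")
ends_with_digit = re.compile(r"\d$")


def counting(pattern, counter, name):
    """Wrap a regex so every evaluation is tallied in counter[name]."""
    def matches(value):
        counter[name] = counter.get(name, 0) + 1
        return pattern.search(value) is not None
    return matches


def single_projection(data):
    """Compute both expensive columns for every row, then filter once."""
    calls = {}
    a = counting(starts_with_alpha, calls, "a")
    b = counting(ends_with_digit, calls, "b")
    projected = [(r, a(r), b(r)) for r in data]
    kept = [r for r, col_a, col_b in projected if col_a and col_b]
    return kept, calls


def split_projection(data):
    """Compute the first column and filter, then compute the second
    column only on the rows that survived the first condition."""
    calls = {}
    a = counting(starts_with_alpha, calls, "a")
    b = counting(ends_with_digit, calls, "b")
    survivors = [r for r in data if a(r)]
    kept = [r for r in survivors if b(r)]
    return kept, calls


unsplit_rows, unsplit_calls = single_projection(rows)
split_rows, split_calls = split_projection(rows)
```

Both variants keep the same rows, but in the split variant the second regex runs only on rows that passed the first condition. How much this saves in practice depends on how selective the first condition is, which this toy example does not attempt to model.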



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
