Holden Karau created SPARK-55014:
------------------------------------

             Summary: When a filter has multiple conditions with expensive 
references we should try and split it
                 Key: SPARK-55014
                 URL: https://issues.apache.org/jira/browse/SPARK-55014
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 4.2.0
            Reporter: Holden Karau


This is a follow-up to SPARK-47672. For example, if a filter references two 
different columns that were added in a projection using regexes (or arbitrary 
function calls, etc.), we should split the projection into two so that the 
second regex only needs to be evaluated on the smaller, already-filtered data 
set.
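
To make the intended saving concrete, here is a rough sketch in plain Python 
(not Spark, and not the implementation proposed in SPARK-47672). The rows, the 
two patterns, and the helper names unsplit/split/compare are all hypothetical 
and exist only for illustration. It counts how often each "expensive" regex 
runs when both derived columns are computed up front versus when the second 
one is computed only for rows that survive the first condition.

```python
import re

# Hypothetical input rows and patterns, chosen only for illustration.
ROWS = ["alpha-1", "beta-2", "alpha-x", "gamma-3", "alpha-7", "delta-9"]
STARTS_WITH_ALPHA = re.compile(r"^alpha")
ENDS_WITH_DIGIT = re.compile(r"\d$")


class CountingMatcher:
    """Wraps a compiled regex and counts how many times it is evaluated."""

    def __init__(self, pattern):
        self.pattern = pattern
        self.calls = 0

    def __call__(self, value):
        self.calls += 1
        return self.pattern.search(value) is not None


def unsplit(rows, first, second):
    # One projection computes both derived columns for every row,
    # then a single filter checks both of them.
    projected = [(row, first(row), second(row)) for row in rows]
    return [row for row, a, b in projected if a and b]


def split(rows, first, second):
    # First projection + filter: only the first derived column is computed.
    survivors = [row for row in rows if first(row)]
    # Second projection + filter: runs only on rows that passed the first.
    return [row for row in survivors if second(row)]


def compare(rows=ROWS):
    a1, b1 = CountingMatcher(STARTS_WITH_ALPHA), CountingMatcher(ENDS_WITH_DIGIT)
    out_unsplit = unsplit(rows, a1, b1)
    a2, b2 = CountingMatcher(STARTS_WITH_ALPHA), CountingMatcher(ENDS_WITH_DIGIT)
    out_split = split(rows, a2, b2)
    # Returns both results plus (first, second) evaluation counts per strategy.
    return out_unsplit, out_split, (a1.calls, b1.calls), (a2.calls, b2.calls)


if __name__ == "__main__":
    print(compare())
```

Both strategies return the same rows; only the number of evaluations of the 
second regex differs, and the gap grows with the selectivity of the first 
condition.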

The logic for doing this gets fairly complex, and it can increase the size of 
the query plan, but it only increases the plan size in cases where it would 
likely reduce the amount of data evaluated. A working implementation was 
proposed as part of SPARK-47672, but it was decided that it was too complex to 
include in a regression fix.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
