Holden Karau created SPARK-55014:
------------------------------------
Summary: When a filter has multiple conditions with expensive
references we should try and split it
Key: SPARK-55014
URL: https://issues.apache.org/jira/browse/SPARK-55014
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.2.0
Reporter: Holden Karau
This is a follow-up to SPARK-47672. For example, if a filter references two
different columns that a projection computes with regexes (or arbitrary
function calls, etc.), we should split the projection into two so that the
second regex is evaluated only on the rows that pass the first condition.
The logic for doing this gets fairly complex, and it can increase the size of
the query plan, but it only increases the plan size where doing so would likely
reduce the amount of data evaluated. A working implementation was proposed as
part of SPARK-47672, but it was judged too complex to include in a regression fix.
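To illustrate the intended effect, here is a minimal pure-Python sketch. It is not Spark code and not the implementation proposed in SPARK-47672. The sample rows, the two regexes, and the names first_expensive, second_expensive, unsplit, split, and run are all hypothetical, chosen only to show how many times each expensive expression is evaluated before and after the split.

```python
import re

# Hypothetical sample data; not taken from the issue.
rows = ["alpha-1", "beta-2", "alpha-x", "gamma-3", "alpha-7"]

# Counters standing in for the cost of each expensive expression.
calls = {"first": 0, "second": 0}


def first_expensive(s):
    """Stand-in for the first projected column (a regex match)."""
    calls["first"] += 1
    return re.match(r"alpha", s) is not None


def second_expensive(s):
    """Stand-in for the second projected column (another regex)."""
    calls["second"] += 1
    return re.search(r"\d$", s) is not None


def unsplit(data):
    # One projection computes both columns for every row,
    # then a single filter checks both conditions.
    projected = [(r, first_expensive(r), second_expensive(r)) for r in data]
    return [r for r, a, b in projected if a and b]


def split(data):
    # First projection and filter: only the first column is computed.
    survivors = [r for r, a in [(r, first_expensive(r)) for r in data] if a]
    # Second projection and filter run only on the surviving rows.
    return [r for r, b in [(r, second_expensive(r)) for r in survivors] if b]


def run(plan, data):
    """Reset the counters, run one plan shape, return (result, call counts)."""
    calls["first"] = calls["second"] = 0
    result = plan(data)
    return result, dict(calls)


if __name__ == "__main__":
    print(run(unsplit, rows))
    print(run(split, rows))
```

Both shapes return the same rows, but in the split shape the second regex runs only on rows that passed the first condition. The cost is an extra projection and filter step, which mirrors the plan-size trade-off described above.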