[
https://issues.apache.org/jira/browse/SPARK-53996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-53996:
-----------------------------------
Labels: pull-request-available (was: )
> InferFiltersFromConstraints rule does not infer filter conditions for complex
> join expressions
> ----------------------------------------------------------------------------------------------
>
> Key: SPARK-53996
> URL: https://issues.apache.org/jira/browse/SPARK-53996
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.1
> Reporter: Jaromir Vanek
> Priority: Minor
> Labels: pull-request-available
>
> The Spark optimizer's {{InferFiltersFromConstraints}} rule is currently
> unable to infer filter conditions when join constraints involve complex
> expressions. While it works for simple attribute equalities ({{{}a = b{}}}),
> it can't infer constraint from anything more complex than that.
> *Example (works as expected):*
> {code:sql}
> SELECT *
> FROM t1
> JOIN t2 ON t1.a = t2.b
> WHERE t2.b = 1
> {code}
> In this case, the optimizer correctly infers the additional constraint
> {{{}t1.a = 1{}}}.
> *Example (does not work):*
> {code:sql}
> SELECT *
> FROM t1
> JOIN right ON t1.a = t2.b + 2
> WHERE t2.b = 1
> {code}
> In this case, it is clear that {{t1.a = 3}} (since {{b = 1}} and {{{}a = b +
> 2{}}}), but the optimizer does not infer this constraint.
> *How to Reproduce:*
> {code:scala}
> spark.sql("CREATE TABLE t1(a INT)")
> spark.sql("CREATE TABLE t2(b INT)")
> spark.sql("""
> SELECT *
> FROM t1
> INNER JOIN t2 ON t2.b = t1.a + 2
> WHERE t1.a = 1
> """).explain
> {code}
> {code:java}
> == Physical Plan ==
> AdaptiveSparkPlan
> +- BroadcastHashJoin [(a#2 + 2)], [b#3], Inner, BuildRight, false
> :- Filter (isnotnull(a#2) AND (a#2 = 1))
> : +- FileScan spark_catalog.default.t1[a#2]
> +- Filter isnotnull(b#3)
> +- FileScan spark_catalog.default.t2[b#3]
> {code}
> *Expected Behavior:*
> The optimizer should be able to statically evaluate and infer that {{t2.b =
> 3}} given the join condition and the filter on {{{}t1.a{}}}.
> *Impact:*
> This limits the optimizer's ability to push down filters and optimize query
> execution plans for queries with complex join conditions.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]