[ 
https://issues.apache.org/jira/browse/SPARK-53996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaromir Vanek updated SPARK-53996:
----------------------------------
    Component/s: SQL
                     (was: Optimizer)

> InferFiltersFromConstraints rule does not infer filter conditions for complex 
> join expressions
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-53996
>                 URL: https://issues.apache.org/jira/browse/SPARK-53996
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.1
>            Reporter: Jaromir Vanek
>            Priority: Minor
>
> The Spark optimizer's {{InferFiltersFromConstraints}} rule is currently 
> unable to infer filter conditions when join constraints involve complex 
> expressions. While it works for simple attribute equalities ({{{}a = b{}}}), 
> it can't infer constraint from anything more complex than that.
> *Example (works as expected):*
> {code:sql}
> SELECT *
> FROM t1
> JOIN t2 ON t1.a = t2.b
> WHERE t2.b = 1
> {code}
> In this case, the optimizer correctly infers the additional constraint 
> {{{}t1.a = 1{}}}.
> *Example (does not work):*
> {code:sql}
> SELECT *
> FROM t1
> JOIN right ON t1.a = t2.b + 2
> WHERE t2.b = 1
> {code}
> In this case, it is clear that {{t1.a = 3}} (since {{b = 1}} and {{{}a = b + 
> 2{}}}), but the optimizer does not infer this constraint.
> *How to Reproduce:*
> {code:scala}
> spark.sql("CREATE TABLE t1(a INT)")
> spark.sql("CREATE TABLE t2(b INT)")
> spark.sql("""
> SELECT * 
> FROM t1 
> INNER JOIN t2 ON t2.b = t1.a + 2 
> WHERE t1.a = 1
> """).explain
> {code}
> {code:java}
> == Physical Plan ==
> AdaptiveSparkPlan
> +- BroadcastHashJoin [(a#2 + 2)], [b#3], Inner, BuildRight, false
>    :- Filter (isnotnull(a#2) AND (a#2 = 1))
>    :  +- FileScan spark_catalog.default.t1[a#2]
>       +- Filter isnotnull(b#3)
>          +- FileScan spark_catalog.default.t2[b#3]
> {code}
> *Expected Behavior:*
> The optimizer should be able to statically evaluate and infer that {{t2.b = 
> 3}} given the join condition and the filter on {{{}t1.a{}}}.
> *Impact:*
> This limits the optimizer's ability to push down filters and optimize query 
> execution plans for queries with complex join conditions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to