[ 
https://issues.apache.org/jira/browse/SPARK-44812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44812:
-----------------------------------
    Labels: pull-request-available  (was: )

> Push filters through intersect
> ------------------------------
>
>                 Key: SPARK-44812
>                 URL: https://issues.apache.org/jira/browse/SPARK-44812
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.1
>            Reporter: copperybean
>            Priority: Major
>              Labels: pull-request-available
>
> For following SQL
> {code:sql}
> select a from (select a from tl intersect select x from tr) where a > 123 
> {code}
> The physical plan is
> {code:bash}
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- HashAggregate(keys=[a#8L], functions=[])
>    +- Exchange hashpartitioning(a#8L, 200), ENSURE_REQUIREMENTS, [plan_id=133]
>       +- HashAggregate(keys=[a#8L], functions=[])
>          +- BroadcastHashJoin [coalesce(a#8L, 0), isnull(a#8L)], 
> [coalesce(x#24L, 0), isnull(x#24L)], LeftSemi, BuildRight, false
>             :- Filter (isnotnull(a#8L) AND (a#8L > 123))
>             :  +- FileScan json [a#8L] ...
>             +- BroadcastExchange 
> HashedRelationBroadcastMode(List(coalesce(input[0, bigint, true], 0), 
> isnull(input[0, bigint, true])),false), [plan_id=129]
>                +- FileScan json [x#24L] ... {code}
> We can find the filter {color:#ff8b00}_a > 123_{color} is not pushed to right 
> table.
>  
>  
> Further more, for following SQL
> {code:sql}
> select a from (select a from tl intersect select x from tr) join trr on a = 
> y{code}
> The physical plan is
> {code:bash}
> *(3) Project [a#8L]
> +- *(3) BroadcastHashJoin [a#8L], [y#114L], Inner, BuildRight, false
>    :- *(3) HashAggregate(keys=[a#8L], functions=[])
>    :  +- Exchange hashpartitioning(a#8L, 200), ENSURE_REQUIREMENTS, 
> [plan_id=506]
>    :     +- *(1) HashAggregate(keys=[a#8L], functions=[])
>    :        +- *(1) BroadcastHashJoin [coalesce(a#8L, 0), isnull(a#8L)], 
> [coalesce(x#24L, 0), isnull(x#24L)], LeftSemi, BuildRight, false
>    :           :- *(1) Filter isnotnull(a#8L)
>    :           :  +- FileScan json [a#8L] ...
>    :           +- BroadcastExchange 
> HashedRelationBroadcastMode(List(coalesce(input[0, bigint, true], 0), 
> isnull(input[0, bigint, true])),false), [plan_id=490]
>    :              +- FileScan json [x#24L] ...
>    +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, 
> false]),false), [plan_id=512]
>       +- *(2) Filter isnotnull(y#114L)
>          +- FileScan json [y#114L] ...{code}
> There should be a filter _{color:#ff8b00}isnotnull( x ){color}_ for table tr, 
> while it's not pushed down.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to