[ https://issues.apache.org/jira/browse/SPARK-43339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733780#comment-17733780 ]
Yuming Wang commented on SPARK-43339:
-------------------------------------

This is not a bug. The final result is correct.
https://github.com/apache/spark/blob/d88633ada5eb73e8876acaa2c2a53b9596f2acdd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L187-L194

> LEFT JOIN is treated as INNER JOIN when being in a middle of double join
> ------------------------------------------------------------------------
>
>                 Key: SPARK-43339
>                 URL: https://issues.apache.org/jira/browse/SPARK-43339
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 3.4.0
>            Reporter: Leonid Chistov
>            Priority: Major
>
> Consider a query like:
>
> {code:java}
> SELECT ss_item_sk
> FROM store_sales
> LEFT OUTER JOIN store_returns
> ON ( sr_item_sk = ss_item_sk ),
> reason
> WHERE sr_reason_sk = r_reason_sk
> AND r_reason_desc = 'reason 38'{code}
>
> Spark generates the following plan:
>
> {code:java}
> AdaptiveSparkPlan isFinalPlan=false
> +- Project [ss_item_sk#2]
>    +- BroadcastHashJoin [sr_reason_sk#458], [r_reason_sk#734], Inner, BuildRight, false
>       :- Project [ss_item_sk#2, sr_reason_sk#458]
>       :  +- BroadcastHashJoin [ss_item_sk#2], [sr_item_sk#452], Inner, BuildRight, false
>       :     :- FileScan parquet [ss_item_sk#2] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/leonid/tpcds-spark-data-no-padding/store_sales], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<ss_item_sk:int>
>       :     +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), [id=#7227]
>       :        +- Filter (isnotnull(sr_item_sk#452) AND isnotnull(sr_reason_sk#458))
>       :           +- FileScan parquet [sr_item_sk#452,sr_reason_sk#458] Batched: true, DataFilters: [isnotnull(sr_item_sk#452), isnotnull(sr_reason_sk#458)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/leonid/tpcds-spark-data-no-padding/store_returns], PartitionFilters: [], PushedFilters: [IsNotNull(sr_item_sk), IsNotNull(sr_reason_sk)], ReadSchema: struct<sr_item_sk:int,sr_reason_sk:int>
>       +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#7231]
>          +- Project [r_reason_sk#734]
>             +- Filter ((isnotnull(r_reason_desc#736) AND (r_reason_desc#736 = reason 38)) AND isnotnull(r_reason_sk#734))
>                +- FileScan parquet [r_reason_sk#734,r_reason_desc#736] Batched: true, DataFilters: [isnotnull(r_reason_desc#736), (r_reason_desc#736 = reason 38), isnotnull(r_reason_sk#734)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/leonid/tpcds-spark-data-no-padding/reason], PartitionFilters: [], PushedFilters: [IsNotNull(r_reason_desc), EqualTo(r_reason_desc,reason 38), IsNotNull(r_reason_sk)], ReadSchema: struct<r_reason_sk:int,r_reason_desc:string>
> {code}
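
The claim that the final result is correct can be checked on a small scale. In the reported query, the predicate sr_reason_sk = r_reason_sk in the WHERE clause is never true for a row whose store_returns columns were NULL-padded by the LEFT OUTER JOIN, so those rows are filtered out either way, and running the join as Inner should return the same rows. The linked optimizer code appears to implement this kind of outer-join elimination, though that reading is an assumption here. The sketch below is not Spark: it uses SQLite from the Python standard library, the rows are made up for illustration, and the function name compare_join_types is hypothetical. Only the table names, column names, and query text come from the report.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the report.
SETUP = """
CREATE TABLE store_sales (ss_item_sk INTEGER);
CREATE TABLE store_returns (sr_item_sk INTEGER, sr_reason_sk INTEGER);
CREATE TABLE reason (r_reason_sk INTEGER, r_reason_desc TEXT);
INSERT INTO store_sales VALUES (1), (2), (3);
INSERT INTO store_returns VALUES (1, 10), (2, NULL), (2, 20);
INSERT INTO reason VALUES (10, 'reason 38'), (20, 'reason 1');
"""

# The query from the report, unchanged.
LEFT_OUTER_QUERY = """
SELECT ss_item_sk
FROM store_sales
LEFT OUTER JOIN store_returns
ON ( sr_item_sk = ss_item_sk ),
reason
WHERE sr_reason_sk = r_reason_sk
AND r_reason_desc = 'reason 38'
"""

# The same query with the join type the optimized plan shows.
INNER_QUERY = LEFT_OUTER_QUERY.replace("LEFT OUTER JOIN", "INNER JOIN")


def compare_join_types():
    """Run both query variants on the sample rows and return their sorted results."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        outer_rows = sorted(conn.execute(LEFT_OUTER_QUERY).fetchall())
        inner_rows = sorted(conn.execute(INNER_QUERY).fetchall())
    finally:
        conn.close()
    return outer_rows, inner_rows


if __name__ == "__main__":
    outer_rows, inner_rows = compare_join_types()
    print("left outer:", outer_rows)
    print("inner:     ", inner_rows)
    print("same result:", outer_rows == inner_rows)
```

Item 3 has no return and item 2 has one return with a NULL reason, so both cases that distinguish an outer join from an inner join are present in the sample data. Neither survives the WHERE clause, which is why the two variants agree. Without the sr_reason_sk = r_reason_sk predicate, the two queries would differ and the rewrite would not be valid.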