[
https://issues.apache.org/jira/browse/SPARK-56977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-56977:
-----------------------------------
Labels: pull-request-available (was: )
> Respect User Specified JoinType for NEAREST BY Join
> ---------------------------------------------------
>
> Key: SPARK-56977
> URL: https://issues.apache.org/jira/browse/SPARK-56977
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.2.0
> Reporter: Zhidong Qu
> Priority: Major
> Labels: pull-request-available
>
> The original implementation of NEAREST BY JOIN hardcoded the synthetic join
> to `LeftOuter` and justified it on the grounds that `LEFT OUTER` and `INNER`
> are equivalent for an unconditioned join when the right side is non-empty,
> and `Generate(outer = false)` would drop unwanted rows for INNER when right
> is empty. While correct in result, this:
> * `INNER NEAREST BY` cannot be planned as a Cartesian product. Spark's
> strategy picks `CartesianProductExec` only for `Inner` joins with no
> condition; an unconditioned `LeftOuter` join falls back to
> `BroadcastNestedLoopJoin`, which tries to broadcast the right side. When the
> right relation is large, the broadcast either OOMs or exceeds
> `spark.sql.autoBroadcastJoinThreshold` and the planner is left with no good
> option. `CartesianProductExec` partitions both sides and streams pairs, so it
> scales naturally with right-side size. Respecting the user's `INNER` join
> type re-enables this strategy for the common `INNER NEAREST BY` case.
> * Makes the EXPLAIN output misleading for `INNER NEAREST BY` (shows
> `LeftOuter` in the underlying join even though the user wrote `INNER`).
> * Produces a row per left input when `right` is empty under `INNER`, only to
> drop those rows later via `Generate(outer = false)` and the `size(matches) >
> 0` filter -- extra work that respecting `joinType` avoids at the source.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]