[
https://issues.apache.org/jira/browse/SPARK-46946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wenchen Fan resolved SPARK-46946.
---------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 44988
[https://github.com/apache/spark/pull/44988]
> Supporting broadcast of multiple filtering keys in DynamicPruning
> -----------------------------------------------------------------
>
> Key: SPARK-46946
> URL: https://issues.apache.org/jira/browse/SPARK-46946
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0, 3.5.1
> Reporter: Thang Long Vu
> Assignee: Thang Long Vu
> Priority: Major
> Labels: pull-request-available, releasenotes
> Fix For: 4.0.0
>
>
> This PR extends `DynamicPruningSubquery` to support broadcasting multiple
> filtering keys (instead of one, as before). Most of the PR simply
> generalises the singular to the plural.
> Note: This PR does not actually use multiple filtering keys in
> `DynamicPruningSubquery`; the change makes it easier to support DPP for
> Null Safe Equality or multiple Equality predicates in the future.
> In a Null Safe Equality JOIN, the JOIN condition `a <=> b` is transformed to
> `Coalesce(key1, Literal(key1.dataType)) = Coalesce(key2,
> Literal(key2.dataType)) AND IsNull(key1) = IsNull(key2)`. To get the
> highest pruning efficiency, we broadcast the 2 keys `Coalesce(key,
> Literal(key.dataType))` and `IsNull(key)` and use both of them to prune the
> other side at the same time.
> Before, the `DynamicPruningSubquery` had only one broadcasting key and we
> only supported DPP for one `EqualTo` JOIN predicate; now we are extending the
> subquery to multiple broadcasting keys. Please note that DPP is still not
> supported for multiple JOIN predicates.
> Put another way, at the moment we don't insert a DPP Filter for
> multiple JOIN predicates at the same time; we only potentially insert a DPP
> Filter for a given Equality JOIN predicate.
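
To illustrate why the description broadcasts two keys for a Null Safe Equality JOIN, here is a minimal plain-Python sketch (not Spark code, and not the implementation in the pull request). It assumes an integer key whose default literal is `0`, and the helper names `null_safe_eq`, `pruning_keys`, `rewritten_eq` and `prune` are hypothetical. It models the rewrite of `a <=> b` as a comparison of the pair `(Coalesce(key, default), IsNull(key))` and then prunes a probe side against the broadcast set of such pairs.

```python
from typing import List, Optional, Set, Tuple

# Assumption: stand-in for Literal(key.dataType), i.e. a default value of the key's type.
DEFAULT = 0


def null_safe_eq(a: Optional[int], b: Optional[int]) -> bool:
    """Reference semantics of `a <=> b`: NULL matches only NULL."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def pruning_keys(key: Optional[int]) -> Tuple[int, bool]:
    """Model the two filtering keys: (Coalesce(key, DEFAULT), IsNull(key))."""
    return (DEFAULT if key is None else key, key is None)


def rewritten_eq(a: Optional[int], b: Optional[int]) -> bool:
    """Coalesce(a) = Coalesce(b) AND IsNull(a) = IsNull(b), as a tuple comparison."""
    return pruning_keys(a) == pruning_keys(b)


def prune(probe: List[Optional[int]], build: List[Optional[int]]) -> List[Optional[int]]:
    """Keep only probe rows whose key pair appears in the broadcast set."""
    broadcast: Set[Tuple[int, bool]] = {pruning_keys(k) for k in build}
    return [k for k in probe if pruning_keys(k) in broadcast]


if __name__ == "__main__":
    print(prune([None, 0, 1, 2, 3], [None, 2]))
```

In this sketch, filtering on the `Coalesce` key alone would also keep the probe row `0`, because `NULL` on the build side coalesces to the same default; adding the `IsNull` key is what removes it.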