[
https://issues.apache.org/jira/browse/SPARK-53322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun resolved SPARK-53322.
-----------------------------------
Fix Version/s: 4.1.0
Resolution: Fixed
Issue resolved by pull request 53132
[https://github.com/apache/spark/pull/53132]
> KeyGroupedPartitioning shouldn't be pushed down when one side of the plan
> doesn't have a scan
> ---------------------------------------------------------------------------------------------
>
> Key: SPARK-53322
> URL: https://issues.apache.org/jira/browse/SPARK-53322
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.4
> Reporter: Chirag Singh
> Assignee: Chirag Singh
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0
>
>
> When checkpointing an RDD that scans a V2 table, the output partitioning of
> that logical RDD remains a KeyGroupedPartitioning. However, JOIN keys cannot
> be pushed down to a non-scan node (as only scan nodes can reorganize input
> files within each task to ensure the KeyGroupedPartitioning matches for both
> sides of the JOIN). If only one side of the JOIN is checkpointed, this failed
> pushdown may cause a query failure (due to a partition count mismatch).
> However, if both sides of the JOIN are checkpointed, this failed pushdown may
> cause a correctness issue as each side may have the same number of
> partitions, but different partition values.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]