[
https://issues.apache.org/jira/browse/SPARK-53322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun reassigned SPARK-53322:
-------------------------------------
Assignee: Chirag Singh
> KeyGroupedPartitioning shouldn't be pushed down when one side of the plan
> doesn't have a scan
> ---------------------------------------------------------------------------------------------
>
> Key: SPARK-53322
> URL: https://issues.apache.org/jira/browse/SPARK-53322
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.4
> Reporter: Chirag Singh
> Assignee: Chirag Singh
> Priority: Major
> Labels: pull-request-available
>
> When checkpointing an RDD that scans a V2 table, the output partitioning of
> that logical RDD remains a KeyGroupedPartitioning. However, JOIN keys cannot
> be pushed down to a non-scan node (as only scan nodes can reorganize input
> files within each task to ensure the KeyGroupedPartitioning matches for both
> sides of the JOIN). If only one side of the JOIN is checkpointed, this failed
> pushdown may cause a query failure (due to a partition count mismatch).
> However, if both sides of the JOIN are checkpointed, this failed pushdown may
> cause a correctness issue as each side may have the same number of
> partitions, but different partition values.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]