[
https://issues.apache.org/jira/browse/SPARK-53322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chirag Singh updated SPARK-53322:
---------------------------------
Summary: KeyGroupedPartitioning shouldn't be pushed down when one side of
the plan doesn't have a scan (was: KeyGroupedPartitioning shouldn't be pushed
down when one side of SPJ doesn't have a scan)
> KeyGroupedPartitioning shouldn't be pushed down when one side of the plan
> doesn't have a scan
> ---------------------------------------------------------------------------------------------
>
> Key: SPARK-53322
> URL: https://issues.apache.org/jira/browse/SPARK-53322
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.4
> Reporter: Chirag Singh
> Priority: Major
>
> When an RDD that scans a V2 table is checkpointed, the output partitioning of
> the resulting logical RDD remains a KeyGroupedPartitioning. However,
> storage-partitioned join (SPJ) parameters cannot be pushed down to a non-scan
> node, because only scan nodes can reorganize input files within each task so
> that the KeyGroupedPartitioning matches on both sides of the JOIN. If only
> one side of the JOIN is checkpointed, the failed pushdown may cause a query
> failure (due to a partition count mismatch). If both sides of the JOIN are
> checkpointed, it may instead cause a correctness issue: the two sides may
> have the same number of partitions but different partition values.
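
The two failure modes described above can be illustrated with a small,
Spark-free sketch. Everything in it is an assumption made for illustration:
each side of the join is modeled as a list of partitions, each partition as a
list of join keys, and the hypothetical helper pairwise_join stands in for a
shuffle-free join that trusts partition i on the left to line up with
partition i on the right. It is not Spark's implementation or API.

```python
from typing import List, Tuple

# Hypothetical model (not Spark's API): a side of the join is a list of
# partitions, and each partition is a list of join-key values.
Side = List[List[int]]


def pairwise_join(left: Side, right: Side) -> List[Tuple[int, int]]:
    """Join partition i of `left` only with partition i of `right`.

    This mimics a join that skips the shuffle because both sides report a
    key-grouped partitioning and are assumed to be aligned.
    """
    if len(left) != len(right):
        # Failure mode 1: only one side was realigned, so the counts differ.
        raise ValueError(
            f"partition count mismatch: {len(left)} vs {len(right)}"
        )
    out = []
    for lpart, rpart in zip(left, right):
        out.extend((l, r) for l in lpart for r in rpart if l == r)
    return out


def reference_join(left: Side, right: Side) -> List[Tuple[int, int]]:
    """Join every left key against every right key, ignoring partitioning."""
    lkeys = [k for part in left for k in part]
    rkeys = [k for part in right for k in part]
    return [(l, r) for l in lkeys for r in rkeys if l == r]


# One partition per distinct key value on each side.
left_side: Side = [[1], [2], [3]]
right_fewer_partitions: Side = [[1], [3]]    # different partition count
right_same_count: Side = [[1], [3], [4]]     # same count, different values
```

Under these assumptions, pairwise_join(left_side, right_fewer_partitions)
raises an error, which corresponds to the query failure. With
right_same_count it returns only [(1, 1)], while reference_join returns
[(1, 1), (3, 3)]: the match on key 3 is silently dropped, which corresponds
to the correctness issue.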