[ 
https://issues.apache.org/jira/browse/SPARK-53322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-53322.
-----------------------------------
    Fix Version/s: 4.1.0
       Resolution: Fixed

Issue resolved by pull request 53132
[https://github.com/apache/spark/pull/53132]

> KeyGroupedPartitioning shouldn't be pushed down when one side of the plan 
> doesn't have a scan
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-53322
>                 URL: https://issues.apache.org/jira/browse/SPARK-53322
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.4
>            Reporter: Chirag Singh
>            Assignee: Chirag Singh
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>
> When checkpointing an RDD that scans a V2 table, the output partitioning of 
> that logical RDD remains a KeyGroupedPartitioning. However, JOIN keys cannot 
> be pushed down to a non-scan node (as only scan nodes can reorganize input 
> files within each task to ensure the KeyGroupedPartitioning matches for both 
> sides of the JOIN). If only one side of the JOIN is checkpointed, this failed 
> pushdown may cause a query failure (due to a partition count mismatch). 
> However, if both sides of the JOIN are checkpointed, this failed pushdown may 
> cause a correctness issue as each side may have the same number of 
> partitions, but different partition values.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to