[ 
https://issues.apache.org/jira/browse/SPARK-53322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chirag Singh reopened SPARK-53322:
----------------------------------

> KeyGroupedPartitioning shouldn't be pushed down when one side of SPJ doesn't 
> have a scan
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-53322
>                 URL: https://issues.apache.org/jira/browse/SPARK-53322
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.4
>            Reporter: Chirag Singh
>            Priority: Major
>
> When checkpointing an RDD that scans a V2 table, the output partitioning of 
> that logical RDD remains a KeyGroupedPartitioning. However, SPJ parameters 
> cannot be pushed down to a non-scan node (as only scan nodes can reorganize 
> input files within each task to ensure the KeyGroupedPartitioning matches for 
> both sides of the JOIN). If only one side of the JOIN is checkpointed, this 
> failed pushdown may cause a query failure (due to a partition count 
> mismatch). However, if both sides of the JOIN are checkpointed, this failed 
> pushdown may cause a correctness issue as each side may have the same number 
> of partitions, but different partition values.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to