[ 
https://issues.apache.org/jira/browse/SPARK-53322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chirag Singh updated SPARK-53322:
---------------------------------
    Description: When checkpointing an RDD that scans a V2 table, the output 
partitioning of that logical RDD remains a KeyGroupedPartitioning. However, 
JOIN keys cannot be pushed down to a non-scan node (as only scan nodes can 
reorganize input files within each task to ensure the KeyGroupedPartitioning 
matches for both sides of the JOIN). If only one side of the JOIN is 
checkpointed, this failed pushdown may cause a query failure (due to a 
partition count mismatch). However, if both sides of the JOIN are checkpointed, 
this failed pushdown may cause a correctness issue as each side may have the 
same number of partitions, but different partition values.  (was: When 
checkpointing an RDD that scans a V2 table, the output partitioning of that 
logical RDD remains a KeyGroupedPartitioning. However, SPJ parameters cannot be 
pushed down to a non-scan node (as only scan nodes can reorganize input files 
within each task to ensure the KeyGroupedPartitioning matches for both sides of 
the JOIN). If only one side of the JOIN is checkpointed, this failed pushdown 
may cause a query failure (due to a partition count mismatch). However, if both 
sides of the JOIN are checkpointed, this failed pushdown may cause a 
correctness issue as each side may have the same number of partitions, but 
different partition values.)

> KeyGroupedPartitioning shouldn't be pushed down when one side of the plan 
> doesn't have a scan
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-53322
>                 URL: https://issues.apache.org/jira/browse/SPARK-53322
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.4
>            Reporter: Chirag Singh
>            Priority: Major
>
> When checkpointing an RDD that scans a V2 table, the output partitioning of 
> that logical RDD remains a KeyGroupedPartitioning. However, JOIN keys cannot 
> be pushed down to a non-scan node (as only scan nodes can reorganize input 
> files within each task to ensure the KeyGroupedPartitioning matches for both 
> sides of the JOIN). If only one side of the JOIN is checkpointed, this failed 
> pushdown may cause a query failure (due to a partition count mismatch). 
> However, if both sides of the JOIN are checkpointed, this failed pushdown may 
> cause a correctness issue as each side may have the same number of 
> partitions, but different partition values.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to