[jira] [Updated] (SPARK-53809) Add canonicalization for dsv2 scan

Yuchuan Huang (Jira) Mon, 06 Oct 2025 12:46:46 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-53809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Yuchuan Huang updated SPARK-53809:
----------------------------------
    Description: 
Query optimization rules such as MergeScalarSubqueries check whether two plans 
are identical by [comparing their canonicalized 
form|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueries.scala#L219].
For DSv2, the comparison descends the plan hierarchy to 
DataSourceV2ScanRelation, which currently lacks a canonicalization function. 

 

This ticket aims to add a doCanonicalize function to DataSourceV2ScanRelation, 
as well as to the Scan interface. The reason is that two otherwise identical 
scans may carry their predicates in a different order after query optimization 
(QO) rewrites. For reference, [FileScan normalizes its filters before checking 
equality|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScan.scala].
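
To illustrate the predicate-ordering problem, here is a minimal sketch that 
uses a toy model rather than Spark's real classes. ToyScan, EqualTo, 
GreaterThan and the sort-by-string ordering are all hypothetical stand-ins 
chosen for illustration. They are not the proposed implementation, and the 
actual change may normalize predicates differently.

```scala
// Toy model only: none of these types are Spark classes.
sealed trait Predicate
final case class EqualTo(column: String, value: Int) extends Predicate
final case class GreaterThan(column: String, value: Int) extends Predicate

// Hypothetical stand-in for a DSv2 scan carrying pushed-down predicates.
final case class ToyScan(table: String, pushedPredicates: Seq[Predicate]) {
  // Canonical form: the same scan with predicates in a deterministic order.
  // Sorting by toString is an assumption made for this sketch.
  def canonicalized: ToyScan =
    copy(pushedPredicates = pushedPredicates.sortBy(_.toString))

  // Two scans are treated as equivalent when their canonical forms are equal.
  def sameResult(other: ToyScan): Boolean =
    canonicalized == other.canonicalized
}

object CanonicalizationSketch {
  def main(args: Array[String]): Unit = {
    val a = ToyScan("t", Seq(EqualTo("x", 1), GreaterThan("y", 2)))
    val b = ToyScan("t", Seq(GreaterThan("y", 2), EqualTo("x", 1)))

    println(a == b)          // false: Seq equality is order-sensitive
    println(a.sameResult(b)) // true: canonical forms match
  }
}
```

Without a canonical form, a rule that compares the two scans structurally 
would treat them as different plans and skip the merge, even though both 
read the same data.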
 

  was:Query optimization rules such as MergeScalarSubqueries check 


> Add canonicalization for dsv2 scan
> ----------------------------------
>
>                 Key: SPARK-53809
>                 URL: https://issues.apache.org/jira/browse/SPARK-53809
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.1.0
>            Reporter: Yuchuan Huang
>            Priority: Major



