[ 
https://issues.apache.org/jira/browse/SPARK-53809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuchuan Huang updated SPARK-53809:
----------------------------------
    Description: 
Query optimization rules such as MergeScalarSubqueries check whether two plans 
are identical by [comparing their canonicalized 
forms|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueries.scala#L219].
 For a DSv2 physical plan, canonicalization descends through the child 
hierarchy to BatchScanExec, which [has a doCanonicalize 
function|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala#L150].
 For a DSv2 logical plan, canonicalization descends to 
DataSourceV2ScanRelation, which does not have a doCanonicalize 
function. As a result, two logical plans that are semantically identical are 
not recognized as identical.

This PR proposes adding a doCanonicalize function to DataSourceV2ScanRelation. 
The implementation is similar to [the one in 
BatchScanExec|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala#L150],
 because the two are the leaf nodes of the DSv2 logical plan and physical plan, 
respectively.
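
To illustrate why a leaf node needs its own canonicalization, here is a 
minimal toy model in Python. It is not Spark code: the names Attr, 
ScanRelation, fresh and same_result are hypothetical, and the normalization 
shown (replacing attribute names and expression IDs with positional 
placeholders) is only an assumption about what a doCanonicalize for a scan 
leaf would roughly need to do.

```python
from dataclasses import dataclass, replace
from itertools import count

# Hypothetical stand-in for a globally unique expression ID generator.
_next_id = count()


@dataclass(frozen=True)
class Attr:
    """Toy output attribute: a name plus a unique expression ID."""
    name: str
    expr_id: int


def fresh(name: str) -> Attr:
    """Create an attribute with a new, never-reused expression ID."""
    return Attr(name, next(_next_id))


@dataclass(frozen=True)
class ScanRelation:
    """Toy leaf node: a scan of one table producing some output attributes."""
    table: str
    output: tuple

    def canonicalized(self) -> "ScanRelation":
        # Replace cosmetic details (names, unique IDs) with positional
        # placeholders so that equivalent scans compare equal.
        normalized = tuple(Attr("none", i) for i, _ in enumerate(self.output))
        return replace(self, output=normalized)


def same_result(a: ScanRelation, b: ScanRelation) -> bool:
    """Two plans are treated as identical if their canonical forms match."""
    return a.canonicalized() == b.canonicalized()


# Two scans of the same table, each resolved with fresh expression IDs.
scan_a = ScanRelation("t", (fresh("id"), fresh("value")))
scan_b = ScanRelation("t", (fresh("id"), fresh("value")))

# Without normalization the two scans differ because of their IDs;
# after canonicalization they compare equal.
raw_equal = scan_a == scan_b
canonical_equal = same_result(scan_a, scan_b)
```

In this sketch raw_equal is False while canonical_equal is True. The first 
result corresponds to a leaf that performs no normalization of its own, and 
the second to a leaf that does.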

  was:
Query optimization rules such as MergeScalarSubqueries check if two plans are 
identical by comparing their canonicalized form. For DSv2, for physical plan, 
the canonicalization goes down in the child hierarchy to the BatchScanExec, 
which has a doCanonicalize function; for logical plan, the canonicalization 
goes down to the DataSourceV2ScanRelation, which, however, does not have a 
doCanonicalize function. As a result, two logical plans who are semantically 
identical are not identified.

This PR proposes to add doCanonicalize function for DataSourceV2ScanRelation. 
The implementation is similar to the one implemented in BatchScanExec, because 
they are both the leafNodes of DSv2 logicalPlan and physicalPlan, respectively.

 


> Add canonicalization for dsv2 scan
> ----------------------------------
>
>                 Key: SPARK-53809
>                 URL: https://issues.apache.org/jira/browse/SPARK-53809
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.1.0
>            Reporter: Yuchuan Huang
>            Priority: Major
>              Labels: pull-request-available
>
> Query optimization rules such as MergeScalarSubqueries check whether two plans 
> are identical by [comparing their canonicalized 
> forms|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueries.scala#L219].
>  For a DSv2 physical plan, canonicalization descends through the child 
> hierarchy to BatchScanExec, which [has a doCanonicalize 
> function|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala#L150].
>  For a DSv2 logical plan, canonicalization descends to 
> DataSourceV2ScanRelation, which does not have a doCanonicalize 
> function. As a result, two logical plans that are semantically identical are 
> not recognized as identical.
> This PR proposes adding a doCanonicalize function to DataSourceV2ScanRelation. 
> The implementation is similar to [the one in 
> BatchScanExec|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala#L150],
>  because the two are the leaf nodes of the DSv2 logical plan and physical plan, 
> respectively.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
