[ https://issues.apache.org/jira/browse/SPARK-52462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mihailo Aleksic updated SPARK-52462:
------------------------------------
    Description: 
Right now, the following query produces plans that are not consistent across 
different underlying table providers:

SELECT col1, col2, col3, NULLIF('','') AS col4
FROM table
UNION ALL
SELECT col2, col2, null AS col3, col4
FROM table;

This happens because of rule ordering, which differs between table providers:
 - In some cases: ... -> WidenSetOperationTypes -> ... -> ResolveReferences 
   (deduplication of Union children outputs) -> ...
 - In other cases: ... -> ResolveReferences (deduplication of Union children 
   outputs) -> ... -> WidenSetOperationTypes -> ...
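One way to observe the inconsistency is to print the analyzed logical plan for the same query against two tables that share a schema but use different providers, then compare the "== Analyzed Logical Plan ==" sections of the output. EXPLAIN EXTENDED is standard Spark SQL syntax; the table names t_provider_a and t_provider_b are hypothetical stand-ins for "table", and the exact shape of the differing plan nodes is not reproduced here.

```sql
-- Hypothetical tables: t_provider_a and t_provider_b have the same schema
-- (col1 .. col4) but are backed by different table providers.
EXPLAIN EXTENDED
SELECT col1, col2, col3, NULLIF('','') AS col4
FROM t_provider_a
UNION ALL
SELECT col2, col2, null AS col3, col4
FROM t_provider_a;

-- Same query against the second provider; compare the analyzed plans.
EXPLAIN EXTENDED
SELECT col1, col2, col3, NULLIF('','') AS col4
FROM t_provider_b
UNION ALL
SELECT col2, col2, null AS col3, col4
FROM t_provider_b;
```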

In this issue, I propose aligning the two orderings by enforcing that type 
coercion happens before deduplication.

  was:
Right now, the following query produces different plans for Delta and 
non-Delta underlying tables:

SELECT col1, col2, col3, NULLIF('','') AS col4
FROM table
UNION ALL
SELECT col2, col2, null AS col3, col4
FROM table;

This happens because of rule ordering:
 - Using a Delta table: ... -> WidenSetOperationTypes -> ... -> ResolveReferences 
   (deduplication of Union children outputs) -> ...
 - Using a non-Delta table: ... -> ResolveReferences (deduplication of Union 
   children outputs) -> ... -> WidenSetOperationTypes -> ...

In this issue, I propose aligning the two orderings by enforcing that type 
coercion happens before deduplication.


> Enforce type coercion before children output deduplication in Union
> -------------------------------------------------------------------
>
>                 Key: SPARK-52462
>                 URL: https://issues.apache.org/jira/browse/SPARK-52462
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.1.0
>            Reporter: Mihailo Aleksic
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
