[ https://issues.apache.org/jira/browse/SPARK-52462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mihailo Aleksic updated SPARK-52462:
------------------------------------
    Description:
Right now, the following query produces plans that are not consistent across different underlying table providers.

Query:

SELECT col1, col2, col3, NULLIF('','') AS col4
FROM table
UNION ALL
SELECT col2, col2, null AS col3, col4
FROM table;

This happens because of rule ordering:
- Sometimes: ... -> WidenSetOperationTypes -> ... -> ResolveReferences (deduplication of Union children outputs) -> ...
- Sometimes: ... -> ResolveReferences (deduplication of Union children outputs) -> ... -> WidenSetOperationTypes -> ...

In this issue I propose that we align those two by enforcing type coercion to happen before deduplication.

  was:
Right now, the following query produces different plans between Delta and non-Delta underlying tables.

Query:

SELECT col1, col2, col3, NULLIF('','') AS col4
FROM table
UNION ALL
SELECT col2, col2, null AS col3, col4
FROM table;

This happens because of rule ordering:
- Using Delta table: ... -> WidenSetOperationTypes -> ... -> ResolveReferences (deduplication of Union children outputs) -> ...
- Using non-Delta table: ... -> ResolveReferences (deduplication of Union children outputs) -> ... -> WidenSetOperationTypes -> ...

In this issue I propose that we align those two by enforcing type coercion to happen before deduplication.


> Enforce type coercion before children output deduplication in Union
> -------------------------------------------------------------------
>
>                 Key: SPARK-52462
>                 URL: https://issues.apache.org/jira/browse/SPARK-52462
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.1.0
>            Reporter: Mihailo Aleksic
>            Priority: Major
>
> Right now, the following query produces plans that are not consistent
> across different underlying table providers.
> Query:
> SELECT col1, col2, col3, NULLIF('','') AS col4
> FROM table
> UNION ALL
> SELECT col2, col2, null AS col3, col4
> FROM table;
> This happens because of rule ordering:
>  - Sometimes: ... -> WidenSetOperationTypes -> ... -> ResolveReferences
> (deduplication of Union children outputs) -> ...
>  - Sometimes: ... -> ResolveReferences (deduplication of Union children
> outputs) -> ... -> WidenSetOperationTypes -> ...
> In this issue I propose that we align those two by enforcing type coercion to
> happen before deduplication.
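
To illustrate why the two rule orderings described above yield different plans, here is a minimal toy model in Python. It is not Spark code: the Plan class, the widen_types and deduplicate_outputs functions, and the Project(cast) / Project(alias) labels are all hypothetical stand-ins. The model assumes that each rule wraps a Union child in one extra Project node, so whichever rule fires second ends up on top.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Plan:
    """Toy logical plan: a stack of Project labels over a base relation."""
    layers: Tuple[str, ...] = ()

    def wrap(self, label: str) -> "Plan":
        # A newly added Project node sits on top of the existing ones.
        return Plan((label,) + self.layers)

    def render(self) -> str:
        return " -> ".join(self.layers + ("Relation",))


def widen_types(plan: Plan) -> Plan:
    # Hypothetical stand-in for type coercion (e.g. casting `null AS col3`):
    # adds a casting Project at most once.
    if "Project(cast)" in plan.layers:
        return plan
    return plan.wrap("Project(cast)")


def deduplicate_outputs(plan: Plan) -> Plan:
    # Hypothetical stand-in for output deduplication (e.g. `col2, col2`):
    # adds a re-aliasing Project at most once.
    if "Project(alias)" in plan.layers:
        return plan
    return plan.wrap("Project(alias)")


def apply_rules(plan: Plan, rules: Sequence[Callable[[Plan], Plan]]) -> Plan:
    # Apply the rules in the given order, one pass.
    for rule in rules:
        plan = rule(plan)
    return plan


child = Plan()
coercion_first = apply_rules(child, [widen_types, deduplicate_outputs])
dedup_first = apply_rules(child, [deduplicate_outputs, widen_types])

print(coercion_first.render())  # Project(alias) -> Project(cast) -> Relation
print(dedup_first.render())     # Project(cast) -> Project(alias) -> Relation
```

Both results contain the same two nodes, only nested differently, which is the kind of inconsistency the issue describes. Fixing the order (coercion first, as proposed) would make the toy model always produce the first shape.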