[ https://issues.apache.org/jira/browse/SPARK-41557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17648866#comment-17648866 ]

Shardul Mahadik commented on SPARK-41557:
-----------------------------------------

cc: [~Gengliang.Wang] [~cloud_fan]


> Union of tables with and without metadata column fails when used in join
> ------------------------------------------------------------------------
>
>                 Key: SPARK-41557
>                 URL: https://issues.apache.org/jira/browse/SPARK-41557
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.2, 3.4.0
>            Reporter: Shardul Mahadik
>            Priority: Major
>
> Here is a test case that can be added to {{MetadataColumnSuite}} to
> demonstrate the issue:
> {code:scala}
>   test("SPARK-41557: Union of tables with and without metadata column should work") {
>     withTable(tbl) {
>       sql(s"CREATE TABLE $tbl (id bigint, data string) PARTITIONED BY (id)")
>       checkAnswer(
>         spark.sql(
>           s"""
>             SELECT b.*
>             FROM RANGE(1)
>               LEFT JOIN (
>                 SELECT id FROM $tbl
>                 UNION ALL
>                 SELECT id FROM RANGE(10)
>               ) b USING(id)
>           """),
>         Seq(Row(0))
>       )
>     }
>   }
>  {code}
> Here a table with metadata columns ({{$tbl}}) is unioned with a table without
> metadata columns ({{RANGE(10)}}). If the result is then used in a join, query
> analysis fails with an error reporting a mismatch in the number of columns of
> the union, caused by the metadata columns. However, each branch of the union
> explicitly projects only one column ({{id}}), so the union should be valid.
> {code}
> org.apache.spark.sql.AnalysisException: [NUM_COLUMNS_MISMATCH] UNION can only be performed on inputs with the same number of columns, but the first input has 3 columns and the second input has 1 columns.; line 5 pos 16;
> 'Project [id#26L]
> +- 'Project [id#26L, id#26L]
>    +- 'Project [id#28L, id#26L]
>       +- 'Join LeftOuter, (id#28L = id#26L)
>          :- Range (0, 1, step=1, splits=None)
>          +- 'SubqueryAlias b
>             +- 'Union false, false
>                :- Project [id#26L, index#30, _partition#31]
>                :  +- SubqueryAlias testcat.t
>                :     +- RelationV2[id#26L, data#27, index#30, _partition#31] testcat.t testcat.t
>                +- Project [id#29L]
>                   +- Range (0, 10, step=1, splits=None)
> {code}
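To make the 3-vs-1 mismatch concrete, here is a rough toy model in plain Scala of the column-count check that the exception describes. It is NOT Spark/Catalyst code: {{Relation}}, {{Project}}, {{Union}}, {{checkUnion}} and {{widenWithMetadata}} are hypothetical names made up for illustration. The widening step simply copies by hand what the analyzed plan above shows (the first branch's {{Project}} ending up with {{index}} and {{_partition}} next to {{id}}); it does not claim to reproduce the actual analyzer rule responsible.

```scala
// Hypothetical toy model of the UNION arity check; not Spark's Catalyst API.
object UnionArityToyModel {
  final case class Column(name: String)

  sealed trait Plan { def output: Seq[Column] }

  // A scan exposing regular data columns plus (hidden) metadata columns.
  final case class Relation(data: Seq[Column], metadata: Seq[Column]) extends Plan {
    def output: Seq[Column] = data
  }

  final case class Project(projectList: Seq[Column], child: Plan) extends Plan {
    def output: Seq[Column] = projectList
  }

  // Output is taken from the first child (assumes at least one child).
  final case class Union(children: Seq[Plan]) extends Plan {
    def output: Seq[Column] = children.head.output
  }

  // Every union input must expose the same number of columns.
  def checkUnion(u: Union): Either[String, Unit] = {
    val counts = u.children.map(_.output.size)
    val bad = counts.indexWhere(_ != counts.head)
    if (bad < 0) Right(())
    else Left(s"UNION can only be performed on inputs with the same number of columns, " +
      s"but the first input has ${counts.head} columns and input ${bad + 1} has ${counts(bad)} columns.")
  }

  // Hypothetical widening mirroring the plan above: a Project directly over a
  // Relation gets that relation's metadata columns appended to its output.
  def widenWithMetadata(p: Plan): Plan = p match {
    case Project(cols, r: Relation) => Project(cols ++ r.metadata, r)
    case other                      => other
  }

  val tbl = Relation(Seq(Column("id"), Column("data")), Seq(Column("index"), Column("_partition")))
  val range10 = Relation(Seq(Column("id")), Seq.empty)

  // SELECT id FROM $tbl UNION ALL SELECT id FROM RANGE(10)
  val asWritten = Union(Seq(Project(Seq(Column("id")), tbl), Project(Seq(Column("id")), range10)))
  // The same union after only the first branch picks up metadata columns.
  val asAnalyzed = Union(asWritten.children.map(widenWithMetadata))

  def main(args: Array[String]): Unit = {
    println(checkUnion(asWritten))  // valid: both inputs have 1 column
    println(checkUnion(asAnalyzed)) // invalid: 3 columns vs 1 column
  }
}
```

In this model the union as written passes the check, and only the widened version fails, which is the behaviour the report argues is wrong: the user-visible projection has one column per branch.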



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

