[ https://issues.apache.org/jira/browse/SPARK-41557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17648866#comment-17648866 ]
Shardul Mahadik commented on SPARK-41557:
-----------------------------------------

cc: [~Gengliang.Wang] [~cloud_fan]

> Union of tables with and without metadata column fails when used in join
> ------------------------------------------------------------------------
>
>                 Key: SPARK-41557
>                 URL: https://issues.apache.org/jira/browse/SPARK-41557
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.2, 3.4.0
>            Reporter: Shardul Mahadik
>            Priority: Major
>
> Here is a test case that can be added to {{MetadataColumnSuite}} to
> demonstrate the issue:
> {code:scala}
>   test("SPARK-41557: Union of tables with and without metadata column should work") {
>     withTable(tbl) {
>       sql(s"CREATE TABLE $tbl (id bigint, data string) PARTITIONED BY (id)")
>       checkAnswer(
>         spark.sql(
>           s"""
>             SELECT b.*
>             FROM RANGE(1)
>             LEFT JOIN (
>               SELECT id FROM $tbl
>               UNION ALL
>               SELECT id FROM RANGE(10)
>             ) b USING(id)
>           """),
>         Seq(Row(0))
>       )
>     }
>   }
> {code}
> Here a table with metadata columns, {{$tbl}}, is unioned with a table without
> metadata columns, {{RANGE(10)}}. If the result is later used in a join, query
> analysis fails, reporting a mismatch in the number of columns of the union
> that is caused by the metadata columns. However, each side of the union
> explicitly projects only one column, so the union should be valid.
> {code}
> org.apache.spark.sql.AnalysisException: [NUM_COLUMNS_MISMATCH] UNION can only
> be performed on inputs with the same number of columns, but the first input
> has 3 columns and the second input has 1 columns.; line 5 pos 16;
> 'Project [id#26L]
> +- 'Project [id#26L, id#26L]
>    +- 'Project [id#28L, id#26L]
>       +- 'Join LeftOuter, (id#28L = id#26L)
>          :- Range (0, 1, step=1, splits=None)
>          +- 'SubqueryAlias b
>             +- 'Union false, false
>                :- Project [id#26L, index#30, _partition#31]
>                :  +- SubqueryAlias testcat.t
>                :     +- RelationV2[id#26L, data#27, index#30, _partition#31] testcat.t testcat.t
>                +- Project [id#29L]
>                   +- Range (0, 10, step=1, splits=None)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
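
To make the failure mode in the plan above easier to follow, here is a toy model in plain Scala with no Spark dependency. It is only a sketch: the {{Plan}}, {{Relation}}, {{Project}} and {{Union}} types, and the {{check}} and {{widen}} helpers, are hypothetical stand-ins and not Spark's classes or analyzer rules. It assumes only what the trace shows: the projection on the table side ends up carrying the metadata columns {{index}} and {{_partition}}, while the {{RANGE(10)}} side keeps one column.

```scala
object UnionMismatchSketch {
  // Toy plan nodes; hypothetical stand-ins, not Spark's logical plan classes.
  sealed trait Plan { def output: Seq[String] }

  // A leaf relation: regular columns plus hidden metadata columns.
  final case class Relation(columns: Seq[String], metadata: Seq[String] = Nil) extends Plan {
    def output: Seq[String] = columns
  }

  final case class Project(columns: Seq[String], child: Plan) extends Plan {
    def output: Seq[String] = columns
  }

  final case class Union(children: Seq[Plan]) extends Plan {
    def output: Seq[String] = children.headOption.map(_.output).getOrElse(Nil)
  }

  // Same shape as the column-count check: every input must expose equally many columns.
  def check(u: Union): Either[String, Union] = {
    val counts = u.children.map(_.output.size)
    if (counts.distinct.size <= 1) Right(u)
    else Left(s"UNION inputs have different column counts: ${counts.mkString(", ")}")
  }

  // Models the widening seen in the trace: a projection directly over a
  // relation also exposes that relation's metadata columns.
  def widen(u: Union): Union = Union(u.children.map {
    case Project(cols, r: Relation) => Project(cols ++ r.metadata, r)
    case other => other
  })

  val tbl: Relation = Relation(Seq("id", "data"), metadata = Seq("index", "_partition"))
  val range: Relation = Relation(Seq("id"))
  val original: Union = Union(Seq(Project(Seq("id"), tbl), Project(Seq("id"), range)))

  def main(args: Array[String]): Unit = {
    println(check(original).isRight) // the union as written: one column on each side
    println(check(widen(original)))  // after widening: 3 columns vs 1, so a Left
  }
}
```

In this model the union as written passes the check, and it is rejected only once the table-side projection has been widened. That matches the report: the query itself projects a single column on each side, so the mismatch is introduced during analysis rather than by the SQL.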