Shardul Mahadik created SPARK-41557:
---------------------------------------

             Summary: Union of tables with and without metadata column fails when used in join
                 Key: SPARK-41557
                 URL: https://issues.apache.org/jira/browse/SPARK-41557
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.2, 3.4.0
            Reporter: Shardul Mahadik


Here is a test case that can be added to {{MetadataColumnSuite}} to demonstrate 
the issue:
{code:scala}
  test("SPARK-XXXXX: Union of tables with and without metadata column should work") {
    withTable(tbl) {
      sql(s"CREATE TABLE $tbl (id bigint, data string) PARTITIONED BY (id)")
      checkAnswer(
        spark.sql(
          s"""
            SELECT b.*
            FROM RANGE(1)
              LEFT JOIN (
                SELECT id FROM $tbl
                UNION ALL
                SELECT id FROM RANGE(10)
              ) b USING(id)
          """),
        Seq(Row(0))
      )
    }
  }
{code}

Here, a table with metadata columns ({{$tbl}}) is unioned with a relation 
without metadata columns ({{RANGE(10)}}). If the result of the union is later 
used in a join, query analysis fails with an error reporting a mismatch in the 
number of columns of the union, caused by the metadata columns. However, each 
side of the union explicitly projects only one column, so the union should be 
valid.

{code}
org.apache.spark.sql.AnalysisException: [NUM_COLUMNS_MISMATCH] UNION can only be performed on inputs with the same number of columns, but the first input has 3 columns and the second input has 1 columns.; line 5 pos 16;
'Project [id#26L]
+- 'Project [id#26L, id#26L]
   +- 'Project [id#28L, id#26L]
      +- 'Join LeftOuter, (id#28L = id#26L)
         :- Range (0, 1, step=1, splits=None)
         +- 'SubqueryAlias b
            +- 'Union false, false
               :- Project [id#26L, index#30, _partition#31]
               :  +- SubqueryAlias testcat.t
               :     +- RelationV2[id#26L, data#27, index#30, _partition#31] testcat.t testcat.t
               +- Project [id#29L]
                  +- Range (0, 10, step=1, splits=None)
{code}
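
To make the column counts in the plan above easier to follow, here is a toy model of the check that fails. It is only a sketch: {{Plan}}, {{Relation}}, {{Project}}, {{Union}} and {{checkUnion}} are hypothetical stand-ins written for this illustration, not Catalyst classes, and the error text only imitates the wording of the real message. It compares the union the query asks for (one column on each side) with the union that appears in the failing plan (three columns on the first side).

```scala
// Toy model only: these types are hypothetical and are NOT Spark's Catalyst classes.
object UnionColumnCheck {
  sealed trait Plan { def output: Seq[String] }
  final case class Relation(output: Seq[String]) extends Plan
  final case class Project(output: Seq[String], child: Plan) extends Plan
  final case class Union(left: Plan, right: Plan) extends Plan {
    def output: Seq[String] = left.output
  }

  // A union is accepted only if both inputs expose the same number of columns.
  def checkUnion(u: Union): Either[String, Union] = {
    val (l, r) = (u.left.output.size, u.right.output.size)
    if (l == r) Right(u)
    else Left(s"the first input has $l columns and the second input has $r columns")
  }

  // Column names copied from the plan: the table exposes two metadata columns.
  val table: Relation = Relation(Seq("id", "data", "index", "_partition"))
  val range: Relation = Relation(Seq("id"))

  // What the query text asks for: one column on each side.
  val intended: Union = Union(Project(Seq("id"), table), Project(Seq("id"), range))

  // What the failing plan contains: metadata columns appear in the first projection.
  val observed: Union =
    Union(Project(Seq("id", "index", "_partition"), table), Project(Seq("id"), range))

  def main(args: Array[String]): Unit = {
    println(checkUnion(intended))
    println(checkUnion(observed))
  }
}
```

In this model {{intended}} passes the check and {{observed}} is rejected with the same 3 versus 1 counts reported in the {{AnalysisException}}.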



