[ 
https://issues.apache.org/jira/browse/SPARK-41557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shardul Mahadik updated SPARK-41557:
------------------------------------
    Description: 
The following test case, when added to {{MetadataColumnSuite}}, demonstrates 
the issue:
{code:scala}
  test("SPARK-XXXXX: Union of tables with and without metadata column should work") {
    withTable(tbl) {
      sql(s"CREATE TABLE $tbl (id bigint, data string) PARTITIONED BY (id)")
      checkAnswer(
        spark.sql(
          s"""
            SELECT b.*
            FROM RANGE(1)
              LEFT JOIN (
                SELECT id FROM $tbl
                UNION ALL
                SELECT id FROM RANGE(10)
              ) b USING(id)
          """),
        Seq(Row(0))
      )
    }
  }
 {code}

Here a table with metadata columns ({{$tbl}}) is unioned with a relation 
without metadata columns ({{RANGE(10)}}). When the result of the union is then 
used in a join, query analysis fails, reporting a mismatch in the number of 
columns of the union inputs that is caused by the metadata columns. However, 
each side of the union explicitly projects only one column, so the union 
should be valid.

{code}
org.apache.spark.sql.AnalysisException: [NUM_COLUMNS_MISMATCH] UNION can only be performed on inputs with the same number of columns, but the first input has 3 columns and the second input has 1 columns.; line 5 pos 16;
'Project [id#26L]
+- 'Project [id#26L, id#26L]
   +- 'Project [id#28L, id#26L]
      +- 'Join LeftOuter, (id#28L = id#26L)
         :- Range (0, 1, step=1, splits=None)
         +- 'SubqueryAlias b
            +- 'Union false, false
               :- Project [id#26L, index#30, _partition#31]
               :  +- SubqueryAlias testcat.t
               :     +- RelationV2[id#26L, data#27, index#30, _partition#31] testcat.t testcat.t
               +- Project [id#29L]
                  +- Range (0, 10, step=1, splits=None)
{code}
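
To make the column counts in the plan above easier to follow, here is a small toy model of the check. It is only a sketch: {{Plan}}, {{Relation}}, {{Project}}, {{Union}} and {{MetadataUnionDemo}} are hypothetical stand-ins written for this illustration, not Spark's Catalyst classes, and the model does not attempt to explain why the analyzer adds the metadata columns. It only shows that the union as written has one column on each side, while the union in the failing plan has three columns on the first side and one on the second.

{code:scala}
// Toy model only: these are NOT Spark's Catalyst classes.
sealed trait Plan { def output: Seq[String] }

// A leaf relation with regular columns plus optional hidden metadata columns.
case class Relation(columns: Seq[String], metadata: Seq[String] = Nil) extends Plan {
  def output: Seq[String] = columns
}

case class Project(projectList: Seq[String], child: Plan) extends Plan {
  def output: Seq[String] = projectList
}

case class Union(children: Seq[Plan]) extends Plan {
  def output: Seq[String] = children.head.output
  // Number of output columns of each input, in order.
  def columnCounts: Seq[Int] = children.map(_.output.size)
  // Mimics the NUM_COLUMNS_MISMATCH condition: all inputs need the same arity.
  def isValid: Boolean = columnCounts.distinct.size == 1
}

object MetadataUnionDemo {
  // Stand-in for testcat.t, with the metadata columns seen in the plan.
  val tbl: Relation = Relation(Seq("id", "data"), metadata = Seq("index", "_partition"))
  // Stand-in for RANGE(10), which has no metadata columns.
  val range: Relation = Relation(Seq("id"))

  // The union as written in the query: one column from each side.
  val asWritten: Union =
    Union(Seq(Project(Seq("id"), tbl), Project(Seq("id"), range)))

  // The union as it appears in the failing plan: metadata columns are
  // present in the first branch only.
  val asAnalyzed: Union =
    Union(Seq(Project(Seq("id") ++ tbl.metadata, tbl), Project(Seq("id"), range)))

  def main(args: Array[String]): Unit = {
    println(s"as written:  counts=${asWritten.columnCounts} valid=${asWritten.isValid}")
    println(s"as analyzed: counts=${asAnalyzed.columnCounts} valid=${asAnalyzed.isValid}")
  }
}
{code}

In this model {{asWritten}} passes the arity check and {{asAnalyzed}} does not, which matches the "3 columns" versus "1 columns" wording of the error.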


> Union of tables with and without metadata column fails when used in join
> ------------------------------------------------------------------------
>
>                 Key: SPARK-41557
>                 URL: https://issues.apache.org/jira/browse/SPARK-41557
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.2, 3.4.0
>            Reporter: Shardul Mahadik
>            Priority: Major


