Gengliang Wang created SPARK-41538:
--------------------------------------

             Summary: Metadata column should be appended at the end of project list
                 Key: SPARK-41538
                 URL: https://issues.apache.org/jira/browse/SPARK-41538
             Project: Spark
          Issue Type: Task
          Components: SQL
    Affects Versions: 3.3.2, 3.4.0
            Reporter: Gengliang Wang
            Assignee: Gengliang Wang


For the following query:

 
{code:sql}
CREATE TABLE table_1 (
  a ARRAY<STRING>,
  s STRUCT<id: STRING>)
USING parquet;

CREATE VIEW view_1 (id)
AS WITH source AS (
  SELECT * FROM table_1
),
renamed AS (
  SELECT s.id
  FROM source
)
SELECT id FROM renamed;

WITH foo AS (
  SELECT 'a' AS id
),
bar AS (
  SELECT 'a' AS id
)
SELECT
  1
FROM foo
FULL OUTER JOIN bar USING(id)
FULL OUTER JOIN view_1 USING(id)
WHERE foo.id IS NOT NULL
{code}
Running the final query fails with the following error:

 
{code:java}
class org.apache.spark.sql.types.ArrayType cannot be cast to class org.apache.spark.sql.types.StructType (org.apache.spark.sql.types.ArrayType and org.apache.spark.sql.types.StructType are in unnamed module of loader 'app')
java.lang.ClassCastException: class org.apache.spark.sql.types.ArrayType cannot be cast to class org.apache.spark.sql.types.StructType (org.apache.spark.sql.types.ArrayType and org.apache.spark.sql.types.StructType are in unnamed module of loader 'app')
    at org.apache.spark.sql.catalyst.expressions.GetStructField.childSchema$lzycompute(complexTypeExtractors.scala:108)
    at org.apache.spark.sql.catalyst.expressions.GetStructField.childSchema(complexTypeExtractors.scala:108)
    at org.apache.spark.sql.catalyst.expressions.GetStructField.dataType(complexTypeExtractors.scala:114)
    at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:193)
    at org.apache.spark.sql.catalyst.expressions.AliasHelper$$anonfun$getAliasMap$1.applyOrElse(AliasHelper.scala:50)
    at org.apache.spark.sql.catalyst.expressions.AliasHelper$$anonfun$getAliasMap$1.applyOrElse(AliasHelper.scala:50)
    at scala.collection.immutable.List.collect(List.scala:315)
    at org.apache.spark.sql.catalyst.expressions.AliasHelper.getAliasMap(AliasHelper.scala:50)
    at org.apache.spark.sql.catalyst.expressions.AliasHelper.getAliasMap$(AliasHelper.scala:47)
    at org.apache.spark.sql.catalyst.optimizer.CollapseProject$.getAliasMap(Optimizer.scala:992)
    at org.apache.spark.sql.catalyst.optimizer.CollapseProject$.canCollapseExpressions(Optimizer.scala:1029){code}
The cause is that the metadata column sits at different positions in the following two nodes:
 * Table relation: at the end of the output
 * Project list: at the beginning of the list

When the InlineCTE rule runs, the metadata column in the project list is incorrectly combined with the table output.
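
To make the position mismatch concrete, here is a minimal Python toy model. It is not Spark code: the Column class, the hidden_meta column name, and both helper functions are hypothetical, and the report does not say that Spark fails along exactly this path. The sketch only shows how a struct-field extraction bound by ordinal against one column order can land on an array-typed column when evaluated against a different order, and how keeping the metadata column at the end of the project list removes the mismatch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for an output attribute; not a Spark class."""
    name: str
    data_type: str  # "array", "struct", or "string" in this toy model


def get_struct_field(col, field):
    """Mimic an extractor that only accepts a struct-typed child."""
    if col.data_type != "struct":
        raise TypeError(f"{col.data_type} column '{col.name}' cannot be read as a struct")
    return f"{col.name}.{field}"


def extract_by_ordinal(name, field, bound_against, evaluated_against):
    """Look up the ordinal in one column order, then read that slot in another."""
    ordinal = [c.name for c in bound_against].index(name)
    return get_struct_field(evaluated_against[ordinal], field)


A = Column("a", "array")
S = Column("s", "struct")
META = Column("hidden_meta", "string")  # assumed name, for illustration only

RELATION_OUTPUT = [A, S, META]  # metadata column at the end
BROKEN_PROJECT = [META, A, S]   # metadata column at the beginning
FIXED_PROJECT = [A, S, META]    # metadata column appended at the end

if __name__ == "__main__":
    # Consistent ordering: ordinal 1 is the struct column in both lists.
    print(extract_by_ordinal("s", "id", RELATION_OUTPUT, FIXED_PROJECT))
    try:
        # Inconsistent ordering: ordinal 1 now points at the array column.
        extract_by_ordinal("s", "id", RELATION_OUTPUT, BROKEN_PROJECT)
    except TypeError as err:
        print("mismatch:", err)
```

In this model the failure disappears once both lists place the metadata column last, which is the ordering the issue summary asks for.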

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
