[ 
https://issues.apache.org/jira/browse/SPARK-24390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-24390.
----------------------------------
    Resolution: Incomplete

> confusion of columns in projection after WITH ROLLUP
> ----------------------------------------------------
>
>                 Key: SPARK-24390
>                 URL: https://issues.apache.org/jira/browse/SPARK-24390
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>         Environment: Databricks runtime 4.0
>            Reporter: Ryan Foss
>            Priority: Minor
>              Labels: bulk-closed
>
> Using two CTEs, where the first issues a WITH ROLLUP and the second is a 
> projection of the first, when attempting to join the two CTEs, spark seems to 
> consider the key column in each CTE to be the same column, resulting in a 
> "Cannot resolve column" error.
>  
> {noformat}
> CREATE TABLE IF NOT EXISTS test_rollup (key varchar(3), code varchar(3), 
> stuff int);
> EXPLAIN
> WITH
> cte1 AS (
>   SELECT
>     key,
>     code,
>     struct(code, avg(stuff)) AS stuff
>   FROM test_rollup
>   GROUP BY key, code WITH ROLLUP
> ),
> cte2 AS (
>   SELECT
>     key,
>     collect_list(stuff) AS stuff_details
>   FROM cte1
>   WHERE code IS NOT NULL
>   GROUP BY key
> )
> -- join summary record from cte1 to cte2
> SELECT c1.key, c1.stuff AS summary_stuff, c2.stuff_details AS detail_stuff
> FROM cte1 c1
> JOIN cte2 c2
>   ON c2.key = c1.key
> WHERE c1.code IS NULL
> == Physical Plan == org.apache.spark.sql.AnalysisException: cannot resolve 
> '`c2.key`' given input columns: [c1.key, c1.code, c1.stuff, 
> c2.stuff_details]; line 22 pos 5; 'Project ['c1.key, 'c1.stuff AS 
> summary_stuff#5415, 'c2.stuff_details AS detail_stuff#5416] +- 'Filter 
> isnull('c1.code) +- 'Join Inner, ('c2.key = 'c1.key) :- SubqueryAlias c1 : +- 
> SubqueryAlias cte1 : +- Aggregate [key#5429, code#5430, 
> spark_grouping_id#5426], [key#5429, code#5430, named_struct(code, code#5430, 
> col2, avg(cast(stuff#5424 as bigint))) AS stuff#5417] : +- Expand 
> [List(key#5422, code#5423, stuff#5424, key#5427, code#5428, 0), 
> List(key#5422, code#5423, stuff#5424, key#5427, null, 1), List(key#5422, 
> code#5423, stuff#5424, null, null, 3)], [key#5422, code#5423, stuff#5424, 
> key#5429, code#5430, spark_grouping_id#5426] : +- Project [key#5422, 
> code#5423, stuff#5424, key#5422 AS key#5427, code#5423 AS code#5428]{noformat}
> Changing the cte2 query and adding a column alias "key AS key", will cause 
> the columns to be considered unique, resolving the join issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to