[ https://issues.apache.org/jira/browse/SPARK-24390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-24390. ---------------------------------- Resolution: Incomplete > confusion of columns in projection after WITH ROLLUP > ---------------------------------------------------- > > Key: SPARK-24390 > URL: https://issues.apache.org/jira/browse/SPARK-24390 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.3.0 > Environment: Databricks runtime 4.0 > Reporter: Ryan Foss > Priority: Minor > Labels: bulk-closed > > Using two CTEs, where the first issues a WITH ROLLUP and the second is a > projection of the first, when attempting to join the two CTEs, spark seems to > consider the key column in each CTE to be the same column, resulting in a > "Cannot resolve column" error. > > {noformat} > CREATE TABLE IF NOT EXISTS test_rollup (key varchar(3), code varchar(3), > stuff int); > EXPLAIN > WITH > cte1 AS ( > SELECT > key, > code, > struct(code, avg(stuff)) AS stuff > FROM test_rollup > GROUP BY key, code WITH ROLLUP > ), > cte2 AS ( > SELECT > key, > collect_list(stuff) AS stuff_details > FROM cte1 > WHERE code IS NOT NULL > GROUP BY key > ) > -- join summary record from cte1 to cte2 > SELECT c1.key, c1.stuff AS summary_stuff, c2.stuff_details AS detail_stuff > FROM cte1 c1 > JOIN cte2 c2 > ON c2.key = c1.key > WHERE c1.code IS NULL > == Physical Plan == org.apache.spark.sql.AnalysisException: cannot resolve > '`c2.key`' given input columns: [c1.key, c1.code, c1.stuff, > c2.stuff_details]; line 22 pos 5; 'Project ['c1.key, 'c1.stuff AS > summary_stuff#5415, 'c2.stuff_details AS detail_stuff#5416] +- 'Filter > isnull('c1.code) +- 'Join Inner, ('c2.key = 'c1.key) :- SubqueryAlias c1 : +- > SubqueryAlias cte1 : +- Aggregate [key#5429, code#5430, > spark_grouping_id#5426], [key#5429, code#5430, named_struct(code, code#5430, > col2, avg(cast(stuff#5424 as bigint))) AS stuff#5417] : +- Expand > [List(key#5422, code#5423, stuff#5424, key#5427, code#5428, 0), > List(key#5422, code#5423, stuff#5424, key#5427, null, 1), List(key#5422, > code#5423, stuff#5424, null, null, 3)], [key#5422, code#5423, stuff#5424, > key#5429, code#5430, spark_grouping_id#5426] : +- Project [key#5422, > code#5423, stuff#5424, key#5422 AS key#5427, code#5423 AS code#5428]{noformat} > Changing the cte2 query and adding a column alias "key AS key", will cause > the columns to be considered unique, resolving the join issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org