[
https://issues.apache.org/jira/browse/SPARK-52261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18043519#comment-18043519
]
André Souprayane commented on SPARK-52261:
------------------------------------------
The first SQL query generates the following logical plan:
'WithCTE
:- 'CTERelationDef 2, false =>
:  +- 'SubqueryAlias bar
:     +- 'Project [*]
:        +- 'SubqueryAlias foo
:           +- 'CTERelationRef 1, false, false, false, 6, false
+- 'Project [*]
   +- 'SubqueryAlias foo
      +- 'SubqueryAlias foo
         +- 'Union false, false
            :- SubqueryAlias a
            :  +- LocalRelation [str#4, num#5]
            +- SubqueryAlias b
               +- LocalRelation [str#6]
The second SQL query generates the following logical plan:
'Project [*]
+- 'SubqueryAlias foo
   +- 'SubqueryAlias foo
      +- 'Union false, false
         :- SubqueryAlias a
         :  +- LocalRelation [str#1, num#2]
         +- SubqueryAlias b
            +- LocalRelation [str#3]
When there are two CTE subqueries, the second one (bar) is placed in its own
child plan as a CTERelationDef, whereas the first one (foo) is inlined as a
SubqueryAlias. In this case the second CTE still contains an unresolved star
expression, which is why CheckAnalysis fails with the '*' error when it
examines the first child plan.
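To illustrate the ordering effect described above, here is a minimal toy model in plain Python. It is not Spark code: the names `Node` and `first_error` are hypothetical, and it assumes a check that visits children left to right before the node itself and stops at the first problem. Under that assumption, the shape of the two plans above is enough to explain which error surfaces.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A toy logical-plan node (hypothetical; not Spark's LogicalPlan)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    unresolved_star: bool = False              # stands in for 'Project [*]
    input_widths: Optional[List[int]] = None   # for Union: column count per input


def first_error(plan: Node) -> Optional[str]:
    """Check children left to right, then the node itself; return the first problem."""
    for child in plan.children:
        err = first_error(child)
        if err is not None:
            return err
    if plan.unresolved_star:
        return "INVALID_USAGE_OF_STAR_OR_REGEX"
    if plan.input_widths is not None and len(set(plan.input_widths)) > 1:
        return "NUM_COLUMNS_MISMATCH"
    return None


def main_query() -> Node:
    """select * from foo, with foo inlined as a Union of 2-column and 1-column inputs."""
    union = Node(
        "Union",
        [Node("LocalRelation a"), Node("LocalRelation b")],
        input_widths=[2, 1],
    )
    return Node("Project", [Node("SubqueryAlias foo", [union])], unresolved_star=True)


# First query: WithCTE whose first child is the definition of bar.
bar_def = Node(
    "CTERelationDef",
    [Node("SubqueryAlias bar",
          [Node("Project", [Node("CTERelationRef")], unresolved_star=True)])],
)
with_cte = Node("WithCTE", [bar_def, main_query()])

print(first_error(with_cte))      # first query: the star in bar is reached first
print(first_error(main_query()))  # second query: the Union is reached first
```

In this model the first plan reports the star error because the definition of bar contains no Union below its Project, so nothing fails before the unresolved star is reached. In the second plan the Union sits below the Project and is therefore checked first.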
I have not found a way to change only the CheckAnalysis class so that it does
not fail on the unresolved star expression and fails instead on the root
cause, which is the mismatch in the number of columns.
> Misleading error: Invalid usage of '*'
> --------------------------------------
>
> Key: SPARK-52261
> URL: https://issues.apache.org/jira/browse/SPARK-52261
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Max Gekk
> Priority: Major
>
> The code below raises the misleading error:
> {code:java}
> [INVALID_USAGE_OF_STAR_OR_REGEX] Invalid usage of '*' in Project. SQLSTATE:
> 42000; line 7 pos 9;
> {code}
> {code:sql}
> with foo as (
> values ("one", 1), ("two", 2), ("three", 3) as a (str, num)
> union all
> values ("four"), ("five"), ("six") as b (str)
> ),
> bar as (
> select * from foo
> )
> select * from foo
> {code}
> The error is not caused by '*' usage, and should be similar to:
> {code:sql}
> with foo as (
> values ("one", 1), ("two", 2), ("three", 3) as a (str, num)
> union all
> values ("four"), ("five"), ("six") as b (str)
> )
> select * from foo
> {code}
> {code:java}
> [NUM_COLUMNS_MISMATCH] UNION can only be performed on inputs with the same
> number of columns, but the first input has 2 columns and the second input has
> 1 columns. SQLSTATE: 42826; line 2 pos 2;
> {code}