[ https://issues.apache.org/jira/browse/SPARK-32237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-32237.
---------------------------------
    Fix Version/s: 3.1.0
                   3.0.1
       Resolution: Fixed

Issue resolved by pull request 29201
[https://github.com/apache/spark/pull/29201]

> Cannot resolve column when put hint in the views of common table expression
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-32237
>                 URL: https://issues.apache.org/jira/browse/SPARK-32237
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>         Environment: Hadoop-2.7.7
> Hive-2.3.6
> Spark-3.0.0
>            Reporter: Kernel Force
>            Assignee: Lantao Jin
>            Priority: Major
>             Fix For: 3.0.1, 3.1.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Suppose we have a table:
> {code:sql}
> CREATE TABLE DEMO_DATA (
>   ID VARCHAR(10),
>   NAME VARCHAR(10),
>   BATCH VARCHAR(10),
>   TEAM VARCHAR(1)
> ) STORED AS PARQUET;
> {code}
> and some data in it:
> {code:sql}
> 0: jdbc:hive2://HOSTNAME:10000> SELECT T.* FROM DEMO_DATA T;
> +-------+---------+-------------+---------+
> | t.id  | t.name  | t.batch     | t.team  |
> +-------+---------+-------------+---------+
> | 1     | mike    | 2020-07-08  | A       |
> | 2     | john    | 2020-07-07  | B       |
> | 3     | rose    | 2020-07-06  | B       |
> | ....  |
> +-------+---------+-------------+---------+
> {code}
> If I put a query hint in VA or VB and run it in spark-shell:
> {code:sql}
> sql("""
> WITH VA AS
>   (SELECT T.ID, T.NAME, T.BATCH, T.TEAM
>      FROM DEMO_DATA T WHERE T.TEAM = 'A'),
> VB AS
>   (SELECT /*+ REPARTITION(3) */ T.ID, T.NAME, T.BATCH, T.TEAM
>      FROM VA T)
> SELECT T.ID, T.NAME, T.BATCH, T.TEAM
>   FROM VB T
> """).show
> {code}
> In Spark-2.4.4 it works fine.
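> (As an aside, a possible workaround is to drop the hint from the CTE and apply the repartitioning through the DataFrame API instead. This is an untested sketch, not part of the original reproduction; it relies on the `sql` helper and `spark` session that spark-shell predefines, and `Dataset.repartition` is standard Spark API:)
> {code:scala}
> // Same query without any SQL hint in the CTE body; repartition(3) on the
> // resulting DataFrame plays the role of /*+ REPARTITION(3) */.
> val df = sql("""
>   WITH VA AS
>     (SELECT T.ID, T.NAME, T.BATCH, T.TEAM
>        FROM DEMO_DATA T WHERE T.TEAM = 'A')
>   SELECT T.ID, T.NAME, T.BATCH, T.TEAM
>     FROM VA T
> """).repartition(3)
> df.show()
> {code}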
> But in Spark-3.0.0, it throws an AnalysisException with an "Unrecognized hint" warning:
> {code:scala}
> 20/07/09 13:51:14 WARN analysis.HintErrorLogger: Unrecognized hint: REPARTITION(3)
> org.apache.spark.sql.AnalysisException: cannot resolve '`T.ID`' given input columns: [T.BATCH, T.ID, T.NAME, T.TEAM]; line 8 pos 7;
> 'Project ['T.ID, 'T.NAME, 'T.BATCH, 'T.TEAM]
> +- SubqueryAlias T
>    +- SubqueryAlias VB
>       +- Project [ID#0, NAME#1, BATCH#2, TEAM#3]
>          +- SubqueryAlias T
>             +- SubqueryAlias VA
>                +- Project [ID#0, NAME#1, BATCH#2, TEAM#3]
>                   +- Filter (TEAM#3 = A)
>                      +- SubqueryAlias T
>                         +- SubqueryAlias spark_catalog.default.demo_data
>                            +- Relation[ID#0,NAME#1,BATCH#2,TEAM#3] parquet
>   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:143)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:140)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:333)
>   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:333)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:106)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:118)
>   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:118)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:129)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:134)
>   at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>   at scala.collection.immutable.List.map(List.scala:298)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:134)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:139)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:139)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:106)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:140)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:92)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:177)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:92)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:89)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:130)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:156)
>   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:153)
>   at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:68)
>   at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
>   at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:133)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
>   at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:133)
>   at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:68)
>   at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:66)
>   at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:58)
>   at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
>   at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:606)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:601)
>   ... 56 elided
> {code}
> I think the analysis procedure should not be disrupted even if a hint cannot be recognized.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)