[ https://issues.apache.org/jira/browse/SPARK-28195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liupengcheng updated SPARK-28195:
---------------------------------
    Description: 
Currently, we encountered an issue when executing `InsertIntoDataSourceDirCommand`: its query relied on a non-existent table or view, but we ended up with a misleading error message:

{code:java}
Caused by: org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'kr.objective_id
at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:105)
at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:440)
at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:440)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at org.apache.spark.sql.types.StructType$.fromAttributes(StructType.scala:440)
at org.apache.spark.sql.catalyst.plans.QueryPlan.schema$lzycompute(QueryPlan.scala:159)
at org.apache.spark.sql.catalyst.plans.QueryPlan.schema(QueryPlan.scala:159)
at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:544)
at org.apache.spark.sql.execution.command.InsertIntoDataSourceDirCommand.run(InsertIntoDataSourceDirCommand.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.execution.adaptive.QueryStage.executeCollect(QueryStage.scala:246)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3277)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3276)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:277)
... 11 more
{code}

After looking into the code, I found this happens because we have supported the `runSQLOnFiles` feature since 2.3: if a table does not exist and is not a temporary table, the query is treated as running directly on files. The `ResolveSQLOnFile` rule then tries to resolve it and leaves an `UnresolvedRelation` behind on failure (it is not actually SQL on files, so resolution fails). Because a Command has no children, `CheckAnalysis` skips checking the `UnresolvedRelation`, and we finally get the misleading error message above when executing the command.

I think maybe we should run checkAnalysis on a command's query plan? Or is there some consideration for not checking analysis for commands?

This issue seems to still exist on the master branch.
> CheckAnalysis not working for Command and report misleading error message
> -------------------------------------------------------------------------
>
>                 Key: SPARK-28195
>                 URL: https://issues.apache.org/jira/browse/SPARK-28195
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: liupengcheng
>            Priority: Major
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org