[ https://issues.apache.org/jira/browse/SPARK-28195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liupengcheng updated SPARK-28195:
---------------------------------
    Description: 
Currently, we encountered an issue when executing `InsertIntoDataSourceDirCommand`: its query relied on a non-existent table or view, but we ended up with a misleading error message:

{code:java}
Caused by: org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'kr.objective_id
at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:105)
at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:440)
at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:440)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at org.apache.spark.sql.types.StructType$.fromAttributes(StructType.scala:440)
at org.apache.spark.sql.catalyst.plans.QueryPlan.schema$lzycompute(QueryPlan.scala:159)
at org.apache.spark.sql.catalyst.plans.QueryPlan.schema(QueryPlan.scala:159)
at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:544)
at org.apache.spark.sql.execution.command.InsertIntoDataSourceDirCommand.run(InsertIntoDataSourceDirCommand.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.execution.adaptive.QueryStage.executeCollect(QueryStage.scala:246)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3277)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3276)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:277)
... 11 more
{code}

After looking into the code, I found this happens because we have supported the `runSQLOnFiles` feature since 2.3: if a table does not exist and is not a temporary table, the query is treated as running directly on files. The `ResolveSQLOnFile` rule then tries to resolve it and leaves an `UnresolvedRelation` behind on failure (it is not actually SQL on files, so resolution fails). Because a Command has no children, `CheckAnalysis` skips checking the `UnresolvedRelation`, and we finally get the misleading error message above when executing the command.

I think maybe we should run checkAnalysis on a command's query plan? Or is there some consideration for not checking analysis for commands?

This issue seems to still exist on the master branch.
> CheckAnalysis not working for Command and report misleading error message
> -------------------------------------------------------------------------
>
>                 Key: SPARK-28195
>                 URL: https://issues.apache.org/jira/browse/SPARK-28195
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: liupengcheng
>            Priority: Major
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org