Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]
cloud-fan commented on PR #46580: URL: https://github.com/apache/spark/pull/46580#issuecomment-2138051309

Fixing at https://github.com/apache/spark/pull/46794

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org. For additional commands, e-mail: reviews-h...@spark.apache.org. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]
LuciferYang commented on PR #46580: URL: https://github.com/apache/spark/pull/46580#issuecomment-2137364776

There are new failing cases in `SQLQueryTestSuite` and `ThriftServerQueryTestSuite` after this one is merged into branch-3.5:

- ThriftServerQueryTestSuite: https://github.com/apache/spark/actions/runs/9273419348/job/25513506718

```
[info] - identifier-clause.sql *** FAILED *** (2 seconds, 408 milliseconds)
[info]   "" did not contain "Exception" Exception did not match for query #96
[info]   create table identifier('t2') as (select my_col from (values (1), (2), (1) as (my_col)) group by 1), expected: , but got: java.sql.SQLException
[info]   org.apache.hive.service.cli.HiveSQLException: Error running query: [NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT] org.apache.spark.sql.AnalysisException: [NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT] CREATE Hive TABLE (AS SELECT) is not supported, if you want to enable it, please set "spark.sql.catalogImplementation" to "hive".;
[info]   'CreateTable `spark_catalog`.`default`.`t2`, ErrorIfExists
[info]   +- Aggregate [my_col#20215], [my_col#20215]
[info]      +- SubqueryAlias __auto_generated_subquery_name
[info]         +- SubqueryAlias as
[info]            +- LocalRelation [my_col#20215]
[info]
[info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:43)
[info]   at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:262)
[info]   at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:166)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:79)
[info]   at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:63)
[info]   at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:41)
[info]   at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:166)
[info]   at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:161)
[info]   at java.security.AccessController.doPrivileged(Native Method)
[info]   at javax.security.auth.Subject.doAs(Subject.java:422)
[info]   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
[info]   at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:175)
[info]   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info]   at java.lang.Thread.run(Thread.java:750)
[info]   Caused by: org.apache.spark.sql.AnalysisException: [NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT] CREATE Hive TABLE (AS SELECT) is not supported, if you want to enable it, please set "spark.sql.catalogImplementation" to "hive".;
[info]   'CreateTable `spark_catalog`.`default`.`t2`, ErrorIfExists
[info]   +- Aggregate [my_col#20215], [my_col#20215]
[info]      +- SubqueryAlias __auto_generated_subquery_name
[info]         +- SubqueryAlias as
[info]            +- LocalRelation [my_col#20215]
```

- SQLQueryTestSuite: https://github.com/apache/spark/actions/runs/9273419348/job/25513508812

```
[info] - identifier-clause.sql *** FAILED *** (1 second, 208 milliseconds)
[info]   identifier-clause.sql
[info]   Expected "[]", but got "[org.apache.spark.sql.catalyst.ExtendedAnalysisException
[info]   {
[info]     "errorClass" : "NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT",
[info]     "messageParameters" : {
[info]       "cmd" : "CREATE Hive TABLE (AS SELECT)"
[info]     }
[info]   }]" Result did not match for query #96
[info]   create table identifier('t2') as (select my_col from (values (1), (2), (1) as (my_col)) group by 1) (SQLQueryTestSuite.scala:876)
```

cc @nikolamand-db @cloud-fan
Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]
cloud-fan closed pull request #46580: [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier URL: https://github.com/apache/spark/pull/46580
Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]
cloud-fan commented on PR #46580: URL: https://github.com/apache/spark/pull/46580#issuecomment-2135723638

Thanks, merging to master/3.5!
Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]
nikolamand-db commented on code in PR #46580: URL: https://github.com/apache/spark/pull/46580#discussion_r1616397169

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala:

```diff
@@ -20,19 +20,23 @@ package org.apache.spark.sql.catalyst.analysis
 import org.apache.spark.sql.catalyst.expressions.{AliasHelper, EvalHelper, Expression}
 import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
-import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.rules.{Rule, RuleExecutor}
 import org.apache.spark.sql.catalyst.trees.TreePattern.UNRESOLVED_IDENTIFIER
 import org.apache.spark.sql.types.StringType

 /**
  * Resolves the identifier expressions and builds the original plans/expressions.
  */
-object ResolveIdentifierClause extends Rule[LogicalPlan] with AliasHelper with EvalHelper {
+class ResolveIdentifierClause(earlyBatches: Seq[RuleExecutor[LogicalPlan]#Batch])
+  extends Rule[LogicalPlan] with AliasHelper with EvalHelper {

   override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUpWithPruning(
     _.containsAnyPattern(UNRESOLVED_IDENTIFIER)) {
     case p: PlanWithUnresolvedIdentifier if p.identifierExpr.resolved =>
-      p.planBuilder.apply(evalIdentifierExpr(p.identifierExpr))
+      val executor = new RuleExecutor[LogicalPlan] {
```

Review Comment: Added, please check again.
Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]
cloud-fan commented on code in PR #46580: URL: https://github.com/apache/spark/pull/46580#discussion_r1613945984

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala:

```diff
@@ -20,19 +20,23 @@ package org.apache.spark.sql.catalyst.analysis
 import org.apache.spark.sql.catalyst.expressions.{AliasHelper, EvalHelper, Expression}
 import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
-import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.rules.{Rule, RuleExecutor}
 import org.apache.spark.sql.catalyst.trees.TreePattern.UNRESOLVED_IDENTIFIER
 import org.apache.spark.sql.types.StringType

 /**
  * Resolves the identifier expressions and builds the original plans/expressions.
  */
-object ResolveIdentifierClause extends Rule[LogicalPlan] with AliasHelper with EvalHelper {
+class ResolveIdentifierClause(earlyBatches: Seq[RuleExecutor[LogicalPlan]#Batch])
+  extends Rule[LogicalPlan] with AliasHelper with EvalHelper {

   override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUpWithPruning(
     _.containsAnyPattern(UNRESOLVED_IDENTIFIER)) {
     case p: PlanWithUnresolvedIdentifier if p.identifierExpr.resolved =>
-      p.planBuilder.apply(evalIdentifierExpr(p.identifierExpr))
+      val executor = new RuleExecutor[LogicalPlan] {
```

Review Comment: We only need to create the `RuleExecutor` instance once in this rule.
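One way to address this review comment, sketched here as an illustration only (the `Once` strategy and the way the incoming batches are re-wrapped are assumptions, not necessarily the exact code that landed): hoist the executor out of `apply` into a `lazy val`, so it is built once per rule instance rather than once per matched `PlanWithUnresolvedIdentifier`.

```scala
// Sketch only: construct the RuleExecutor once per ResolveIdentifierClause
// instance instead of once per matched plan node. Assumes Spark's catalyst
// RuleExecutor/Rule APIs are on the classpath; the Once strategy here is an
// illustrative choice.
class ResolveIdentifierClause(earlyBatches: Seq[RuleExecutor[LogicalPlan]#Batch])
  extends Rule[LogicalPlan] with AliasHelper with EvalHelper {

  // Built lazily, and only once, from the analyzer batches passed in.
  private lazy val executor = new RuleExecutor[LogicalPlan] {
    override def batches: Seq[Batch] =
      earlyBatches.map(b => Batch(b.name, Once, b.rules: _*))
  }

  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUpWithPruning(
    _.containsAnyPattern(UNRESOLVED_IDENTIFIER)) {
    case p: PlanWithUnresolvedIdentifier if p.identifierExpr.resolved =>
      // Run the early analyzer batches over the freshly built plan so the
      // rewritten subtree is analyzed like a top-level plan would be.
      executor.execute(p.planBuilder.apply(evalIdentifierExpr(p.identifierExpr)))
  }
}
```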
Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]
nikolamand-db commented on PR #46580: URL: https://github.com/apache/spark/pull/46580#issuecomment-2129615940

> I think this fixes https://issues.apache.org/jira/browse/SPARK-46625 as well. Can we add a test to verify?

Checked locally, seems like these changes don't resolve the issue:

```
spark-sql (default)> explain extended WITH S(c1, c2) AS (VALUES(1, 2), (2, 3)), T(c1, c2) AS (VALUES ('a', 'b'), ('c', 'd')) SELECT IDENTIFIER('max')(IDENTIFIER('c1')) FROM IDENTIFIER('T');
== Parsed Logical Plan ==
CTE [S, T]
:  :- 'SubqueryAlias S
:  :  +- 'UnresolvedSubqueryColumnAliases [c1, c2]
:  :     +- 'UnresolvedInlineTable [col1, col2], [[1, 2], [2, 3]]
:  +- 'SubqueryAlias T
:     +- 'UnresolvedSubqueryColumnAliases [c1, c2]
:        +- 'UnresolvedInlineTable [col1, col2], [[a, b], [c, d]]
+- 'Project [unresolvedalias(expressionwithunresolvedidentifier(max, expressionwithunresolvedidentifier(c1, ), org.apache.spark.sql.catalyst.parser.AstBuilder$$Lambda$1453/0x000401c95138@312ddc51))]
   +- 'PlanWithUnresolvedIdentifier T, org.apache.spark.sql.catalyst.parser.AstBuilder$$Lambda$1420/0x000401c5e688@330f3046

== Analyzed Logical Plan ==
org.apache.spark.sql.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `c1` cannot be resolved. Did you mean one of the following? [`Z`, `s`]. SQLSTATE: 42703; line 1 pos 129;
'WithCTE
:- CTERelationDef 0, false
:  +- SubqueryAlias S
:     +- Project [col1#5 AS c1#9, col2#6 AS c2#10]
:        +- LocalRelation [col1#5, col2#6]
:- CTERelationDef 1, false
:  +- SubqueryAlias T
:     +- Project [col1#7 AS c1#11, col2#8 AS c2#12]
:        +- LocalRelation [col1#7, col2#8]
+- 'Project [unresolvedalias('max('c1))]
   +- SubqueryAlias spark_catalog.default.t
      +- Relation spark_catalog.default.t[Z#13,s#14] parquet
```

`PlanWithUnresolvedIdentifier` gets resolved, but it tries to use table `t` from the catalog instead of the CTE definition.
Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]
cloud-fan commented on PR #46580: URL: https://github.com/apache/spark/pull/46580#issuecomment-2124575129

I think this fixes https://issues.apache.org/jira/browse/SPARK-46625 as well. Can we add a test to verify?
Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]
cloud-fan commented on code in PR #46580: URL: https://github.com/apache/spark/pull/46580#discussion_r1609790546

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala:

```diff
@@ -20,19 +20,20 @@ package org.apache.spark.sql.catalyst.analysis
 import org.apache.spark.sql.catalyst.expressions.{AliasHelper, EvalHelper, Expression}
 import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
-import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.rules.{Rule, RuleExecutor}
 import org.apache.spark.sql.catalyst.trees.TreePattern.UNRESOLVED_IDENTIFIER
 import org.apache.spark.sql.types.StringType

 /**
  * Resolves the identifier expressions and builds the original plans/expressions.
  */
-object ResolveIdentifierClause extends Rule[LogicalPlan] with AliasHelper with EvalHelper {
+class ResolveIdentifierClause(executor: RuleExecutor[LogicalPlan]) extends Rule[LogicalPlan]
```

Review Comment: Ditto, let's pass in the rules and create `RuleExecutor` within this rule, as the passed rules may be new instances for each analysis execution.
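The two constructor shapes under discussion differ in when the rule instances get frozen. A rough sketch of the contrast (the two declarations are alternatives, not meant to compile together; bodies elided):

```scala
// Shape in this diff: the caller builds the RuleExecutor once, so the rule
// instances captured inside it never change across analysis executions.
class ResolveIdentifierClause(executor: RuleExecutor[LogicalPlan])
  extends Rule[LogicalPlan] with AliasHelper with EvalHelper { /* ... */ }

// Shape the review asks for: take the batches themselves and build the
// executor inside this rule. Because Analyzer.batches is a def, each analysis
// execution can hand over freshly created rule instances.
class ResolveIdentifierClause(earlyBatches: Seq[RuleExecutor[LogicalPlan]#Batch])
  extends Rule[LogicalPlan] with AliasHelper with EvalHelper { /* ... */ }
```

The second shape is the one that ended up in the later revision of this PR.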
Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]
cloud-fan commented on code in PR #46580: URL: https://github.com/apache/spark/pull/46580#discussion_r1609786723

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:

```diff
@@ -254,7 +254,7 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
     TypeCoercion.typeCoercionRules
   }

-  override def batches: Seq[Batch] = Seq(
+  private val earlyBatches: Seq[Batch] = Seq(
```

Review Comment: Let's keep it as `def`. It may not be a big deal, but I'd like to keep the previous behavior as much as we can: we create a new instance of the rule for each analysis execution.
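The `def`-vs-`val` distinction the review is protecting can be shown with a minimal, self-contained Scala example (not Spark code, names are hypothetical): a `def` re-evaluates its body on every call, so each analysis execution sees freshly constructed rule instances, while a `val` would freeze one set of instances for the analyzer's lifetime.

```scala
// Minimal illustration of why `batches` should stay a `def`: a def builds new
// rule instances on every access, a val caches one instance forever.
object DefVsVal {
  object Counter { private var n = 0; def next(): Int = { n += 1; n } }
  class MyRule { val id: Int = Counter.next() } // each instance gets a fresh id

  val batchesAsVal: Seq[MyRule] = Seq(new MyRule) // single instance, reused
  def batchesAsDef: Seq[MyRule] = Seq(new MyRule) // new instance per call

  def main(args: Array[String]): Unit = {
    // Two accesses to the def yield two distinct rule instances.
    println(batchesAsDef.head.id == batchesAsDef.head.id) // false
    // Two accesses to the val yield the same cached instance.
    println(batchesAsVal.head.id == batchesAsVal.head.id) // true
  }
}
```

This is why the follow-up revision passes the batches into `ResolveIdentifierClause` rather than a pre-built executor: the `def` keeps handing over fresh rule instances per analysis execution.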