Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]

2024-05-29 Thread via GitHub


cloud-fan commented on PR #46580:
URL: https://github.com/apache/spark/pull/46580#issuecomment-2138051309

   fixing at https://github.com/apache/spark/pull/46794





Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]

2024-05-29 Thread via GitHub


LuciferYang commented on PR #46580:
URL: https://github.com/apache/spark/pull/46580#issuecomment-2137364776

   There are new failing cases in `SQLQueryTestSuite` and `ThriftServerQueryTestSuite` after this one was merged into branch-3.5:
   
   - ThriftServerQueryTestSuite: https://github.com/apache/spark/actions/runs/9273419348/job/25513506718
   
   ```
   [info] - identifier-clause.sql *** FAILED *** (2 seconds, 408 milliseconds)
   [info]   "" did not contain "Exception" Exception did not match for query #96
   [info]   create table identifier('t2') as (select my_col from (values (1), (2), (1) as (my_col)) group by 1), expected: , but got: java.sql.SQLException
   [info]   org.apache.hive.service.cli.HiveSQLException: Error running query: [NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT] org.apache.spark.sql.AnalysisException: [NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT] CREATE Hive TABLE (AS SELECT) is not supported, if you want to enable it, please set "spark.sql.catalogImplementation" to "hive".;
   [info]   'CreateTable `spark_catalog`.`default`.`t2`, ErrorIfExists
   [info]   +- Aggregate [my_col#20215], [my_col#20215]
   [info]      +- SubqueryAlias __auto_generated_subquery_name
   [info]         +- SubqueryAlias as
   [info]            +- LocalRelation [my_col#20215]
   [info]
   [info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:43)
   [info]   at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:262)
   [info]   at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:166)
   [info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
   [info]   at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:79)
   [info]   at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:63)
   [info]   at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:41)
   [info]   at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:166)
   [info]   at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:161)
   [info]   at java.security.AccessController.doPrivileged(Native Method)
   [info]   at javax.security.auth.Subject.doAs(Subject.java:422)
   [info]   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
   [info]   at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:175)
   [info]   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   [info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   [info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   [info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   [info]   at java.lang.Thread.run(Thread.java:750)
   [info]   Caused by: org.apache.spark.sql.AnalysisException: [NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT] CREATE Hive TABLE (AS SELECT) is not supported, if you want to enable it, please set "spark.sql.catalogImplementation" to "hive".;
   [info]   'CreateTable `spark_catalog`.`default`.`t2`, ErrorIfExists
   [info]   +- Aggregate [my_col#20215], [my_col#20215]
   [info]      +- SubqueryAlias __auto_generated_subquery_name
   [info]         +- SubqueryAlias as
   [info]            +- LocalRelation [my_col#20215]
   ```
   
   - SQLQueryTestSuite: https://github.com/apache/spark/actions/runs/9273419348/job/25513508812
   
   ```
   [info] - identifier-clause.sql *** FAILED *** (1 second, 208 milliseconds)
   [info]   identifier-clause.sql
   [info]   Expected "[]", but got "[org.apache.spark.sql.catalyst.ExtendedAnalysisException
   [info]   {
   [info]     "errorClass" : "NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT",
   [info]     "messageParameters" : {
   [info]       "cmd" : "CREATE Hive TABLE (AS SELECT)"
   [info]     }
   [info]   }]" Result did not match for query #96
   [info]   create table identifier('t2') as (select my_col from (values (1), (2), (1) as (my_col)) group by 1) (SQLQueryTestSuite.scala:876)
   ```
   
   cc @nikolamand-db @cloud-fan 
   



Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]

2024-05-28 Thread via GitHub


cloud-fan closed pull request #46580: [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier
URL: https://github.com/apache/spark/pull/46580





Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]

2024-05-28 Thread via GitHub


cloud-fan commented on PR #46580:
URL: https://github.com/apache/spark/pull/46580#issuecomment-2135723638

   thanks, merging to master/3.5!





Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]

2024-05-27 Thread via GitHub


nikolamand-db commented on code in PR #46580:
URL: https://github.com/apache/spark/pull/46580#discussion_r1616397169


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala:
##
@@ -20,19 +20,23 @@ package org.apache.spark.sql.catalyst.analysis
 import org.apache.spark.sql.catalyst.expressions.{AliasHelper, EvalHelper, Expression}
 import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
-import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.rules.{Rule, RuleExecutor}
 import org.apache.spark.sql.catalyst.trees.TreePattern.UNRESOLVED_IDENTIFIER
 import org.apache.spark.sql.types.StringType
 
 /**
  * Resolves the identifier expressions and builds the original plans/expressions.
  */
-object ResolveIdentifierClause extends Rule[LogicalPlan] with AliasHelper with EvalHelper {
+class ResolveIdentifierClause(earlyBatches: Seq[RuleExecutor[LogicalPlan]#Batch])
+  extends Rule[LogicalPlan] with AliasHelper with EvalHelper {
 
   override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUpWithPruning(
     _.containsAnyPattern(UNRESOLVED_IDENTIFIER)) {
     case p: PlanWithUnresolvedIdentifier if p.identifierExpr.resolved =>
-      p.planBuilder.apply(evalIdentifierExpr(p.identifierExpr))
+      val executor = new RuleExecutor[LogicalPlan] {
Review Comment:
   Added, please check again.






Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]

2024-05-24 Thread via GitHub


cloud-fan commented on code in PR #46580:
URL: https://github.com/apache/spark/pull/46580#discussion_r1613945984


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala:
##
@@ -20,19 +20,23 @@ package org.apache.spark.sql.catalyst.analysis
 import org.apache.spark.sql.catalyst.expressions.{AliasHelper, EvalHelper, 
Expression}
 import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
-import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.rules.{Rule, RuleExecutor}
 import org.apache.spark.sql.catalyst.trees.TreePattern.UNRESOLVED_IDENTIFIER
 import org.apache.spark.sql.types.StringType
 
 /**
  * Resolves the identifier expressions and builds the original 
plans/expressions.
  */
-object ResolveIdentifierClause extends Rule[LogicalPlan] with AliasHelper with 
EvalHelper {
+class ResolveIdentifierClause(earlyBatches: 
Seq[RuleExecutor[LogicalPlan]#Batch])
+  extends Rule[LogicalPlan] with AliasHelper with EvalHelper {
 
   override def apply(plan: LogicalPlan): LogicalPlan = 
plan.resolveOperatorsUpWithPruning(
 _.containsAnyPattern(UNRESOLVED_IDENTIFIER)) {
 case p: PlanWithUnresolvedIdentifier if p.identifierExpr.resolved =>
-  p.planBuilder.apply(evalIdentifierExpr(p.identifierExpr))
+  val executor = new RuleExecutor[LogicalPlan] {

Review Comment:
   We only need to create the `RuleExecutor` instance once in this rule, not once per matched plan.
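   
   For illustration, a minimal sketch of what hoisting could look like, assuming the `earlyBatches` constructor parameter from the diff above and the existing `evalIdentifierExpr` helper in this file; the `lazy val` and the `asInstanceOf` cast are illustrative, not necessarily the final code:
   
   ```
   class ResolveIdentifierClause(earlyBatches: Seq[RuleExecutor[LogicalPlan]#Batch])
     extends Rule[LogicalPlan] with AliasHelper with EvalHelper {
   
     // One RuleExecutor per rule instance, rather than one per matched
     // PlanWithUnresolvedIdentifier inside apply().
     private lazy val executor = new RuleExecutor[LogicalPlan] {
       override def batches: Seq[Batch] = earlyBatches.asInstanceOf[Seq[Batch]]
     }
   
     override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUpWithPruning(
       _.containsAnyPattern(UNRESOLVED_IDENTIFIER)) {
       case p: PlanWithUnresolvedIdentifier if p.identifierExpr.resolved =>
         // Re-run the early analyzer batches on the sub-plan built from the resolved identifier.
         executor.execute(p.planBuilder.apply(evalIdentifierExpr(p.identifierExpr)))
     }
   }
   ```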






Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]

2024-05-24 Thread via GitHub


nikolamand-db commented on PR #46580:
URL: https://github.com/apache/spark/pull/46580#issuecomment-2129615940

   > I think this fixes https://issues.apache.org/jira/browse/SPARK-46625 as well. Can we add a test to verify?
   
   Checked locally; it seems these changes don't resolve the issue:
   ```
   spark-sql (default)> explain extended WITH S(c1, c2) AS (VALUES(1, 2), (2, 3)), T(c1, c2) AS (VALUES ('a', 'b'), ('c', 'd')) SELECT IDENTIFIER('max')(IDENTIFIER('c1')) FROM IDENTIFIER('T');
   
   == Parsed Logical Plan ==
   CTE [S, T]
   :  :- 'SubqueryAlias S
   :  :  +- 'UnresolvedSubqueryColumnAliases [c1, c2]
   :  :     +- 'UnresolvedInlineTable [col1, col2], [[1, 2], [2, 3]]
   :  +- 'SubqueryAlias T
   :     +- 'UnresolvedSubqueryColumnAliases [c1, c2]
   :        +- 'UnresolvedInlineTable [col1, col2], [[a, b], [c, d]]
   +- 'Project [unresolvedalias(expressionwithunresolvedidentifier(max, expressionwithunresolvedidentifier(c1, ), org.apache.spark.sql.catalyst.parser.AstBuilder$$Lambda$1453/0x000401c95138@312ddc51))]
      +- 'PlanWithUnresolvedIdentifier T, org.apache.spark.sql.catalyst.parser.AstBuilder$$Lambda$1420/0x000401c5e688@330f3046
   
   == Analyzed Logical Plan ==
   org.apache.spark.sql.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `c1` cannot be resolved. Did you mean one of the following? [`Z`, `s`]. SQLSTATE: 42703; line 1 pos 129;
   'WithCTE
   :- CTERelationDef 0, false
   :  +- SubqueryAlias S
   :     +- Project [col1#5 AS c1#9, col2#6 AS c2#10]
   :        +- LocalRelation [col1#5, col2#6]
   :- CTERelationDef 1, false
   :  +- SubqueryAlias T
   :     +- Project [col1#7 AS c1#11, col2#8 AS c2#12]
   :        +- LocalRelation [col1#7, col2#8]
   +- 'Project [unresolvedalias('max('c1))]
      +- SubqueryAlias spark_catalog.default.t
         +- Relation spark_catalog.default.t[Z#13,s#14] parquet
   ```
   `PlanWithUnresolvedIdentifier` gets resolved, but it resolves `T` against the catalog table `spark_catalog.default.t` instead of the CTE definition.





Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]

2024-05-22 Thread via GitHub


cloud-fan commented on PR #46580:
URL: https://github.com/apache/spark/pull/46580#issuecomment-2124575129

   I think this fixes https://issues.apache.org/jira/browse/SPARK-46625 as well. Can we add a test to verify?





Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]

2024-05-22 Thread via GitHub


cloud-fan commented on code in PR #46580:
URL: https://github.com/apache/spark/pull/46580#discussion_r1609790546


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala:
##
@@ -20,19 +20,20 @@ package org.apache.spark.sql.catalyst.analysis
 import org.apache.spark.sql.catalyst.expressions.{AliasHelper, EvalHelper, Expression}
 import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
-import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.catalyst.rules.{Rule, RuleExecutor}
 import org.apache.spark.sql.catalyst.trees.TreePattern.UNRESOLVED_IDENTIFIER
 import org.apache.spark.sql.types.StringType
 
 /**
  * Resolves the identifier expressions and builds the original plans/expressions.
  */
-object ResolveIdentifierClause extends Rule[LogicalPlan] with AliasHelper with EvalHelper {
+class ResolveIdentifierClause(executor: RuleExecutor[LogicalPlan]) extends Rule[LogicalPlan]

Review Comment:
   Ditto: let's pass in the rules and create the `RuleExecutor` within this rule, as the passed rules may be new instances for each analysis execution.
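   
   As a rough sketch of the wiring this implies on the Analyzer side (assumptions: the `earlyBatches` member from the Analyzer diff further down this thread, registration in the "Resolution" batch, and elided batch contents):
   
   ```
   // Analyzer.scala (sketch, not the actual code): hand the batches to the rule and let it
   // build its own RuleExecutor, so freshly created rule instances are picked up on every
   // analysis execution.
   private def earlyBatches: Seq[Batch] = Seq(
     /* substitution and other batches that must run before identifier resolution */
   )
   
   override def batches: Seq[Batch] = earlyBatches ++ Seq(
     Batch("Resolution", fixedPoint,
       /* ..., */ new ResolveIdentifierClause(earlyBatches) /* , ... */),
     /* remaining batches */
   )
   ```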






Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]

2024-05-22 Thread via GitHub


cloud-fan commented on code in PR #46580:
URL: https://github.com/apache/spark/pull/46580#discussion_r1609786723


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -254,7 +254,7 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
     TypeCoercion.typeCoercionRules
   }
 
-  override def batches: Seq[Batch] = Seq(
+  private val earlyBatches: Seq[Batch] = Seq(

Review Comment:
   Let's keep it as `def`. It may not be a big deal, but I'd like to keep the previous behavior as much as we can: we create a new instance of the rule for each analysis execution.
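   
   A self-contained toy illustration of the `def` vs `val` difference (not Spark code; the class names are made up):
   
   ```
   // Why `def` matters here: a val captures one set of rule instances forever,
   // while a def rebuilds them on every call, i.e. per analysis execution.
   class MyRule { val id = new Object }
   
   class ValHolder { val rules: Seq[MyRule] = Seq(new MyRule) } // built once, shared
   class DefHolder { def rules: Seq[MyRule] = Seq(new MyRule) } // rebuilt per call
   
   object Demo extends App {
     val v = new ValHolder
     println(v.rules.head eq v.rules.head) // true: always the same instance
     val d = new DefHolder
     println(d.rules.head eq d.rules.head) // false: a fresh instance each time
   }
   ```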


