[GitHub] [spark] LantaoJin commented on a change in pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE
LantaoJin commented on a change in pull request #29062: URL: https://github.com/apache/spark/pull/29062#discussion_r465523230 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -200,18 +200,18 @@ class Analyzer( val postHocResolutionRules: Seq[Rule[LogicalPlan]] = Nil lazy val batches: Seq[Batch] = Seq( +Batch("Substitution", fixedPoint, Review comment: I don't know which PR causes this regression problem since there is a big gap between 2.4 and 3.0 about CTE. I am not familiar with this part. I fixed this by debuging. In 3.0, I found only the `child of CTE` could enter into the rule of `ResolveCoalesceHints` For the above test case: ``` CTE [cte] : +- 'SubqueryAlias `cte` : +- 'UnresolvedHint REPARTITION, [3] :+- 'Project [*] : +- 'UnresolvedRelation `t` +- 'Project [*] +- 'UnresolvedRelation `cte` ``` Only this child `Project` branch could enter into the rule of `ResolveCoalesceHints` ``` +- 'Project [*] +- 'UnresolvedRelation `cte` ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LantaoJin commented on a change in pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE
LantaoJin commented on a change in pull request #29062: URL: https://github.com/apache/spark/pull/29062#discussion_r465523230 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -200,18 +200,18 @@ class Analyzer( val postHocResolutionRules: Seq[Rule[LogicalPlan]] = Nil lazy val batches: Seq[Batch] = Seq( +Batch("Substitution", fixedPoint, Review comment: I don't know which PR causes this regression problem since there is a big gap between 2.4 and 3.0 about CTE. I am not familiar with this part. I fixed this by debuging. I found only the `child of CTE` could enter into the rule of `ResolveCoalesceHints` For the above test case: ``` CTE [cte] : +- 'SubqueryAlias `cte` : +- 'UnresolvedHint REPARTITION, [3] :+- 'Project [*] : +- 'UnresolvedRelation `t` +- 'Project [*] +- 'UnresolvedRelation `cte` ``` Only this child `Project` branch could enter into the rule of `ResolveCoalesceHints` ``` +- 'Project [*] +- 'UnresolvedRelation `cte` ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LantaoJin commented on a change in pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE
LantaoJin commented on a change in pull request #29062: URL: https://github.com/apache/spark/pull/29062#discussion_r465523230 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -200,18 +200,18 @@ class Analyzer( val postHocResolutionRules: Seq[Rule[LogicalPlan]] = Nil lazy val batches: Seq[Batch] = Seq( +Batch("Substitution", fixedPoint, Review comment: I don't know which PR causes this regression problem since there is a big gap between 2.4 and 3.0 about CTE. I am not familiar with this part. I fixed this by debuging. In 3.0, I found only the `child of CTE` could enter into the rule of `ResolveCoalesceHints`. For the above test case: ``` CTE [cte] : +- 'SubqueryAlias `cte` : +- 'UnresolvedHint REPARTITION, [3] :+- 'Project [*] : +- 'UnresolvedRelation `t` +- 'Project [*] +- 'UnresolvedRelation `cte` ``` Only this child `Project` branch could enter into the rule of `ResolveCoalesceHints` ``` +- 'Project [*] +- 'UnresolvedRelation `cte` ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LantaoJin commented on a change in pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE
LantaoJin commented on a change in pull request #29062: URL: https://github.com/apache/spark/pull/29062#discussion_r465516673 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ## @@ -3560,6 +3560,18 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark } } } + + test("SPARK-32237: Hint in CTE") { +withTable("t") { + sql("CREATE TABLE t USING PARQUET AS SELECT 1 AS id") + checkAnswer( +sql(s""" + |WITH cte AS (SELECT /*+ REPARTITION(3) */ * FROM t) Review comment: Yes. The same issue reported by multiple times in Jira with different use cases, like SPARK-32237, SPARK-32347, SPARK-32535, I will file a new PR to add more unit tests. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LantaoJin commented on a change in pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE
LantaoJin commented on a change in pull request #29062: URL: https://github.com/apache/spark/pull/29062#discussion_r465516673 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ## @@ -3560,6 +3560,18 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark } } } + + test("SPARK-32237: Hint in CTE") { +withTable("t") { + sql("CREATE TABLE t USING PARQUET AS SELECT 1 AS id") + checkAnswer( +sql(s""" + |WITH cte AS (SELECT /*+ REPARTITION(3) */ * FROM t) Review comment: Yes. I will file a new PR to add more unit tests. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LantaoJin commented on a change in pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE
LantaoJin commented on a change in pull request #29062: URL: https://github.com/apache/spark/pull/29062#discussion_r465516846 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ## @@ -3560,6 +3560,18 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark } } } + + test("SPARK-32237: Hint in CTE") { Review comment: Sure. I will add a new suite. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LantaoJin commented on a change in pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE
LantaoJin commented on a change in pull request #29062: URL: https://github.com/apache/spark/pull/29062#discussion_r458529424 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -200,18 +200,18 @@ class Analyzer( val postHocResolutionRules: Seq[Rule[LogicalPlan]] = Nil lazy val batches: Seq[Batch] = Seq( +Batch("Substitution", fixedPoint, + CTESubstitution, + WindowsSubstitution, + EliminateUnions, + new SubstituteUnresolvedOrdinals(conf)), Review comment: I found the problem also exists in branch-2.4. See https://github.com/apache/spark/pull/29062#issuecomment-662233804 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LantaoJin commented on a change in pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE
LantaoJin commented on a change in pull request #29062: URL: https://github.com/apache/spark/pull/29062#discussion_r458529424 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -200,18 +200,18 @@ class Analyzer( val postHocResolutionRules: Seq[Rule[LogicalPlan]] = Nil lazy val batches: Seq[Batch] = Seq( +Batch("Substitution", fixedPoint, + CTESubstitution, + WindowsSubstitution, + EliminateUnions, + new SubstituteUnresolvedOrdinals(conf)), Review comment: I found the problem also exists in branch-2.4. See https://github.com/apache/spark/pull/29062#issuecomment-662233804 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LantaoJin commented on a change in pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE
LantaoJin commented on a change in pull request #29062: URL: https://github.com/apache/spark/pull/29062#discussion_r453409787 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -200,18 +200,18 @@ class Analyzer( val postHocResolutionRules: Seq[Rule[LogicalPlan]] = Nil lazy val batches: Seq[Batch] = Seq( +Batch("Substitution", fixedPoint, + CTESubstitution, + WindowsSubstitution, + EliminateUnions, + new SubstituteUnresolvedOrdinals(conf)), Review comment: It's hard to find which change causing it for me since 2.4 and 3.0 have many differences that I don't know. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LantaoJin commented on a change in pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE
LantaoJin commented on a change in pull request #29062: URL: https://github.com/apache/spark/pull/29062#discussion_r452626883 ## File path: sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala ## @@ -756,6 +756,19 @@ class DataSourceV2SQLSuite } } + test("SPARK-32237: Hint in CTE") { Review comment: Oh, it’s not only for v2. I just use a similar test for modification. I will move the test to another suite This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org