Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]
HyukjinKwon closed pull request #46451: [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs URL: https://github.com/apache/spark/pull/46451 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]
HyukjinKwon commented on PR #46451: URL: https://github.com/apache/spark/pull/46451#issuecomment-2103611091 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]
allisonwang-db commented on code in PR #46451: URL: https://github.com/apache/spark/pull/46451#discussion_r1594753147 ## sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonUDTFSuite.scala: ## @@ -363,4 +364,29 @@ class PythonUDTFSuite extends QueryTest with SharedSparkSession { Row("abc")) } } + + test("SPARK-48180: Analyzer bug with multiple ORDER BY items for input table argument") { +assume(shouldTestPythonUDFs) +spark.udtf.registerPython("testUDTF", pythonUDTF) +checkError( + exception = intercept[ParseException](sql( +""" + |SELECT * FROM testUDTF( + | TABLE(SELECT 1 AS device_id, 2 AS data_ds) + | WITH SINGLE PARTITION + | ORDER BY device_id, data_ds) Review Comment: SGTM! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]
dtenedor commented on code in PR #46451: URL: https://github.com/apache/spark/pull/46451#discussion_r1594743832 ## sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonUDTFSuite.scala: ## @@ -363,4 +364,29 @@ class PythonUDTFSuite extends QueryTest with SharedSparkSession { Row("abc")) } } + + test("SPARK-48180: Analyzer bug with multiple ORDER BY items for input table argument") { +assume(shouldTestPythonUDFs) +spark.udtf.registerPython("testUDTF", pythonUDTF) +checkError( + exception = intercept[ParseException](sql( +""" + |SELECT * FROM testUDTF( + | TABLE(SELECT 1 AS device_id, 2 AS data_ds) + | WITH SINGLE PARTITION + | ORDER BY device_id, data_ds) Review Comment: Yes, that should be the correct syntax. Interesting idea...trying this out with some simple and complex cases, I am a bit scared to copy/paste the entire provided ORDER BY clause into the error message since it could be very long if there are many columns/complex expressions. But the new error message specifically indicates to add parentheses around the expressions, it should be pretty clear (see L381-L384 below). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]
allisonwang-db commented on code in PR #46451: URL: https://github.com/apache/spark/pull/46451#discussion_r1594719424 ## sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonUDTFSuite.scala: ## @@ -363,4 +364,29 @@ class PythonUDTFSuite extends QueryTest with SharedSparkSession { Row("abc")) } } + + test("SPARK-48180: Analyzer bug with multiple ORDER BY items for input table argument") { +assume(shouldTestPythonUDFs) +spark.udtf.registerPython("testUDTF", pythonUDTF) +checkError( + exception = intercept[ParseException](sql( +""" + |SELECT * FROM testUDTF( + | TABLE(SELECT 1 AS device_id, 2 AS data_ds) + | WITH SINGLE PARTITION + | ORDER BY device_id, data_ds) Review Comment: Should the correct syntax be `ORDER BY (device_id, date_ds)`? Do we want to add an example in the error message? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]
dtenedor commented on PR #46451: URL: https://github.com/apache/spark/pull/46451#issuecomment-2101043782 @HyukjinKwon I fixed the test failures, it should work now :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]
HyukjinKwon commented on code in PR #46451: URL: https://github.com/apache/spark/pull/46451#discussion_r1593312100 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -1648,6 +1648,20 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging { message = "ORDER BY cannot be specified unless either " + "PARTITION BY or WITH SINGLE PARTITION is also present", ctx = ctx.tableArgumentPartitioning) +def invalidPartitionOrOrderingExpression(clause: String): String = { + s"The table function call includes a table argument with an invalid partitioning/ordering " + +s"specification: the $clause clause included multiple expressions without parentheses " + +s"surrounding them; please add parentheses around these expressions and then retry the " + +s"query again" Review Comment: ```suggestion "The table function call includes a table argument with an invalid partitioning/ordering " + s"specification: the $clause clause included multiple expressions without parentheses " + "surrounding them; please add parentheses around these expressions and then retry the " + "query again" ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]
HyukjinKwon commented on code in PR #46451: URL: https://github.com/apache/spark/pull/46451#discussion_r1593312100 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -1648,6 +1648,20 @@ class AstBuilder extends DataTypeAstBuilder with SQLConfHelper with Logging { message = "ORDER BY cannot be specified unless either " + "PARTITION BY or WITH SINGLE PARTITION is also present", ctx = ctx.tableArgumentPartitioning) +def invalidPartitionOrOrderingExpression(clause: String): String = { + s"The table function call includes a table argument with an invalid partitioning/ordering " + +s"specification: the $clause clause included multiple expressions without parentheses " + +s"surrounding them; please add parentheses around these expressions and then retry the " + +s"query again" Review Comment: ```suggestion "The table function call includes a table argument with an invalid partitioning/ordering " + s"specification: the $clause clause included multiple expressions without parentheses " + "surrounding them; please add parentheses around these expressions and then retry the " + "query again" ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]
dtenedor commented on PR #46451: URL: https://github.com/apache/spark/pull/46451#issuecomment-2099469184 cc @ueshin -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org