Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]

2024-05-09 Thread via GitHub


HyukjinKwon closed pull request #46451: [SPARK-48180][SQL] Improve error when 
UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY 
exprs
URL: https://github.com/apache/spark/pull/46451


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]

2024-05-09 Thread via GitHub


HyukjinKwon commented on PR #46451:
URL: https://github.com/apache/spark/pull/46451#issuecomment-2103611091

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]

2024-05-08 Thread via GitHub


allisonwang-db commented on code in PR #46451:
URL: https://github.com/apache/spark/pull/46451#discussion_r1594753147


##
sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonUDTFSuite.scala:
##
@@ -363,4 +364,29 @@ class PythonUDTFSuite extends QueryTest with 
SharedSparkSession {
 Row("abc"))
 }
   }
+
+  test("SPARK-48180: Analyzer bug with multiple ORDER BY items for input table 
argument") {
+assume(shouldTestPythonUDFs)
+spark.udtf.registerPython("testUDTF", pythonUDTF)
+checkError(
+  exception = intercept[ParseException](sql(
+"""
+  |SELECT * FROM testUDTF(
+  |  TABLE(SELECT 1 AS device_id, 2 AS data_ds)
+  |  WITH SINGLE PARTITION
+  |  ORDER BY device_id, data_ds)

Review Comment:
   SGTM!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]

2024-05-08 Thread via GitHub


dtenedor commented on code in PR #46451:
URL: https://github.com/apache/spark/pull/46451#discussion_r1594743832


##
sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonUDTFSuite.scala:
##
@@ -363,4 +364,29 @@ class PythonUDTFSuite extends QueryTest with 
SharedSparkSession {
 Row("abc"))
 }
   }
+
+  test("SPARK-48180: Analyzer bug with multiple ORDER BY items for input table 
argument") {
+assume(shouldTestPythonUDFs)
+spark.udtf.registerPython("testUDTF", pythonUDTF)
+checkError(
+  exception = intercept[ParseException](sql(
+"""
+  |SELECT * FROM testUDTF(
+  |  TABLE(SELECT 1 AS device_id, 2 AS data_ds)
+  |  WITH SINGLE PARTITION
+  |  ORDER BY device_id, data_ds)

Review Comment:
   Yes, that should be the correct syntax.
   Interesting idea...trying this out with some simple and complex cases, I am 
a bit scared to copy/paste the entire provided ORDER BY clause into the error 
message since it could be very long if there are many columns/complex 
expressions. But the new error message specifically indicates to add 
parentheses around the expressions, it should be pretty clear (see L381-L384 
below).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]

2024-05-08 Thread via GitHub


allisonwang-db commented on code in PR #46451:
URL: https://github.com/apache/spark/pull/46451#discussion_r1594719424


##
sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonUDTFSuite.scala:
##
@@ -363,4 +364,29 @@ class PythonUDTFSuite extends QueryTest with 
SharedSparkSession {
 Row("abc"))
 }
   }
+
+  test("SPARK-48180: Analyzer bug with multiple ORDER BY items for input table 
argument") {
+assume(shouldTestPythonUDFs)
+spark.udtf.registerPython("testUDTF", pythonUDTF)
+checkError(
+  exception = intercept[ParseException](sql(
+"""
+  |SELECT * FROM testUDTF(
+  |  TABLE(SELECT 1 AS device_id, 2 AS data_ds)
+  |  WITH SINGLE PARTITION
+  |  ORDER BY device_id, data_ds)

Review Comment:
   Should the correct syntax be `ORDER BY (device_id, date_ds)`? Do we want to 
add an example in the error message?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]

2024-05-08 Thread via GitHub


dtenedor commented on PR #46451:
URL: https://github.com/apache/spark/pull/46451#issuecomment-2101043782

   @HyukjinKwon I fixed the test failures, it should work now :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]

2024-05-07 Thread via GitHub


HyukjinKwon commented on code in PR #46451:
URL: https://github.com/apache/spark/pull/46451#discussion_r1593312100


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##
@@ -1648,6 +1648,20 @@ class AstBuilder extends DataTypeAstBuilder with 
SQLConfHelper with Logging {
   message = "ORDER BY cannot be specified unless either " +
 "PARTITION BY or WITH SINGLE PARTITION is also present",
   ctx = ctx.tableArgumentPartitioning)
+def invalidPartitionOrOrderingExpression(clause: String): String = {
+  s"The table function call includes a table argument with an invalid 
partitioning/ordering " +
+s"specification: the $clause clause included multiple expressions 
without parentheses " +
+s"surrounding them; please add parentheses around these expressions 
and then retry the " +
+s"query again"

Review Comment:
   ```suggestion
 "The table function call includes a table argument with an invalid 
partitioning/ordering " +
   s"specification: the $clause clause included multiple expressions 
without parentheses " +
   "surrounding them; please add parentheses around these expressions 
and then retry the " +
   "query again"
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]

2024-05-07 Thread via GitHub


HyukjinKwon commented on code in PR #46451:
URL: https://github.com/apache/spark/pull/46451#discussion_r1593312100


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##
@@ -1648,6 +1648,20 @@ class AstBuilder extends DataTypeAstBuilder with 
SQLConfHelper with Logging {
   message = "ORDER BY cannot be specified unless either " +
 "PARTITION BY or WITH SINGLE PARTITION is also present",
   ctx = ctx.tableArgumentPartitioning)
+def invalidPartitionOrOrderingExpression(clause: String): String = {
+  s"The table function call includes a table argument with an invalid 
partitioning/ordering " +
+s"specification: the $clause clause included multiple expressions 
without parentheses " +
+s"surrounding them; please add parentheses around these expressions 
and then retry the " +
+s"query again"

Review Comment:
   ```suggestion
 "The table function call includes a table argument with an invalid 
partitioning/ordering " +
   s"specification: the $clause clause included multiple expressions 
without parentheses " +
   "surrounding them; please add parentheses around these expressions 
and then retry the " +
   "query again"
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48180][SQL] Improve error when UDTF call with TABLE arg forgets parentheses around multiple PARTITION/ORDER BY exprs [spark]

2024-05-07 Thread via GitHub


dtenedor commented on PR #46451:
URL: https://github.com/apache/spark/pull/46451#issuecomment-2099469184

   cc @ueshin 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org