[GitHub] [spark] HyukjinKwon commented on a change in pull request #28661: [SPARK-31849][PYTHON][SQL] Make PySpark exceptions more Pythonic
HyukjinKwon commented on a change in pull request #28661: URL: https://github.com/apache/spark/pull/28661#discussion_r431653465

## File path: python/pyspark/sql/utils.py

## @@ -75,21 +96,29 @@ class UnknownException(CapturedException):
```diff
 def convert_exception(e):
     s = e.toString()
-    stackTrace = '\n\t at '.join(map(lambda x: x.toString(), e.getStackTrace()))
     c = e.getCause()
+
+    jvm = SparkContext._jvm
+    jwriter = jvm.java.io.StringWriter()
+    e.printStackTrace(jvm.java.io.PrintWriter(jwriter))
+    stacktrace = jwriter.toString()
```

Review comment: Seems like `getStackTrace` doesn't show the cause, whereas `printStackTrace` shows it too. It's best to make it the same as what the JVM shows anyway :-).
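The capture pattern in this diff can be exercised standalone. Below is a minimal sketch, assuming an active SparkContext and a py4j `Throwable` handle `e` (for instance the `java_exception` attribute of a `Py4JJavaError`); the helper name is illustrative:

```python
from pyspark import SparkContext

def jvm_stacktrace(e):
    """Render a py4j Throwable the way the JVM would print it.

    printStackTrace(PrintWriter) walks the whole 'Caused by:' chain,
    unlike getStackTrace(), which only returns the frames of the
    top-level throwable.
    """
    jvm = SparkContext._jvm
    jwriter = jvm.java.io.StringWriter()
    e.printStackTrace(jvm.java.io.PrintWriter(jwriter))
    return jwriter.toString()
```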
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28661: [SPARK-31849][PYTHON][SQL] Make PySpark exceptions more Pythonic
HyukjinKwon commented on a change in pull request #28661: URL: https://github.com/apache/spark/pull/28661#discussion_r431637708

## File path: python/pyspark/sql/utils.py

## @@ -75,21 +96,29 @@ class UnknownException(CapturedException):
```diff
 def convert_exception(e):
     s = e.toString()
-    stackTrace = '\n\t at '.join(map(lambda x: x.toString(), e.getStackTrace()))
     c = e.getCause()
+
+    jvm = SparkContext._jvm
+    jwriter = jvm.java.io.StringWriter()
+    e.printStackTrace(jvm.java.io.PrintWriter(jwriter))
+    stacktrace = jwriter.toString()
     if s.startswith('org.apache.spark.sql.AnalysisException: '):
-        return AnalysisException(s.split(': ', 1)[1], stackTrace, c)
+        return AnalysisException(s.split(': ', 1)[1], stacktrace, c)
     if s.startswith('org.apache.spark.sql.catalyst.analysis'):
-        return AnalysisException(s.split(': ', 1)[1], stackTrace, c)
+        return AnalysisException(s.split(': ', 1)[1], stacktrace, c)
     if s.startswith('org.apache.spark.sql.catalyst.parser.ParseException: '):
-        return ParseException(s.split(': ', 1)[1], stackTrace, c)
+        return ParseException(s.split(': ', 1)[1], stacktrace, c)
```

Review comment: I think `ParseException` at least shows a meaningful error message to the end user, such as:
```
: org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'a' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==
a
^^^
```
If developers want to debug, they can enable `spark.sql.pyspark.jvmStacktrace.enabled`.
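A hedged usage sketch of the flag mentioned above, assuming an active `SparkSession` bound to `spark`:

```python
# Opt back into the JVM stacktrace when debugging.
spark.conf.set("spark.sql.pyspark.jvmStacktrace.enabled", "true")

try:
    spark.sql("a")  # malformed SQL -> pyspark.sql.utils.ParseException
except Exception as e:
    # With the flag enabled, the converted exception carries the JVM
    # stacktrace in addition to the Python-friendly message.
    print(e)
```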
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28661: [SPARK-31849][PYTHON][SQL] Make PySpark exceptions more Pythonic
HyukjinKwon commented on a change in pull request #28661: URL: https://github.com/apache/spark/pull/28661#discussion_r43163

## File path: python/pyspark/sql/utils.py

## @@ -75,21 +96,29 @@ class UnknownException(CapturedException):
```diff
 def convert_exception(e):
     s = e.toString()
-    stackTrace = '\n\t at '.join(map(lambda x: x.toString(), e.getStackTrace()))
     c = e.getCause()
+
+    jvm = SparkContext._jvm
+    jwriter = jvm.java.io.StringWriter()
+    e.printStackTrace(jvm.java.io.PrintWriter(jwriter))
+    stacktrace = jwriter.toString()
```

Review comment: Seems different. This is what I get from `getStackTrace`:
```
org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2117)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2066)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2065)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2065)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1021)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1021)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1021)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2297)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2246)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2235)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:823)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2108)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2129)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2148)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:467)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:420)
    at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:47)
    at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3653)
    at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2695)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3644)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3642)
    at org.apache.spark.sql.Dataset.head(Dataset.scala:2695)
    at org.apache.spark.sql.Dataset.take(Dataset.scala:2902)
    at org.apache.spark.sql.Dataset.getRows(Dataset.scala:300)
    at org.apache.spark.sql.Dataset.showString(Dataset.scala:337)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
```
This is what I get from `printStackTrace`:
```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 2.0 failed 4 times, most recent failure: Lost task 10.3 in stage 2.0 (TID 18, 192.168.35.193, executor 2): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/.../spark/python/lib/pyspark.zip/pyspark/worker.py", line 605, in main
    process()
  File
```
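For contrast, the pre-PR rendering removed by the diff, as a standalone sketch (again assuming a py4j `Throwable` handle `e`): it joins only the frames of `e` itself, so the message line and the `Caused by:` chain (e.g. the executor-side `PythonException` above) are dropped.

```python
# Old approach from the removed line: frames of the top-level throwable only.
# The exception message and the 'Caused by:' chain are not included.
old_stacktrace = '\n\t at '.join(map(lambda x: x.toString(), e.getStackTrace()))
```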
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28661: [SPARK-31849][PYTHON][SQL] Make PySpark exceptions more Pythonic
HyukjinKwon commented on a change in pull request #28661: URL: https://github.com/apache/spark/pull/28661#discussion_r431614778

## File path: python/pyspark/sql/utils.py

## @@ -18,8 +18,19 @@
```diff
 import py4j
 import sys
+from pyspark import SparkContext
+
 if sys.version_info.major >= 3:
     unicode = str
+    # Disable exception chaining (PEP 3134) in captured exceptions
+    # in order to hide the JVM stacktrace.
+    exec("""
+def raise_from(e):
+    raise e from None
+""")
+else:
+    def raise_from(e):
+        raise e
```

Review comment: re: https://github.com/apache/spark/pull/28661#discussion_r431606605 too. Yeah. In Python 2, there is no chaining; it's essentially a new feature in Python 3. E.g., on the current master:

Python 2:
```python
>>> sql("a")
```
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/session.py", line 646, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/.../spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/.../spark/python/pyspark/sql/utils.py", line 102, in deco
    raise converted
pyspark.sql.utils.ParseException: mismatched input 'a' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==
a
^^^
```

Python 3:
```python
>>> sql("a")
```
```
Traceback (most recent call last):
  File "/.../spark/python/pyspark/sql/utils.py", line 98, in deco
    return f(*a, **kw)
  File "/.../spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o25.sql.
: org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'a' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==
a
^^^

    at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:266)
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:133)
    at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:49)
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:81)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:604)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:604)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:601)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/session.py", line 646, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/.../spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/.../spark/python/pyspark/sql/utils.py", line 102, in deco
    raise converted
pyspark.sql.utils.ParseException: mismatched input 'a' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT',
```
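A standalone sketch of what the `raise e from None` idiom above does in Python 3 (PEP 3134 introduced implicit chaining; `from None` suppresses it), with no Spark required:

```python
# Plain-Python demonstration of suppressed chaining.
try:
    try:
        raise ValueError("original error (stands in for the Py4JJavaError)")
    except ValueError:
        # `from None` sets __suppress_context__, so the traceback machinery
        # omits the "During handling of the above exception..." block.
        raise RuntimeError("converted, user-facing error") from None
except RuntimeError as converted:
    print(converted.__suppress_context__)  # True
    print(converted.__context__)           # still the ValueError, just hidden
```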
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28661: [SPARK-31849][PYTHON][SQL] Make PySpark exceptions more Pythonic
HyukjinKwon commented on a change in pull request #28661: URL: https://github.com/apache/spark/pull/28661#discussion_r431611653

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

## @@ -1784,6 +1784,15 @@ object SQLConf {
```diff
       .version("3.0.0")
       .fallbackConf(ARROW_EXECUTION_ENABLED)
 
+  val PYSPARK_JVM_STACKTRACE_ENABLED =
+    buildConf("spark.sql.pyspark.jvmStacktrace.enabled")
+      .doc("When true, it shows the JVM stacktrace in the user-facing PySpark exception " +
+        "together with Python stacktrace. By default, it is disabled and hides JVM stacktrace " +
+        "and shows a Python-friendly exception only.")
+      .version("3.0.0")
```

Review comment: It's arguable, but at its core this virtually changes only the exception message. I personally think it's okay/good to have it in 3.0, but I'm fine with retargeting if there's any concern about it.
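A hypothetical Python-side sketch (not the PR's actual wiring; the helper name and message layout are illustrative) of how such a flag can gate what the end user sees, reading the runtime conf through an active `SparkSession`:

```python
def user_facing_message(msg, jvm_stacktrace, spark):
    """Append the captured JVM stacktrace only when the flag is enabled."""
    enabled = spark.conf.get("spark.sql.pyspark.jvmStacktrace.enabled", "false")
    if enabled.lower() == "true":
        # Debug mode: surface the JVM internals alongside the message.
        return msg + "\n\nJVM stacktrace:\n" + jvm_stacktrace
    # Default: Python-friendly message only.
    return msg
```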
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28661: [SPARK-31849][PYTHON][SQL] Make PySpark exceptions more Pythonic
HyukjinKwon commented on a change in pull request #28661: URL: https://github.com/apache/spark/pull/28661#discussion_r431610928

## File path: python/pyspark/sql/utils.py

## @@ -75,21 +96,29 @@ class UnknownException(CapturedException):
```diff
 def convert_exception(e):
     s = e.toString()
-    stackTrace = '\n\t at '.join(map(lambda x: x.toString(), e.getStackTrace()))
     c = e.getCause()
+
+    jvm = SparkContext._jvm
+    jwriter = jvm.java.io.StringWriter()
+    e.printStackTrace(jvm.java.io.PrintWriter(jwriter))
+    stacktrace = jwriter.toString()
```

Review comment: The previous stacktrace wasn't actually quite correct: it hid the stacktrace from the executor side. This PR now handles exceptions from the executor as well, so I needed to change this.
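A hedged repro sketch for the executor-side case described here (assuming an active `SparkSession` bound to `spark`): a failing Python UDF raises on an executor, and the resulting JVM `SparkException` carries the worker's `PythonException` in its cause chain, which is exactly the part `getStackTrace` alone used to hide.

```python
from pyspark.sql.functions import udf

fail = udf(lambda x: 1 // 0)  # ZeroDivisionError raised on the executor

try:
    spark.range(1).select(fail("id")).collect()
except Exception as e:
    # The converted exception; with the printStackTrace-based capture, the
    # executor-side Python traceback is part of the recorded JVM stacktrace.
    print(e)
```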
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28661: [SPARK-31849][PYTHON][SQL] Make PySpark exceptions more Pythonic
HyukjinKwon commented on a change in pull request #28661: URL: https://github.com/apache/spark/pull/28661#discussion_r431599015

## File path: python/pyspark/sql/utils.py

## @@ -18,8 +18,19 @@
```diff
 import py4j
 import sys
+from pyspark import SparkContext
+
 if sys.version_info.major >= 3:
     unicode = str
+    # Disable exception chaining (PEP 3134) in captured exceptions
+    # in order to hide the JVM stacktrace.
+    exec("""
+def raise_from(e):
+    raise e from None
+""")
```

Review comment: Doing it this way, I actually mimicked [`six`](https://github.com/benjaminp/six/blob/c0be8815d13df45b6ae471c4c436cce8c192245d/six.py#L729-L738).
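For context on why the `exec` indirection is needed at all: `raise e from None` is a syntax error to the Python 2 parser, so the Python 3 branch must be hidden inside a string that is only compiled when that branch actually runs. A minimal self-contained sketch of the same version-gated pattern (mirroring the diff and six's trick):

```python
import sys

if sys.version_info.major >= 3:
    # The `from None` syntax would not even parse under Python 2, so it is
    # compiled lazily via exec(), the same trick six uses for raise_from.
    exec("""
def raise_from(e):
    raise e from None
""")
else:
    def raise_from(e):
        raise e
```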