[GitHub] [spark] HyukjinKwon commented on a change in pull request #28661: [SPARK-31849][PYTHON][SQL] Make PySpark exceptions more Pythonic

2020-05-28 Thread GitBox


HyukjinKwon commented on a change in pull request #28661:
URL: https://github.com/apache/spark/pull/28661#discussion_r431653465



##
File path: python/pyspark/sql/utils.py
##
@@ -75,21 +96,29 @@ class UnknownException(CapturedException):
 
 def convert_exception(e):
     s = e.toString()
-    stackTrace = '\n\t at '.join(map(lambda x: x.toString(), e.getStackTrace()))
     c = e.getCause()
+
+    jvm = SparkContext._jvm
+    jwriter = jvm.java.io.StringWriter()
+    e.printStackTrace(jvm.java.io.PrintWriter(jwriter))
+    stacktrace = jwriter.toString()

Review comment:
   Seems like `getStackTrace` doesn't show the cause, whereas `printStackTrace` does. It's best to make it the same as what's shown in the JVM anyway :-).
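
   As a rough standalone analogy in plain Python (not Spark code): `traceback.format_tb` returns only the frames of one exception, like `getStackTrace`, whereas `traceback.format_exception` also walks the cause chain, like `printStackTrace`:

   ```python
import traceback

def fail():
    try:
        raise ValueError("root cause")
    except ValueError as root:
        raise RuntimeError("outer error") from root

try:
    fail()
except RuntimeError as e:
    # Frames of the outer exception only -- the cause is invisible.
    print("".join(traceback.format_tb(e.__traceback__)))
    # The full chain, including "The above exception was the direct cause ...".
    print("".join(traceback.format_exception(type(e), e, e.__traceback__)))
   ```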








[GitHub] [spark] HyukjinKwon commented on a change in pull request #28661: [SPARK-31849][PYTHON][SQL] Make PySpark exceptions more Pythonic

2020-05-28 Thread GitBox


HyukjinKwon commented on a change in pull request #28661:
URL: https://github.com/apache/spark/pull/28661#discussion_r431637708



##
File path: python/pyspark/sql/utils.py
##
@@ -75,21 +96,29 @@ class UnknownException(CapturedException):
 
 def convert_exception(e):
     s = e.toString()
-    stackTrace = '\n\t at '.join(map(lambda x: x.toString(), e.getStackTrace()))
     c = e.getCause()
+
+    jvm = SparkContext._jvm
+    jwriter = jvm.java.io.StringWriter()
+    e.printStackTrace(jvm.java.io.PrintWriter(jwriter))
+    stacktrace = jwriter.toString()
     if s.startswith('org.apache.spark.sql.AnalysisException: '):
-        return AnalysisException(s.split(': ', 1)[1], stackTrace, c)
+        return AnalysisException(s.split(': ', 1)[1], stacktrace, c)
     if s.startswith('org.apache.spark.sql.catalyst.analysis'):
-        return AnalysisException(s.split(': ', 1)[1], stackTrace, c)
+        return AnalysisException(s.split(': ', 1)[1], stacktrace, c)
     if s.startswith('org.apache.spark.sql.catalyst.parser.ParseException: '):
-        return ParseException(s.split(': ', 1)[1], stackTrace, c)
+        return ParseException(s.split(': ', 1)[1], stacktrace, c)

Review comment:
   I think `ParseException` at least shows a meaningful error message to 
the end user such as:
   
   ```
   : org.apache.spark.sql.catalyst.parser.ParseException:
   mismatched input 'a' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)
   
   == SQL ==
   a
   ^^^
   ```
   
   If developers want to debug, they can enable 
`spark.sql.pyspark.jvmStacktrace.enabled`.
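
   To illustrate, a hedged sketch of turning that flag on from PySpark (assuming an active `SparkSession` named `spark`; the conf key comes from this PR):

   ```python
from pyspark.sql.utils import ParseException

spark.conf.set("spark.sql.pyspark.jvmStacktrace.enabled", "true")
try:
    spark.sql("a")  # the invalid query from the example above
except ParseException as e:
    # With the flag enabled, the JVM stacktrace is included as well.
    print(e)
   ```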








[GitHub] [spark] HyukjinKwon commented on a change in pull request #28661: [SPARK-31849][PYTHON][SQL] Make PySpark exceptions more Pythonic

2020-05-28 Thread GitBox


HyukjinKwon commented on a change in pull request #28661:
URL: https://github.com/apache/spark/pull/28661#discussion_r43163



##
File path: python/pyspark/sql/utils.py
##
@@ -75,21 +96,29 @@ class UnknownException(CapturedException):
 
 def convert_exception(e):
     s = e.toString()
-    stackTrace = '\n\t at '.join(map(lambda x: x.toString(), e.getStackTrace()))
     c = e.getCause()
+
+    jvm = SparkContext._jvm
+    jwriter = jvm.java.io.StringWriter()
+    e.printStackTrace(jvm.java.io.PrintWriter(jwriter))
+    stacktrace = jwriter.toString()

Review comment:
   Seems different. This is what I get from `getStackTrace`:
   
   ```
   
org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2117)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2066)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2065)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2065)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1021)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1021)
  at scala.Option.foreach(Option.scala:407)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1021)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2297)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2246)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2235)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:823)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2108)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2129)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2148)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:467)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:420)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:47)
  at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3653)
  at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2695)
  at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3644)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3642)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2695)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2902)
  at org.apache.spark.sql.Dataset.getRows(Dataset.scala:300)
  at org.apache.spark.sql.Dataset.showString(Dataset.scala:337)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
  at py4j.Gateway.invoke(Gateway.java:282)
  at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
  at py4j.commands.CallCommand.execute(CallCommand.java:79)
  at py4j.GatewayConnection.run(GatewayConnection.java:238)
  at java.lang.Thread.run(Thread.java:748)
   ```
   
   This is what I get from `printStackTrace`:
   
   ```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 2.0 failed 4 times, most recent failure: Lost task 10.3 in stage 2.0 (TID 18, 192.168.35.193, executor 2): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/.../spark/python/lib/pyspark.zip/pyspark/worker.py", line 605, in main
    process()
  File 

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28661: [SPARK-31849][PYTHON][SQL] Make PySpark exceptions more Pythonic

2020-05-28 Thread GitBox


HyukjinKwon commented on a change in pull request #28661:
URL: https://github.com/apache/spark/pull/28661#discussion_r431614778



##
File path: python/pyspark/sql/utils.py
##
@@ -18,8 +18,19 @@
 import py4j
 import sys
 
+from pyspark import SparkContext
+
 if sys.version_info.major >= 3:
     unicode = str
+    # Disable exception chaining (PEP 3134) in captured exceptions
+    # in order to hide the JVM stacktrace.
+    exec("""
+def raise_from(e):
+    raise e from None
+""")
+else:
+    def raise_from(e):
+        raise e
 

Review comment:
   re: https://github.com/apache/spark/pull/28661#discussion_r431606605 too.
   
   Yeah. In Python 2, there is no chaining. This is kind of a new feature in 
Python 3. 
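
   A quick standalone demo (plain Python 3, not Spark code) of what `raise ... from None` suppresses:

   ```python
def convert(e):
    # Re-raise a friendlier exception and drop the implicit __context__,
    # so no "During handling of the above exception, another exception
    # occurred:" block is printed.
    raise ValueError("Pythonic error message") from None

try:
    try:
        raise RuntimeError("Py4JJavaError: ... JVM side ...")
    except RuntimeError as e:
        convert(e)
except ValueError:
    import traceback
    traceback.print_exc()  # prints only the ValueError traceback
   ```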
   
   e.g.) in the current master:
   Python 2:
   
   ```python
   >>> sql("a")
   ```
   ```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/session.py", line 646, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/.../spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/.../spark/python/pyspark/sql/utils.py", line 102, in deco
    raise converted
pyspark.sql.utils.ParseException:
mismatched input 'a' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==
a
^^^
   ```
   
   Python 3:
   
   ```python
   >>> sql("a")
   ```
   ```
Traceback (most recent call last):
  File "/.../spark/python/pyspark/sql/utils.py", line 98, in deco
    return f(*a, **kw)
  File "/.../spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o25.sql.
: org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'a' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==
a
^^^

  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:266)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:133)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:49)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:81)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:604)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:604)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:601)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
  at py4j.Gateway.invoke(Gateway.java:282)
  at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
  at py4j.commands.CallCommand.execute(CallCommand.java:79)
  at py4j.GatewayConnection.run(GatewayConnection.java:238)
  at java.lang.Thread.run(Thread.java:748)


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/session.py", line 646, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/.../spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/.../spark/python/pyspark/sql/utils.py", line 102, in deco
    raise converted
pyspark.sql.utils.ParseException:
mismatched input 'a' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28661: [SPARK-31849][PYTHON][SQL] Make PySpark exceptions more Pythonic

2020-05-28 Thread GitBox


HyukjinKwon commented on a change in pull request #28661:
URL: https://github.com/apache/spark/pull/28661#discussion_r431611653



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -1784,6 +1784,15 @@ object SQLConf {
       .version("3.0.0")
       .fallbackConf(ARROW_EXECUTION_ENABLED)
 
+  val PYSPARK_JVM_STACKTRACE_ENABLED =
+    buildConf("spark.sql.pyspark.jvmStacktrace.enabled")
+      .doc("When true, it shows the JVM stacktrace in the user-facing PySpark exception " +
+        "together with Python stacktrace. By default, it is disabled and hides JVM stacktrace " +
+        "and shows a Python-friendly exception only.")
+      .version("3.0.0")

Review comment:
   It can be argued either way, but it essentially only changes the exception message. I personally think it's okay/good to have it in 3.0, but I am okay with retargeting if there's any concern about it.
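
   As an aside, a hedged sketch of how the Python side can read this flag at runtime (`spark` is an active `SparkSession`; the helper name is illustrative, not the PR's actual code):

   ```python
def jvm_stacktrace_enabled(spark):
    # The SQL conf is read as a string; it defaults to "false" when unset.
    return spark.conf.get(
        "spark.sql.pyspark.jvmStacktrace.enabled", "false"
    ).lower() == "true"
   ```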








[GitHub] [spark] HyukjinKwon commented on a change in pull request #28661: [SPARK-31849][PYTHON][SQL] Make PySpark exceptions more Pythonic

2020-05-28 Thread GitBox


HyukjinKwon commented on a change in pull request #28661:
URL: https://github.com/apache/spark/pull/28661#discussion_r431610928



##
File path: python/pyspark/sql/utils.py
##
@@ -75,21 +96,29 @@ class UnknownException(CapturedException):
 
 def convert_exception(e):
     s = e.toString()
-    stackTrace = '\n\t at '.join(map(lambda x: x.toString(), e.getStackTrace()))
     c = e.getCause()
+
+    jvm = SparkContext._jvm
+    jwriter = jvm.java.io.StringWriter()
+    e.printStackTrace(jvm.java.io.PrintWriter(jwriter))
+    stacktrace = jwriter.toString()

Review comment:
   The previous stacktrace wasn't actually quite correct: it hid the stacktrace from the executor side. This PR now handles exceptions from the executor too, so I needed to change this.
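
   For example, a hedged reproduction of an executor-side failure (assuming an active `SparkSession` named `spark`):

   ```python
from pyspark.sql.functions import udf

@udf("int")
def boom(x):
    # Raises inside the executor's Python worker, not on the driver.
    raise RuntimeError("failure on the executor")

try:
    spark.range(1).select(boom("id")).collect()
except Exception as e:
    # With the printStackTrace-based capture, the executor-side cause
    # (including the Python worker traceback) is part of the stacktrace.
    print(e)
   ```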








[GitHub] [spark] HyukjinKwon commented on a change in pull request #28661: [SPARK-31849][PYTHON][SQL] Make PySpark exceptions more Pythonic

2020-05-28 Thread GitBox


HyukjinKwon commented on a change in pull request #28661:
URL: https://github.com/apache/spark/pull/28661#discussion_r431599015



##
File path: python/pyspark/sql/utils.py
##
@@ -18,8 +18,19 @@
 import py4j
 import sys
 
+from pyspark import SparkContext
+
 if sys.version_info.major >= 3:
     unicode = str
+    # Disable exception chaining (PEP 3134) in captured exceptions
+    # in order to hide the JVM stacktrace.
+    exec("""
+def raise_from(e):
+    raise e from None
+""")

Review comment:
   This way, I actually mimicked [`six`](https://github.com/benjaminp/six/blob/c0be8815d13df45b6ae471c4c436cce8c192245d/six.py#L729-L738).
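
   For comparison, `six` exposes this as the public helper `six.raise_from(value, from_value)` (a quick illustration, assuming `six` is installed):

   ```python
import six

try:
    # Equivalent to: raise ValueError("friendly") from None
    six.raise_from(ValueError("friendly"), None)
except ValueError:
    pass  # no chained context is attached
   ```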




