This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new f62b36c  [SPARK-38128][PYTHON][TESTS] Show full stacktrace in tests by default in PySpark tests
f62b36c is described below

commit f62b36c6d3964c40336959b129b284edb8097f61
Author: Hyukjin Kwon <gurwls...@apache.org>
AuthorDate: Mon Feb 7 21:18:04 2022 +0900

    [SPARK-38128][PYTHON][TESTS] Show full stacktrace in tests by default in PySpark tests
    
    ### What changes were proposed in this pull request?
    
    This PR proposes to show the full stacktrace of the Python worker and the JVM in PySpark by controlling `spark.sql.pyspark.jvmStacktrace.enabled` and `spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled` only in tests.
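    
    A minimal sketch (not part of this patch) of setting the same configurations explicitly on a PySpark session, for example to get the full stacktrace outside the test suite; the config names come from this change, while the session setup is illustrative:
    
    ```
    from pyspark.sql import SparkSession
    
    # Illustrative only: show the JVM stacktrace and disable the simplified
    # Python UDF traceback, matching the defaults this patch applies in tests.
    spark = (
        SparkSession.builder
        .config("spark.sql.pyspark.jvmStacktrace.enabled", "true")
        .config("spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled", "false")
        .getOrCreate()
    )
    ```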
    
    ### Why are the changes needed?
    
    [SPARK-33407](https://issues.apache.org/jira/browse/SPARK-33407) and [SPARK-31849](https://issues.apache.org/jira/browse/SPARK-31849) hide the Java stacktrace and the internal Python worker-side traceback by default to give end users simpler error messages. However, specifically for unit tests, that makes it a bit harder to debug test failures. We should probably show the full stacktrace by default in tests.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No, this is test only.
    
    ### How was this patch tested?
    
    Manually tested. Test failures now show logs like the following:
    
    **Before:**
    
    ```
    ======================================================================
    ERROR [3.480s]: test (pyspark.sql.tests.test_functions.FunctionsTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      ...
    pyspark.sql.utils.PythonException:
      An exception was thrown from the Python worker. Please see the stack trace below.
    Traceback (most recent call last):
      File "/.../pyspark/sql/tests/test_functions.py", line 60, in <lambda>
        self.spark.range(1).select(udf(lambda x: x / 0)("id")).show()
    ZeroDivisionError: division by zero
    
    ----------------------------------------------------------------------
    Ran 1 test in 12.468s
    
    FAILED (errors=1)
    ```
    
    **After:**
    
    ```
    ======================================================================
    ERROR [3.259s]: test (pyspark.sql.tests.test_functions.FunctionsTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      ...
    pyspark.sql.utils.PythonException:
      An exception was thrown from the Python worker. Please see the stack trace below.
    Traceback (most recent call last):
      File "/.../pyspark/worker.py", line 678, in main
        process()
      File "/.../pyspark/worker.py", line 670, in process
        serializer.dump_stream(out_iter, outfile)
      File "/.../lib/pyspark/serializers.py", line 217, in dump_stream
        self.serializer.dump_stream(self._batched(iterator), stream)
      ...
    ZeroDivisionError: division by zero
    
    JVM stacktrace:
    ...
        at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:558)
        at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$2.read(PythonUDFRunner.scala:86)
        at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$2.read(PythonUDFRunner.scala:68)
        at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:511)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    ...
    
    Driver stacktrace:
    ...
    
    Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
        ... 1 more
    
    ----------------------------------------------------------------------
    Ran 1 test in 12.610s
    
    FAILED (errors=1)
    ```
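    
    For reference, a standalone sketch of the failing case shown in the logs above, adapted from the lambda in test_functions.py; the local session setup here is illustrative, since the test suite provides its own SparkSession:
    
    ```
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    
    # Illustrative local session for reproducing the failure by hand.
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    
    # The UDF divides by zero on the Python worker, raising the
    # ZeroDivisionError wrapped in the PythonException shown above.
    spark.range(1).select(udf(lambda x: x / 0)("id")).show()
    ```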
    
    Closes #35423 from HyukjinKwon/SPARK-38128.
    
    Authored-by: Hyukjin Kwon <gurwls...@apache.org>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala      | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 42979a6..59a896a 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -2383,7 +2383,8 @@ object SQLConf {
         "and shows a Python-friendly exception only.")
       .version("3.0.0")
       .booleanConf
-      .createWithDefault(false)
+      // show full stacktrace in tests but hide in production by default.
+      .createWithDefault(Utils.isTesting)
 
   val ARROW_SPARKR_EXECUTION_ENABLED =
     buildConf("spark.sql.execution.arrow.sparkr.enabled")
@@ -2440,7 +2441,8 @@ object SQLConf {
         "shows the exception messages from UDFs. Note that this works only 
with CPython 3.7+.")
       .version("3.1.0")
       .booleanConf
-      .createWithDefault(true)
+      // show full stacktrace in tests but hide in production by default.
+      .createWithDefault(!Utils.isTesting)
 
   val PANDAS_GROUPED_MAP_ASSIGN_COLUMNS_BY_NAME =
     buildConf("spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName")

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
