[ https://issues.apache.org/jira/browse/SPARK-33407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33407:
------------------------------------

    Assignee: Apache Spark

> Simplify the exception message from Python UDFs
> -----------------------------------------------
>
>                 Key: SPARK-33407
>                 URL: https://issues.apache.org/jira/browse/SPARK-33407
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.1.0
>            Reporter: Hyukjin Kwon
>            Assignee: Apache Spark
>            Priority: Major
>
> Currently, the exception message is as below:
> {code}
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/.../python/pyspark/sql/dataframe.py", line 427, in show
>     print(self._jdf.showString(n, 20, vertical))
>   File "/.../python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
>   File "/.../python/pyspark/sql/utils.py", line 127, in deco
>     raise_from(converted)
>   File "<string>", line 3, in raise_from
> pyspark.sql.utils.PythonException:
>   An exception was thrown from Python worker in the executor:
> Traceback (most recent call last):
>   File "/.../python/lib/pyspark.zip/pyspark/worker.py", line 605, in main
>     process()
>   File "/.../python/lib/pyspark.zip/pyspark/worker.py", line 597, in process
>     serializer.dump_stream(out_iter, outfile)
>   File "/.../python/lib/pyspark.zip/pyspark/serializers.py", line 223, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File "/.../python/lib/pyspark.zip/pyspark/serializers.py", line 141, in dump_stream
>     for obj in iterator:
>   File "/.../python/lib/pyspark.zip/pyspark/serializers.py", line 212, in _batched
>     for item in iterator:
>   File "/.../python/lib/pyspark.zip/pyspark/worker.py", line 450, in mapper
>     result = tuple(f(*[a[o] for o in arg_offsets]) for (arg_offsets, f) in udfs)
>   File "/.../python/lib/pyspark.zip/pyspark/worker.py", line 450, in <genexpr>
>     result = tuple(f(*[a[o] for o in arg_offsets]) for (arg_offsets, f) in udfs)
>   File "/.../python/lib/pyspark.zip/pyspark/worker.py", line 90, in <lambda>
>     return lambda *a: f(*a)
>   File "/.../python/lib/pyspark.zip/pyspark/util.py", line 107, in wrapper
>     return f(*args, **kwargs)
>   File "<stdin>", line 3, in divide_by_zero
> ZeroDivisionError: division by zero
> {code}
> Actually, in almost all cases, users only care about {{ZeroDivisionError: division by zero}}. We don't really need to show the internal frames in 99% of cases.
> We could just make it short, for example:
> {code}
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/.../python/pyspark/sql/dataframe.py", line 427, in show
>     print(self._jdf.showString(n, 20, vertical))
>   File "/.../python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
>   File "/.../python/pyspark/sql/utils.py", line 127, in deco
>     raise_from(converted)
>   File "<string>", line 3, in raise_from
> pyspark.sql.utils.PythonException:
>   An exception was thrown from Python worker in the executor:
> Traceback (most recent call last):
>   File "<stdin>", line 3, in divide_by_zero
> ZeroDivisionError: division by zero
> {code}
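The simplification boils down to dropping the PySpark-internal frames from the worker-side traceback before it is surfaced to the user. A minimal sketch of the idea, operating on the traceback text; the `user_frames_only` helper and the `/pyspark/` path marker are illustrative assumptions, not the actual implementation in Spark:

```python
# Sketch: keep only the user's frames in a worker traceback string.
# Assumption: internal frames are recognizable by "/pyspark/" in their
# "File ..." line; the real fix in Spark may filter differently.
def user_frames_only(tb_text):
    """Keep the header, user 'File ...' entries, and the final error line."""
    lines = tb_text.splitlines()
    out = [lines[0]]                  # "Traceback (most recent call last):"
    i = 1
    while i < len(lines) - 1:
        line = lines[i]
        if line.lstrip().startswith('File "') and "/pyspark/" in line:
            i += 2                    # skip the frame header and its source line
        else:
            out.append(line)
            i += 1
    out.append(lines[-1])             # e.g. "ZeroDivisionError: division by zero"
    return "\n".join(out)
```

Applied to the traceback above, this would leave only the {{divide_by_zero}} frame and the final {{ZeroDivisionError}} line, which is what the shortened example shows.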



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
