[ https://issues.apache.org/jira/browse/SPARK-33407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-33407:
------------------------------------

    Assignee: Apache Spark

> Simplify the exception message from Python UDFs
> -----------------------------------------------
>
>                 Key: SPARK-33407
>                 URL: https://issues.apache.org/jira/browse/SPARK-33407
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.1.0
>            Reporter: Hyukjin Kwon
>            Assignee: Apache Spark
>            Priority: Major
>
> Currently, the exception message is as below:
> {code}
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/.../python/pyspark/sql/dataframe.py", line 427, in show
>     print(self._jdf.showString(n, 20, vertical))
>   File "/.../python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
>   File "/.../python/pyspark/sql/utils.py", line 127, in deco
>     raise_from(converted)
>   File "<string>", line 3, in raise_from
> pyspark.sql.utils.PythonException:
>   An exception was thrown from Python worker in the executor:
> Traceback (most recent call last):
>   File "/.../python/lib/pyspark.zip/pyspark/worker.py", line 605, in main
>     process()
>   File "/.../python/lib/pyspark.zip/pyspark/worker.py", line 597, in process
>     serializer.dump_stream(out_iter, outfile)
>   File "/.../python/lib/pyspark.zip/pyspark/serializers.py", line 223, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File "/.../python/lib/pyspark.zip/pyspark/serializers.py", line 141, in dump_stream
>     for obj in iterator:
>   File "/.../python/lib/pyspark.zip/pyspark/serializers.py", line 212, in _batched
>     for item in iterator:
>   File "/.../python/lib/pyspark.zip/pyspark/worker.py", line 450, in mapper
>     result = tuple(f(*[a[o] for o in arg_offsets]) for (arg_offsets, f) in udfs)
>   File "/.../python/lib/pyspark.zip/pyspark/worker.py", line 450, in <genexpr>
>     result = tuple(f(*[a[o] for o in arg_offsets]) for (arg_offsets, f) in udfs)
>   File "/.../python/lib/pyspark.zip/pyspark/worker.py", line 90, in <lambda>
>     return lambda *a: f(*a)
>   File "/.../python/lib/pyspark.zip/pyspark/util.py", line 107, in wrapper
>     return f(*args, **kwargs)
>   File "<stdin>", line 3, in divide_by_zero
> ZeroDivisionError: division by zero
> {code}
> Actually, in almost all cases, users only care about {{ZeroDivisionError: division by zero}}. We don't really have to show the internal stuff in 99% of cases.
> We could just make it short, for example:
> {code}
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/.../python/pyspark/sql/dataframe.py", line 427, in show
>     print(self._jdf.showString(n, 20, vertical))
>   File "/.../python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
>   File "/.../python/pyspark/sql/utils.py", line 127, in deco
>     raise_from(converted)
>   File "<string>", line 3, in raise_from
> pyspark.sql.utils.PythonException:
>   An exception was thrown from Python worker in the executor:
> Traceback (most recent call last):
>   File "<stdin>", line 3, in divide_by_zero
> ZeroDivisionError: division by zero
> {code}


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
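
The proposal above amounts to dropping the framework-internal frames from the worker-side traceback so that only the user's own frames and the final exception line remain. A minimal, hypothetical sketch of that idea, using only the standard-library {{traceback}} module (this is not the actual PySpark implementation; the function name and the "pyspark" path marker are assumptions for illustration):

```python
# Hypothetical sketch: keep only user-code frames in a formatted traceback,
# dropping frames whose file path looks framework-internal.
import traceback


def user_frames_only(exc, internal_marker="pyspark"):
    """Format exc's traceback, skipping frames whose filename
    contains internal_marker, then append the exception line."""
    frames = traceback.extract_tb(exc.__traceback__)
    user_frames = [f for f in frames if internal_marker not in f.filename]
    lines = ["Traceback (most recent call last):\n"]
    lines += traceback.format_list(user_frames)
    lines += traceback.format_exception_only(type(exc), exc)
    return "".join(lines)


try:
    1 / 0
except ZeroDivisionError as e:
    print(user_frames_only(e))
```

Since the script above is not itself under a "pyspark" path, its own frame survives the filter, and the output ends with the line users actually care about: {{ZeroDivisionError: division by zero}}.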