Hi

*Tl;Dr:* I generate a code string on the fly and execute it. When an error
occurs I need the full traceback, but for code run through exec I only get a
partial one: the line numbers are there, but the content of the line that
caused the error is missing.

Consider the minimal reproducible code below:
def fun():
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("some_name").getOrCreate()

    columns = ["Seqno", "Name"]
    data = [("1", "john jones"), ("2", "tracey smith"), ("3", "amy sanders")]

    df = spark.createDataFrame(data=data, schema=columns)

    def errror_func(str):
        def internal_error_method():
            raise RuntimeError

        return internal_error_method()

    # Converting function to UDF
    errror_func_udf = udf(lambda z: errror_func(z), StringType())

    df.select(col("Seqno"), errror_func_udf(col("Name")).alias("Name")).show(truncate=False)

fun()


This gives the traceback shown below. Notice that it also includes the
content of the line that caused the error:

> Traceback (most recent call last):
>   File "temp.py", line 28, in <module>
>     fun()
>   File "temp.py", line 25, in fun
>     df.select(col("Seqno"), errror_func_udf(col("Name")).alias("Name")).show(truncate=False)
>   File "/home/indivar/corridor/code/corridor-platforms/venv/lib/python3.8/site-packages/pyspark/sql/dataframe.py", line 502, in show
>     print(self._jdf.showString(n, int_truncate, vertical))
>   File "/home/indivar/corridor/code/corridor-platforms/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 1321, in __call__
>     return_value = get_return_value(
>   File "/home/indivar/corridor/code/corridor-platforms/venv/lib/python3.8/site-packages/pyspark/sql/utils.py", line 117, in deco
>     raise converted from None
> pyspark.sql.utils.PythonException:
>   An exception was thrown from the Python worker. Please see the stack trace below.
> Traceback (most recent call last):
>   File "temp.py", line 23, in <lambda>
>     errror_func_udf = udf(lambda z: errror_func(z), StringType())
>   File "temp.py", line 20, in errror_func
>     return internal_error_method()
>   File "temp.py", line 18, in internal_error_method
>     raise RuntimeError
> RuntimeError

However, if I run the same code through exec, I lose the line content in the
traceback, although the line numbers are still there:

import linecache

code = """
def fun():
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("some_name").getOrCreate()

    columns = ["Seqno", "Name"]
    data = [("1", "john jones"), ("2", "tracey smith"), ("3", "amy sanders")]

    df = spark.createDataFrame(data=data, schema=columns)

    def errror_func(str):
        def internal_error_method():
            raise RuntimeError

        return internal_error_method()

    # Converting function to UDF
    errror_func_udf = udf(lambda z: errror_func(z), StringType())

    df.select(col("Seqno"), errror_func_udf(col("Name")).alias("Name")).show(truncate=False)
"""


scope = {}
filename = "<tmpfile-q231231>"
compiled_code = compile(code, filename, "exec")
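# Register the source so traceback/linecache can look up lines by filename.
if filename not in linecache.cache:
    linecache.cache[filename] = (
        len(code),  # size of the source
        None,  # mtime; None keeps linecache.checkcache() from dropping the entry
        code.splitlines(keepends=True),
        filename,
    )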
exec(compiled_code, scope, scope)
fun = scope["fun"]

fun()


The traceback for this code is:

> Traceback (most recent call last):
>   File "temp.py", line 74, in <module>
>     fun()
>   File "<tmpfile-q231231>", line 23, in fun
>   File "/home/indivar/corridor/code/corridor-platforms/venv/lib/python3.8/site-packages/pyspark/sql/dataframe.py", line 502, in show
>     print(self._jdf.showString(n, int_truncate, vertical))
>   File "/home/indivar/corridor/code/corridor-platforms/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 1321, in __call__
>     return_value = get_return_value(
>   File "/home/indivar/corridor/code/corridor-platforms/venv/lib/python3.8/site-packages/pyspark/sql/utils.py", line 117, in deco
>     raise converted from None
> pyspark.sql.utils.PythonException:
>   An exception was thrown from the Python worker. Please see the stack trace below.
> Traceback (most recent call last):
>   File "<tmpfile-q231231>", line 21, in <lambda>
>   File "<tmpfile-q231231>", line 18, in errror_func
>   File "<tmpfile-q231231>", line 16, in internal_error_method
> RuntimeError

As you can see, the line content is missing.

Initially I thought this was a Python issue, so I did some reading. Python
internally seems to use the linecache module to get the content of a line.
Up to Python 3.12, Python had the same issue with exec, which they have fixed
in Python 3.13 [issue ref for details]: Support multi-line error locations in
traceback and other related improvements (PEP-657, 3.11) · Issue #106922 ·
python/cpython (github.com)
<https://github.com/python/cpython/issues/106922>

This was a known issue for me as well, so I had been re-massaging the
traceback message using linecache. That works with simple Python definitions,
because I explicitly update linecache when setting up the exec.
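
For reference, here is a minimal sketch of the plain-Python case that does
work, without Spark. The helper name exec_with_source, the snippet, and the
filename "<generated-demo>" are made up for illustration and are not part of
my real code.

```python
import linecache
import traceback


def exec_with_source(code, filename):
    """Compile and exec `code`, registering its lines under `filename`."""
    linecache.cache[filename] = (
        len(code),                       # size of the source
        None,                            # mtime None: checkcache() keeps the entry
        code.splitlines(keepends=True),  # the lines traceback will look up
        filename,
    )
    scope = {}
    exec(compile(code, filename, "exec"), scope, scope)
    return scope


# Hypothetical generated code, only for this demonstration.
SNIPPET = '''
def boom():
    raise RuntimeError("from generated code")
'''

scope = exec_with_source(SNIPPET, "<generated-demo>")
try:
    scope["boom"]()
except RuntimeError:
    # The formatted traceback contains the source line of the raise statement.
    formatted = traceback.format_exc()
    print(formatted)
```

In a single process this prints the frame for "<generated-demo>" together
with the text of the raise line, which is the behaviour I am after.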

But it seems that when I create a UDF, the linecache becomes empty once
execution steps inside the UDF (I checked this by printing linecache.cache
after every step in the code string above). Because of this I cannot get the
content of the line where the error originates.
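
One possible explanation, which is an assumption on my side and not something
I have confirmed: PySpark runs Python UDFs in separate worker processes, and
linecache.cache is module-level state of the driver process, so it would not
travel with the pickled function. If that is the case, a wrapper that
re-registers the source right before the call might help. The sketch below
only simulates a fresh worker by clearing the cache in the same process. The
name with_source, the snippet, and the filename are hypothetical, and I have
not verified this against Spark.

```python
import linecache
import traceback


def with_source(func, filename, code):
    """Wrap `func` so the generated source is in linecache before it runs."""
    def wrapper(*args, **kwargs):
        # In a new process the cache starts without our entry, so add it here.
        if filename not in linecache.cache:
            linecache.cache[filename] = (
                len(code),
                None,  # mtime None: checkcache() keeps the entry
                code.splitlines(keepends=True),
                filename,
            )
        return func(*args, **kwargs)
    return wrapper


# Hypothetical generated code, only for this demonstration.
GENERATED = '''
def errror_func(value):
    raise RuntimeError("inside generated UDF body")
'''
FILENAME = "<tmpfile-demo>"

scope = {}
exec(compile(GENERATED, FILENAME, "exec"), scope, scope)
wrapped = with_source(scope["errror_func"], FILENAME, GENERATED)

linecache.clearcache()  # stand-in for "the worker has an empty linecache"
try:
    wrapped("x")
except RuntimeError:
    worker_traceback = traceback.format_exc()
    print(worker_traceback)
```

With PySpark the idea would be to pass the wrapped function to udf, e.g.
udf(with_source(errror_func, filename, code), StringType()), so that the
source string is shipped inside the closure. Whether the traceback formatted
on the worker then carries the line content is exactly what I cannot confirm.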

I was wondering if you could help with this.

Other ref:
How can i pass linecache over an exec mthod local/global scope - Python
Help - Discussions on Python.org
<https://discuss.python.org/t/how-can-i-pass-linecache-over-an-exec-mthod-local-global-scope/51192/2>

Thanks,
Indivar
