Hyukjin Kwon created SPARK-48087:
------------------------------------

             Summary: Python UDTF incompatibility in 3.5 client <> 4.0 server
                 Key: SPARK-48087
                 URL: https://issues.apache.org/jira/browse/SPARK-48087
             Project: Spark
          Issue Type: Sub-task
          Components: Connect, PySpark
    Affects Versions: 4.0.0
            Reporter: Hyukjin Kwon


{code}
======================================================================
FAIL [0.103s]: test_udtf_init_with_additional_args (pyspark.sql.tests.connect.test_parity_udtf.ArrowUDTFParityTests.test_udtf_init_with_additional_args)
----------------------------------------------------------------------
pyspark.errors.exceptions.connect.PythonException: 
  An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1816, in main
    func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, eval_type)
    self._check_result_or_exception(TestUDTF, ret_type, expected)
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_udtf.py", line 598, in _check_result_or_exception
    with self.assertRaisesRegex(err_type, expected):
AssertionError: "AttributeError" does not match "
  An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1834, in main
    process()
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1826, in process
    serializer.dump_stream(out_iter, outfile)
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 224, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 145, in dump_stream
    for obj in iterator:
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 213, in _batched
    for item in iterator:
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1391, in mapper
    yield eval(*[a[o] for o in args_kwargs_offsets])
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1371, in evaluate
    return tuple(map(verify_and_convert_result, res))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1340, in verify_and_convert_result
    return toInternal(result)
           ^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 1291, in toInternal
    return tuple(
           ^^^^^^
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 1292, in <genexpr>
    f.toInternal(v) if c else v
    ^^^^^^^^^^^^^^^
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 907, in toInternal
    return self.dataType.toInternal(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 372, in toInternal
    calendar.timegm(dt.utctimetuple()) if dt.tzinfo else time.mktime(dt.timetuple())
            ..."
{code}

{code}

======================================================================
FAIL [0.096s]: test_udtf_init_with_additional_args (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_init_with_additional_args)
----------------------------------------------------------------------
pyspark.errors.exceptions.connect.PythonException: 
  An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1816, in main
    func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, eval_type)
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 946, in read_udtf
    raise PySparkRuntimeError(
pyspark.errors.exceptions.base.PySparkRuntimeError: [UDTF_CONSTRUCTOR_INVALID_NO_ANALYZE_METHOD] Failed to evaluate the user-defined table function 'TestUDTF' because its constructor is invalid: the function does not implement the 'analyze' method, and its constructor has more than one argument (including the 'self' reference). Please update the table function so that its constructor accepts exactly one 'self' argument, and try the query again.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_udtf.py", line 274, in test_udtf_init_with_additional_args
    with self.assertRaisesRegex(
AssertionError: "__init__\(\) missing 1 required positional argument: 'a'" does not match "
  An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1816, in main
    func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, eval_type)
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 946, in read_udtf
    raise PySparkRuntimeError(
pyspark.errors.exceptions.base.PySparkRuntimeError: [UDTF_CONSTRUCTOR_INVALID_NO_ANALYZE_METHOD] Failed to evaluate the user-defined table function 'TestUDTF' because its constructor is invalid: the function does not implement the 'analyze' method, and its constructor has more than one argument (including the 'self' reference). Please update the table function so that its constructor accepts exactly one 'self' argument, and try the query again.
"
{code}
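
For context, here is a rough sketch of the UDTF shape that the two {{test_udtf_init_with_additional_args}} failures above exercise. This is not the exact {{TestUDTF}} from {{pyspark/sql/tests/test_udtf.py}}; the class and argument names are only illustrative of the pattern implied by the error messages: a constructor with an extra argument besides {{self}} and no {{analyze}} method. The 3.5-side assertions expect the worker to surface Python's own error (e.g. {{__init__() missing 1 required positional argument: 'a'}}), while a 4.0 worker rejects the class up front with {{UDTF_CONSTRUCTOR_INVALID_NO_ANALYZE_METHOD}}, so the {{assertRaisesRegex}} patterns no longer match.

{code}
# Illustrative sketch only, based on the error messages above; not the exact
# TestUDTF used in pyspark/sql/tests/test_udtf.py.
from pyspark.sql.functions import lit, udtf


@udtf(returnType="x: int")
class TestUDTF:
    # Extra constructor argument besides 'self', and no 'analyze' method.
    def __init__(self, a):
        self._a = a

    def eval(self, a: int):
        yield a,


# A 3.5 worker surfaces Python's own constructor error here; a 4.0 worker
# instead raises UDTF_CONSTRUCTOR_INVALID_NO_ANALYZE_METHOD before __init__ runs.
TestUDTF(lit(1)).collect()
{code}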



{code}

======================================================================
FAIL [0.087s]: test_udtf_with_wrong_num_input (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_with_wrong_num_input)
----------------------------------------------------------------------
pyspark.errors.exceptions.connect.PythonException: 
  An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1816, in main
    func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, eval_type)
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1082, in read_udtf
    raise PySparkRuntimeError(
pyspark.errors.exceptions.base.PySparkRuntimeError: [UDTF_EVAL_METHOD_ARGUMENTS_DO_NOT_MATCH_SIGNATURE] Failed to evaluate the user-defined table function 'TestUDTF' because the function arguments did not match the expected signature of the 'eval' method (missing a required argument: 'a'). Please update the query so that this table function call provides arguments matching the expected signature, or else update the table function so that its 'eval' method accepts the provided arguments, and then try the query again.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_udtf.py", line 255, in test_udtf_with_wrong_num_input
    with self.assertRaisesRegex(
AssertionError: "eval\(\) missing 1 required positional argument: 'a'" does not match "
  An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1816, in main
    func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, eval_type)
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 1082, in read_udtf
    raise PySparkRuntimeError(
pyspark.errors.exceptions.base.PySparkRuntimeError: [UDTF_EVAL_METHOD_ARGUMENTS_DO_NOT_MATCH_SIGNATURE] Failed to evaluate the user-defined table function 'TestUDTF' because the function arguments did not match the expected signature of the 'eval' method (missing a required argument: 'a'). Please update the query so that this table function call provides arguments matching the expected signature, or else update the table function so that its 'eval' method accepts the provided arguments, and then try the query again.
"
----------------------------------------------------------------------
{code}
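
Similarly, a rough sketch of the wrong-arity case (again illustrative, not the exact test code): {{eval}} declares one required argument, but the table function is called with none. A 3.5 worker surfaces Python's {{eval() missing 1 required positional argument: 'a'}}, whereas a 4.0 worker reports {{UDTF_EVAL_METHOD_ARGUMENTS_DO_NOT_MATCH_SIGNATURE}}, so the 3.5 parity test's regex does not match.

{code}
# Illustrative sketch only, based on the error messages above; not the exact
# TestUDTF used in pyspark/sql/tests/test_udtf.py.
from pyspark.sql.functions import udtf


@udtf(returnType="a: int, b: int")
class TestUDTF:
    def eval(self, a: int):
        yield a, a + 1


# Called with no arguments even though 'eval' requires one: a 3.5 worker raises
# Python's TypeError ("eval() missing 1 required positional argument: 'a'"),
# while a 4.0 worker raises UDTF_EVAL_METHOD_ARGUMENTS_DO_NOT_MATCH_SIGNATURE.
TestUDTF().collect()
{code}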




