[ https://issues.apache.org/jira/browse/SPARK-48087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun closed SPARK-48087.
---------------------------------
> Python UDTF incompatibility in 3.5 client <> 4.0 server
> -------------------------------------------------------
>
> Key: SPARK-48087
> URL: https://issues.apache.org/jira/browse/SPARK-48087
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.2
>
>
> {code}
> ======================================================================
> FAIL [0.103s]: test_udtf_init_with_additional_args
> (pyspark.sql.tests.connect.test_parity_udtf.ArrowUDTFParityTests.test_udtf_init_with_additional_args)
> ----------------------------------------------------------------------
> pyspark.errors.exceptions.connect.PythonException:
> An exception was thrown from the Python worker. Please see the stack trace
> below.
> Traceback (most recent call last):
> File
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py",
> line 1816, in main
> func, profiler, deserializer, serializer = read_udtf(pickleSer, infile,
> eval_type)
> self._check_result_or_exception(TestUDTF, ret_type, expected)
> File
> "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_udtf.py",
> line 598, in _check_result_or_exception
> with self.assertRaisesRegex(err_type, expected):
> AssertionError: "AttributeError" does not match "
> An exception was thrown from the Python worker. Please see the stack trace
> below.
> Traceback (most recent call last):
> File
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py",
> line 1834, in main
> process()
> File
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py",
> line 1826, in process
> serializer.dump_stream(out_iter, outfile)
> File
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py",
> line 224, in dump_stream
> self.serializer.dump_stream(self._batched(iterator), stream)
> File
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py",
> line 145, in dump_stream
> for obj in iterator:
> File
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py",
> line 213, in _batched
> for item in iterator:
> File
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py",
> line 1391, in mapper
> yield eval(*[a[o] for o in args_kwargs_offsets])
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> File
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py",
> line 1371, in evaluate
> return tuple(map(verify_and_convert_result, res))
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> File
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py",
> line 1340, in verify_and_convert_result
> return toInternal(result)
> ^^^^^^^^^^^^^^^^^^
> File
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py",
> line 1291, in toInternal
> return tuple(
> ^^^^^^
> File
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py",
> line 1292, in <genexpr>
> f.toInternal(v) if c else v
> ^^^^^^^^^^^^^^^
> File
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py",
> line 907, in toInternal
> return self.dataType.toInternal(obj)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> File
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py",
> line 372, in toInternal
> calendar.timegm(dt.utctimetuple()) if dt.tzinfo else
> time.mktime(dt.timetuple())
> ..."
> {code}
> {code}
> ======================================================================
> FAIL [0.096s]: test_udtf_init_with_additional_args
> (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_init_with_additional_args)
> ----------------------------------------------------------------------
> pyspark.errors.exceptions.connect.PythonException:
> An exception was thrown from the Python worker. Please see the stack trace
> below.
> Traceback (most recent call last):
> File
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py",
> line 1816, in main
> func, profiler, deserializer, serializer = read_udtf(pickleSer, infile,
> eval_type)
>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> File
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py",
> line 946, in read_udtf
> raise PySparkRuntimeError(
> pyspark.errors.exceptions.base.PySparkRuntimeError:
> [UDTF_CONSTRUCTOR_INVALID_NO_ANALYZE_METHOD] Failed to evaluate the
> user-defined table function 'TestUDTF' because its constructor is invalid:
> the function does not implement the 'analyze' method, and its constructor has
> more than one argument (including the 'self' reference). Please update the
> table function so that its constructor accepts exactly one 'self' argument,
> and try the query again.
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
> File
> "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_udtf.py",
> line 274, in test_udtf_init_with_additional_args
> with self.assertRaisesRegex(
> AssertionError: "__init__\(\) missing 1 required positional argument: 'a'"
> does not match "
> An exception was thrown from the Python worker. Please see the stack trace
> below.
> Traceback (most recent call last):
> File
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py",
> line 1816, in main
> func, profiler, deserializer, serializer = read_udtf(pickleSer, infile,
> eval_type)
>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> File
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py",
> line 946, in read_udtf
> raise PySparkRuntimeError(
> pyspark.errors.exceptions.base.PySparkRuntimeError:
> [UDTF_CONSTRUCTOR_INVALID_NO_ANALYZE_METHOD] Failed to evaluate the
> user-defined table function 'TestUDTF' because its constructor is invalid:
> the function does not implement the 'analyze' method, and its constructor has
> more than one argument (including the 'self' reference). Please update the
> table function so that its constructor accepts exactly one 'self' argument,
> and try the query again.
> "
> {code}
> {code}
> ======================================================================
> FAIL [0.087s]: test_udtf_with_wrong_num_input
> (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_with_wrong_num_input)
> ----------------------------------------------------------------------
> pyspark.errors.exceptions.connect.PythonException:
> An exception was thrown from the Python worker. Please see the stack trace
> below.
> Traceback (most recent call last):
> File
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py",
> line 1816, in main
> func, profiler, deserializer, serializer = read_udtf(pickleSer, infile,
> eval_type)
>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> File
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py",
> line 1082, in read_udtf
> raise PySparkRuntimeError(
> pyspark.errors.exceptions.base.PySparkRuntimeError:
> [UDTF_EVAL_METHOD_ARGUMENTS_DO_NOT_MATCH_SIGNATURE] Failed to evaluate the
> user-defined table function 'TestUDTF' because the function arguments did not
> match the expected signature of the 'eval' method (missing a required
> argument: 'a'). Please update the query so that this table function call
> provides arguments matching the expected signature, or else update the table
> function so that its 'eval' method accepts the provided arguments, and then
> try the query again.
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
> File
> "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_udtf.py",
> line 255, in test_udtf_with_wrong_num_input
> with self.assertRaisesRegex(
> AssertionError: "eval\(\) missing 1 required positional argument: 'a'" does
> not match "
> An exception was thrown from the Python worker. Please see the stack trace
> below.
> Traceback (most recent call last):
> File
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py",
> line 1816, in main
> func, profiler, deserializer, serializer = read_udtf(pickleSer, infile,
> eval_type)
>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> File
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py",
> line 1082, in read_udtf
> raise PySparkRuntimeError(
> pyspark.errors.exceptions.base.PySparkRuntimeError:
> [UDTF_EVAL_METHOD_ARGUMENTS_DO_NOT_MATCH_SIGNATURE] Failed to evaluate the
> user-defined table function 'TestUDTF' because the function arguments did not
> match the expected signature of the 'eval' method (missing a required
> argument: 'a'). Please update the query so that this table function call
> provides arguments matching the expected signature, or else update the table
> function so that its 'eval' method accepts the provided arguments, and then
> try the query again.
> "
> ----------------------------------------------------------------------
> {code}
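
The last two failures above share one pattern: the 3.5 test suite asserts on Python's native TypeError wording, while the 4.0 worker reports the same mistake through an error-class message. The following is a minimal sketch of that mismatch using only the standard library. The two strings are copied (abbreviated) from the log above; the helper `matches` and the `tolerant_pattern` are hypothetical illustrations and are not taken from the actual fix for this issue.

```python
import re

# Pattern the 3.5 test asserts on (copied from the AssertionError above).
old_pattern = r"eval\(\) missing 1 required positional argument: 'a'"

# Abbreviated message raised by the 4.0 worker (copied from the trace above).
server_message = (
    "[UDTF_EVAL_METHOD_ARGUMENTS_DO_NOT_MATCH_SIGNATURE] Failed to evaluate the "
    "user-defined table function 'TestUDTF' because the function arguments did "
    "not match the expected signature of the 'eval' method (missing a required "
    "argument: 'a')."
)


def matches(pattern: str, message: str) -> bool:
    """Unanchored regex search, which is how assertRaisesRegex compares."""
    return re.search(pattern, message) is not None


# Hypothetical pattern accepting either wording (an assumption, not the fix).
tolerant_pattern = (
    r"eval\(\) missing 1 required positional argument: 'a'"
    r"|missing a required argument: 'a'"
)

old_ok = matches(old_pattern, server_message)
new_ok = matches(tolerant_pattern, server_message)
print(old_ok, new_ok)
```

Under these assumptions the old pattern does not match the server message, while a pattern that tolerates both wordings does. Whether the real tests were relaxed this way or skipped for the cross-version run is not stated in this message.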