Yicong Huang created SPARK-54531:
------------------------------------
Summary: Separate Aggregation Pandas UDF Serializer
Key: SPARK-54531
URL: https://issues.apache.org/jira/browse/SPARK-54531
Project: Spark
Issue Type: Task
Components: PySpark
Affects Versions: 4.1.0
Reporter: Yicong Huang
Currently, `SQL_GROUPED_AGG_PANDAS_UDF` and `SQL_WINDOW_AGG_PANDAS_UDF` share
`GroupPandasUDFSerializer` with `SQL_GROUPED_MAP_PANDAS_UDF`, but they have
fundamentally different semantics. Aggregation UDFs return `(pandas.Series,
arrow_type)` tuples and support multi-UDF execution, while grouped map UDFs
return `[(pandas.DataFrame, arrow_type)]` lists and do not support multi-UDF
execution. Because the shared serializer must handle both return formats and
both execution patterns, it needs complex branching logic, which makes the
code harder to maintain and understand.
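A minimal sketch of what the split could look like, assuming one serializer per return shape and a single dispatch on eval type. The class names, the `to_columns` method, the `SERIALIZERS` table and the plain-Python stand-ins for pandas and Arrow objects are all hypothetical and are not PySpark's actual API. Only the eval type names and the two return shapes come from the description above.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical stand-ins for pandas/pyarrow objects so the sketch runs
# without dependencies: a "series" is a list, a "data frame" is a dict of
# column name -> list, and an Arrow type is just its string name.
Series = List[Any]
DataFrame = Dict[str, List[Any]]
ArrowType = str


class AggPandasUDFSerializerSketch:
    """Hypothetical serializer for grouped/window aggregation Pandas UDFs.

    Several UDFs may be evaluated together; each contributes one
    (series, arrow_type) tuple, and each tuple becomes one output column.
    """

    def to_columns(
        self, results: List[Tuple[Series, ArrowType]]
    ) -> Dict[str, Tuple[ArrowType, Series]]:
        if not results:
            raise ValueError("expected at least one aggregation result")
        # One output column per UDF; no branching on the UDF kind.
        return {
            f"_{i}": (arrow_type, list(series))
            for i, (series, arrow_type) in enumerate(results)
        }


class GroupedMapPandasUDFSerializerSketch:
    """Hypothetical serializer for grouped map Pandas UDFs.

    Exactly one UDF is evaluated; it returns a one-element list
    [(data_frame, arrow_type)] whose columns become the output columns.
    """

    def to_columns(
        self, results: List[Tuple[DataFrame, ArrowType]]
    ) -> Tuple[ArrowType, DataFrame]:
        if len(results) != 1:
            raise ValueError("grouped map Pandas UDFs do not support multiple UDFs")
        frame, arrow_type = results[0]
        return arrow_type, {name: list(values) for name, values in frame.items()}


# Choose the serializer once per eval type instead of branching inside
# a single shared serializer.
SERIALIZERS = {
    "SQL_GROUPED_AGG_PANDAS_UDF": AggPandasUDFSerializerSketch(),
    "SQL_WINDOW_AGG_PANDAS_UDF": AggPandasUDFSerializerSketch(),
    "SQL_GROUPED_MAP_PANDAS_UDF": GroupedMapPandasUDFSerializerSketch(),
}
```

With this layout the multi-UDF loop lives only in the aggregation class, and the single-UDF check lives only in the grouped map class, so neither has to inspect which kind of UDF produced its input.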