Yicong Huang created SPARK-55222:
------------------------------------

             Summary: Unify _create_batch with transformer composition
                 Key: SPARK-55222
                 URL: https://issues.apache.org/jira/browse/SPARK-55222
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 4.2.0
            Reporter: Yicong Huang


serializers.py currently has three `_create_batch` overrides:
- `ArrowStreamPandasSerializer._create_batch` (L470) — Series only
- `ArrowStreamPandasUDFSerializer._create_batch` (L601) — Series + DataFrame
- `ArrowStreamPandasUDTFSerializer._create_batch` (L900) — DataFrame only

Goals:
1. Merge the three overrides into a single implementation
2. Use transformer composition instead of inline conversion logic
3. Standardize the input format

Approach:

For struct outputs (DataFrame → StructArray), compose existing transformers:
{code:python}
ArrowBatchTransformer.wrap_struct(
    PandasBatchTransformer.to_arrow(df, schema, ...)
).column(0)
{code}




