Yicong Huang created SPARK-56424:
------------------------------------
Summary: Benchmark SQL_SCALAR_PANDAS_UDF
Key: SPARK-56424
URL: https://issues.apache.org/jira/browse/SPARK-56424
Project: Spark
Issue Type: Sub-task
Components: PySpark
Affects Versions: 4.2.0
Reporter: Yicong Huang
Benchmark the scalar pandas UDF path: measure the Arrow-to-pandas-to-Arrow
conversion for single-batch {{pandas_udf}} execution with Series input and output.
This is part of the PySpark Serializer & EvalType Refactor project
(SPARK-55724). The benchmark will cover the {{SQL_SCALAR_PANDAS_UDF}} eval type
in {{python/benchmarks/bench_eval_type.py}}.
h3. Scope
* Add an ASV microbenchmark for {{SQL_SCALAR_PANDAS_UDF}} (eval type 200)
* Measure the full Arrow-to-Pandas-to-Arrow round-trip:
{{ArrowStreamPandasUDFSerializer.load_stream}} -> pandas Series -> UDF
execution -> {{ArrowStreamPandasUDFSerializer.dump_stream}}
* Pure Python, no JVM required (stream protocol simulation)