Yicong Huang created SPARK-56424:
------------------------------------

             Summary: Benchmark SQL_SCALAR_PANDAS_UDF
                 Key: SPARK-56424
                 URL: https://issues.apache.org/jira/browse/SPARK-56424
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 4.2.0
            Reporter: Yicong Huang


Benchmark the scalar pandas UDF path: measure the cost of the 
Arrow-to-pandas-to-Arrow conversion for single-batch {{pandas_udf}} execution 
with Series input and output.

This is part of the PySpark Serializer & EvalType Refactor project 
(SPARK-55724). The benchmark will be added to 
{{python/benchmarks/bench_eval_type.py}} and will cover the 
{{SQL_SCALAR_PANDAS_UDF}} eval type.

h3. Scope

* Add an ASV microbenchmark for {{SQL_SCALAR_PANDAS_UDF}} (eval type 200)
* Measure the full Arrow-to-Pandas-to-Arrow round-trip: 
{{ArrowStreamPandasUDFSerializer.load_stream}} -> pandas Series -> UDF 
execution -> {{ArrowStreamPandasUDFSerializer.dump_stream}}
* Pure Python, no JVM required (stream protocol simulation)


