Yicong-Huang opened a new pull request, #55674: URL: https://github.com/apache/spark/pull/55674
### What changes were proposed in this pull request? Refactor `SQL_COGROUPED_MAP_PANDAS_UDF` to be self-contained in `read_udfs()`, moving logic from `CogroupPandasUDFSerializer` into `worker.py`, using `ArrowStreamCoGroupSerializer` as pure I/O. ### Why are the changes needed? Part of SPARK-55388 (Refactor PythonEvalType processing logic). Making each eval type self-contained in `read_udfs()` improves readability and makes it easier to reason about the data flow for each eval type independently. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. No behavior change. ASV benchmark (`CogroupedMapPandasUDFTimeBench`), 3 runs each, averaged: ```text master: ebfdf8b6b4 vs PR: 87d96ebe40 Time (ms, lower = better) scenario udf master PR diff --------------------------------------------------------------------- few_groups_sm identity_udf 160.3 159.0 -0.81% few_groups_sm concat_udf 181.0 181.0 0.00% few_groups_sm left_semi_udf 182.7 181.7 -0.55% few_groups_sm key_identity_udf 160.3 162.0 +1.04% few_groups_lg identity_udf 413.3 411.3 -0.48% few_groups_lg concat_udf 545.7 537.3 -1.54% few_groups_lg left_semi_udf 508.3 500.7 -1.51% few_groups_lg key_identity_udf 412.3 412.3 0.00% many_groups_sm identity_udf 1356.7 1356.7 0.00% many_groups_sm concat_udf 1480.0 1473.3 -0.45% many_groups_sm left_semi_udf 1496.7 1503.3 +0.44% many_groups_sm key_identity_udf 1370.0 1363.3 -0.49% many_groups_lg identity_udf 765.3 760.7 -0.61% many_groups_lg concat_udf 922.7 908.7 -1.52% many_groups_lg left_semi_udf 887.0 886.0 -0.11% many_groups_lg key_identity_udf 764.7 767.0 +0.30% wide_values identity_udf 1073.3 1073.3 0.00% wide_values concat_udf 1276.7 1266.7 -0.78% wide_values left_semi_udf 1130.0 1130.0 0.00% wide_values key_identity_udf 1080.0 1073.3 -0.62% multi_key identity_udf 425.3 423.7 -0.39% multi_key concat_udf 473.7 473.0 -0.14% multi_key left_semi_udf 463.0 451.3 -2.52% multi_key key_identity_udf 430.3 428.3 -0.46% --------------------------------------------------------------------- SUM 17995.0 17834.0 -0.90% ``` Aggregate essentially flat (-0.90%); per-scenario variation within run-to-run noise. ### Was this patch authored or co-authored using generative AI tooling? No. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
