Yicong-Huang opened a new pull request, #55244:
URL: https://github.com/apache/spark/pull/55244

   ### What changes were proposed in this pull request?
   
   Add ASV microbenchmarks for `SQL_COGROUPED_MAP_ARROW_UDF` in `bench_eval_type.py`.
   
   Changes:
   - Add `_CogroupedMapArrowBenchMixin` with three UDF variants: `identity_udf`, `concat_udf`, `left_semi_udf`
   - Add `CogroupedMapArrowUDFTimeBench` and `CogroupedMapArrowUDFPeakmemBench` classes
   - Add `MockDataFactory.make_cogrouped_batches()` factory for generating cogroup batch pairs (left, right)
   - Rename `make_batch_groups` to `make_grouped_batches` for consistency
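
For readers without the diff at hand, the following is a minimal sketch of what the three UDF variants and the cogroup batch factory might look like. It is an illustration only: the semantics are inferred from the names, plain lists of row dicts stand in for Arrow tables, and the column names `key` and `id` as well as the factory signature are assumptions, not the code in this PR.

```python
from typing import Dict, Iterator, List, Tuple

Row = Dict[str, int]
Table = List[Row]  # stand-in for an Arrow table (assumption for this sketch)


def identity_udf(left: Table, right: Table) -> Table:
    # Assumed meaning: return the left side of the cogroup unchanged.
    return left


def concat_udf(left: Table, right: Table) -> Table:
    # Assumed meaning: stack the left and right rows into one output.
    return left + right


def left_semi_udf(left: Table, right: Table) -> Table:
    # Assumed meaning: keep left rows whose "id" also appears on the right.
    right_ids = {row["id"] for row in right}
    return [row for row in left if row["id"] in right_ids]


def make_cogrouped_batches(
    num_groups: int, rows_per_group: int
) -> Iterator[Tuple[Table, Table]]:
    # Hypothetical signature: yield one (left, right) pair per group.
    for g in range(num_groups):
        left = [{"key": g, "id": i} for i in range(rows_per_group)]
        # The right side holds every other id, so the semi join filters rows.
        right = [{"key": g, "id": i} for i in range(0, rows_per_group, 2)]
        yield left, right


pairs = list(make_cogrouped_batches(num_groups=2, rows_per_group=4))
left, right = pairs[0]
semi_ids = [row["id"] for row in left_semi_udf(left, right)]
```

The three variants differ in how much of each side they touch, which would be consistent with `left_semi_udf` being the slowest variant in every scenario reported under "How was this patch tested?".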
   
   ### Why are the changes needed?
   
   Part of SPARK-55724 (Micro-benchmark PySpark Eval Types). This provides a performance baseline for `SQL_COGROUPED_MAP_ARROW_UDF` before refactoring.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   ASV benchmark run with `repeat=(3, 5, 10.0)`:
   
   ```
   CogroupedMapArrowUDFTimeBench.time_worker
   ================ =============== ============
       scenario           udf
   ---------------- --------------- ------------
    few_groups_sm     identity_udf   13.4±0.1ms
    few_groups_sm      concat_udf    16.5±0.2ms
    few_groups_sm    left_semi_udf    70.4±1ms
    few_groups_lg     identity_udf   53.8±0.2ms
    few_groups_lg      concat_udf    83.3±0.9ms
    few_groups_lg    left_semi_udf    222±6ms
    many_groups_sm    identity_udf   393±0.7ms
    many_groups_sm     concat_udf     513±1ms
    many_groups_sm   left_semi_udf   1.67±0.01s
    many_groups_lg    identity_udf    200±4ms
    many_groups_lg     concat_udf     265±1ms
    many_groups_lg   left_semi_udf    997±50ms
     wide_values      identity_udf    308±1ms
     wide_values       concat_udf     394±2ms
     wide_values     left_semi_udf    635±10ms
      multi_key       identity_udf   75.1±0.2ms
      multi_key        concat_udf    105±0.6ms
      multi_key      left_semi_udf    233±2ms
   ================ =============== ============
   
   CogroupedMapArrowUDFPeakmemBench.peakmem_worker
   ================ =============== ======
       scenario           udf
   ---------------- --------------- ------
    few_groups_sm     identity_udf   483M
    few_groups_sm      concat_udf    488M
    few_groups_sm    left_semi_udf   482M
    few_groups_lg     identity_udf   682M
    few_groups_lg      concat_udf    741M
    few_groups_lg    left_semi_udf   715M
    many_groups_sm    identity_udf   559M
    many_groups_sm     concat_udf    579M
    many_groups_sm   left_semi_udf   549M
    many_groups_lg    identity_udf   845M
    many_groups_lg     concat_udf    955M
    many_groups_lg   left_semi_udf   870M
     wide_values      identity_udf   810M
     wide_values       concat_udf    919M
     wide_values     left_semi_udf   772M
      multi_key       identity_udf   572M
      multi_key        concat_udf    593M
      multi_key      left_semi_udf   586M
   ================ =============== ======
   ```
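
As a rough guide to how such a run is configured, here is a minimal sketch of an ASV benchmark pair that shares a mixin and sets `repeat` as a class attribute. In ASV, a tuple `repeat` is read as `(min_repeat, max_repeat, max_time)`. The class names, the trimmed parameter lists, and the placeholder workload are hypothetical; whether this PR sets `repeat` on the class or overrides it at run time is not stated above.

```python
class _WorkerBenchMixin:
    # ASV passes each combination of `params` to setup() and the benchmark.
    # Labels are taken from the tables above; the list is trimmed for brevity.
    params = [["few_groups_sm", "few_groups_lg"], ["identity_udf", "concat_udf"]]
    param_names = ["scenario", "udf"]

    def setup(self, scenario, udf):
        # Placeholder input: the real benchmarks build cogroup batches here.
        size = 1_000 if scenario.endswith("_sm") else 10_000
        self.data = list(range(size))

    def _run_worker(self):
        # Placeholder workload standing in for the worker under test.
        return sum(self.data)


class WorkerTimeBench(_WorkerBenchMixin):
    # At least 3 samples, at most 5, stopping early after 10 seconds.
    repeat = (3, 5, 10.0)

    def time_worker(self, scenario, udf):
        # Methods prefixed with time_ are measured for wall-clock time.
        self._run_worker()


class WorkerPeakmemBench(_WorkerBenchMixin):
    def peakmem_worker(self, scenario, udf):
        # Methods prefixed with peakmem_ report peak process memory.
        self._run_worker()


# Drive one parameter combination by hand, outside of ASV.
bench = WorkerTimeBench()
bench.setup("few_groups_sm", "identity_udf")
bench.time_worker("few_groups_sm", "identity_udf")
```

Sharing `setup` through a mixin keeps the timing and peak-memory classes measuring the same workload, which is why the two tables above list identical scenario and UDF combinations.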
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.



