NEUpanning opened a new issue, #10367:
URL: https://github.com/apache/incubator-gluten/issues/10367

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   
   <details>
   
   <summary>Spark SQL fragment with 35 aggregate functions:</summary>
   
    ```sql
    SELECT
        supply_id,
        mt_main_poi_id,
        SUM(CASE WHEN partition_date = '2025-08-02' THEN trade_coupon_cnt ELSE 0 END) AS trade_coupon_cnt_1d,
        SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02' THEN trade_coupon_cnt ELSE 0 END) AS trade_coupon_cnt_7d,
        SUM(CASE WHEN partition_date BETWEEN '2025-07-20' AND '2025-08-02' THEN trade_coupon_cnt ELSE 0 END) AS trade_coupon_cnt_14d,
        SUM(CASE WHEN partition_date BETWEEN '2025-07-04' AND '2025-08-02' THEN trade_coupon_cnt ELSE 0 END) AS trade_coupon_cnt_30d,
        SUM(CASE WHEN partition_date BETWEEN '2025-06-04' AND '2025-08-02' THEN trade_coupon_cnt ELSE 0 END) AS trade_coupon_cnt_60d,
        SUM(CASE WHEN partition_date BETWEEN '2025-05-05' AND '2025-08-02' THEN trade_coupon_cnt ELSE 0 END) AS trade_coupon_cnt_90d,
        SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02' THEN trade_coupon_cnt ELSE 0 END) / 7.0 AS trade_coupon_cnt_7d_average,
        SUM(CASE WHEN partition_date = '2025-08-02' THEN trade_order_cnt ELSE 0 END) AS trade_order_cnt_1d,
        SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02' THEN trade_order_cnt ELSE 0 END) AS trade_order_cnt_7d,
        SUM(CASE WHEN partition_date BETWEEN '2025-07-20' AND '2025-08-02' THEN trade_order_cnt ELSE 0 END) AS trade_order_cnt_14d,
        SUM(CASE WHEN partition_date BETWEEN '2025-07-04' AND '2025-08-02' THEN trade_order_cnt ELSE 0 END) AS trade_order_cnt_30d,
        SUM(CASE WHEN partition_date BETWEEN '2025-06-04' AND '2025-08-02' THEN trade_order_cnt ELSE 0 END) AS trade_order_cnt_60d,
        SUM(CASE WHEN partition_date BETWEEN '2025-05-05' AND '2025-08-02' THEN trade_order_cnt ELSE 0 END) AS trade_order_cnt_90d,
        SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02' THEN trade_order_cnt ELSE 0 END) / 7.0 AS trade_order_cnt_7d_average,
        SUM(CASE WHEN partition_date = '2025-08-02' THEN gtv ELSE 0 END) AS gtv_1d,
        SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02' THEN gtv ELSE 0 END) AS gtv_7d,
        SUM(CASE WHEN partition_date BETWEEN '2025-07-20' AND '2025-08-02' THEN gtv ELSE 0 END) AS gtv_14d,
        SUM(CASE WHEN partition_date BETWEEN '2025-07-04' AND '2025-08-02' THEN gtv ELSE 0 END) AS gtv_30d,
        SUM(CASE WHEN partition_date BETWEEN '2025-06-04' AND '2025-08-02' THEN gtv ELSE 0 END) AS gtv_60d,
        SUM(CASE WHEN partition_date BETWEEN '2025-05-05' AND '2025-08-02' THEN gtv ELSE 0 END) AS gtv_90d,
        SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02' THEN gtv ELSE 0 END) / 7.0 AS gtv_7d_average,
        SUM(CASE WHEN partition_date = '2025-08-02' THEN order_fact_pay_amt ELSE 0 END) AS order_fact_pay_amt_1d,
        SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02' THEN order_fact_pay_amt ELSE 0 END) AS order_fact_pay_amt_7d,
        SUM(CASE WHEN partition_date BETWEEN '2025-07-20' AND '2025-08-02' THEN order_fact_pay_amt ELSE 0 END) AS order_fact_pay_amt_14d,
        SUM(CASE WHEN partition_date BETWEEN '2025-07-04' AND '2025-08-02' THEN order_fact_pay_amt ELSE 0 END) AS order_fact_pay_amt_30d,
        SUM(CASE WHEN partition_date BETWEEN '2025-06-04' AND '2025-08-02' THEN order_fact_pay_amt ELSE 0 END) AS order_fact_pay_amt_60d,
        SUM(CASE WHEN partition_date BETWEEN '2025-05-05' AND '2025-08-02' THEN order_fact_pay_amt ELSE 0 END) AS order_fact_pay_amt_90d,
        SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02' THEN order_fact_pay_amt ELSE 0 END) / 7.0 AS order_fact_pay_amt_7d_average,
        COUNT(DISTINCT CASE WHEN is_pay_dx = 1 AND partition_date = '2025-08-02' THEN partition_date ELSE NULL END) AS dx_pay_day_1d,
        COUNT(DISTINCT CASE WHEN is_pay_dx = 1 AND partition_date BETWEEN '2025-07-27' AND '2025-08-02' THEN partition_date ELSE NULL END) AS dx_pay_day_7d,
        COUNT(DISTINCT CASE WHEN is_pay_dx = 1 AND partition_date BETWEEN '2025-07-20' AND '2025-08-02' THEN partition_date ELSE NULL END) AS dx_pay_day_14d,
        COUNT(DISTINCT CASE WHEN is_pay_dx = 1 AND partition_date BETWEEN '2025-07-04' AND '2025-08-02' THEN partition_date ELSE NULL END) AS dx_pay_day_30d,
        COUNT(DISTINCT CASE WHEN is_pay_dx = 1 AND partition_date BETWEEN '2025-06-04' AND '2025-08-02' THEN partition_date ELSE NULL END) AS dx_pay_day_60d,
        COUNT(DISTINCT CASE WHEN is_pay_dx = 1 AND partition_date BETWEEN '2025-05-05' AND '2025-08-02' THEN partition_date ELSE NULL END) AS dx_pay_day_90d
    FROM
        mart_aiosupply.daocan_app_aggr_supply_poi_deal_cube_d_extnl_view
    WHERE
        partition_date <= '2025-08-02'
        AND partition_date >= '2025-05-05'
        AND dim_type = 'poi#deal#type'
        AND product_type IN (1, 2)
    GROUP BY
        supply_id,
        mt_main_poi_id
    ```
   </details>
   
   Vanilla Spark aggregate metrics:
   ```
   HashAggregate
   
   spill size total (min, med, max (stageId: taskId))
   0.0 B (0.0 B, 0.0 B, 0.0 B (stage 6.0: task 15017))
   time in aggregation build total (min, med, max (stageId: taskId))
   192.60 h (6.1 m, 11.6 m, 19.6 m (stage 6.0: task 15015))
   peak memory total (min, med, max (stageId: taskId))
   965.3 GiB (987.5 MiB, 988.4 MiB, 988.4 MiB (stage 6.0: task 15322))
   number of output rows: 3,440,719,610
   avg hash probe bucket list iters (min, med, max (stageId: taskId)):
   (1.6, 1.6, 1.6 (stage 6.0: task 15017))
   ```
   
   Gluten aggregate metrics, showing roughly 3.5x the total aggregation time of vanilla:
   ```
   RegularHashAggregateExecTransformer
   
   time of extraction total (min, med, max (stageId: taskId))
   49.0 m (23 ms, 205 ms, 517 ms (stage 2.0: task 966))
   number of memory allocations: 614,976,915
   number of output vectors: 44,714,272
   number of spilled bytes total (min, med, max (stageId: taskId))
   1864.2 GiB (0.0 B, 131.6 MiB, 141.5 MiB (stage 2.0: task 491))
   number of output rows: 182,915,536,712
   number of output bytes total (min, med, max (stageId: taskId))
   101.6 TiB (1179.8 MiB, 7.2 GiB, 7.6 GiB (stage 2.0: task 491))
   number of spilled files: 612,736
   time of aggregation total (min, med, max (stageId: taskId))
   675.75 h (5.6 s, 2.8 m, 8.3 m (stage 2.0: task 9949))
   peak memory bytes total (min, med, max (stageId: taskId))
   53.4 TiB (1624.6 MiB, 3.8 GiB, 4.3 GiB (stage 2.0: task 12750))
   number of spilled rows: 182,905,342,963
   cpu wall time count: 478,741,203
   number of spilled partitions: 115,360
   extraction cpu wall time count: 402,803,446
   ```
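   For reference, the headline ratios can be checked directly from the two metric dumps above. This is only arithmetic on the reported numbers (copied verbatim from the dumps), not a measurement:

   ```python
   # Ratios implied by the metric dumps above.
   # All values are copied verbatim from the issue; hour figures are the
   # "total" times reported by the Spark UI for each aggregate operator.
   vanilla_output_rows = 3_440_719_610        # vanilla HashAggregate: number of output rows
   gluten_output_rows = 182_915_536_712       # Gluten aggregate: number of output rows
   vanilla_build_hours = 192.60               # vanilla: time in aggregation build (total)
   gluten_agg_hours = 675.75                  # Gluten: time of aggregation (total)

   row_ratio = gluten_output_rows / vanilla_output_rows
   time_ratio = gluten_agg_hours / vanilla_build_hours

   print(f"output-row ratio (gluten / vanilla): {row_ratio:.1f}x")
   print(f"aggregation-time ratio (gluten / vanilla): {time_ratio:.2f}x")
   ```

   The time ratio works out to about 3.51x, matching the 3.5x described above, and the Gluten operator reports roughly 53x more output rows than the vanilla HashAggregate.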
   
   
   ### Gluten version
   
   Gluten-1.3
   
   ### Spark version
   
   Spark-3.5.x
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   ```bash
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

