NEUpanning opened a new issue, #10367:
URL: https://github.com/apache/incubator-gluten/issues/10367
### Backend
VL (Velox)
### Bug description
<details>
<summary>Spark sql fragment that has 35 aggregate functions:</summary>
SELECT
supply_id,
mt_main_poi_id,
SUM(CASE WHEN partition_date = '2025-08-02' THEN trade_coupon_cnt
ELSE 0 END) AS trade_coupon_cnt_1d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02'
THEN trade_coupon_cnt ELSE 0 END) AS trade_coupon_cnt_7d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-20' AND '2025-08-02'
THEN trade_coupon_cnt ELSE 0 END) AS trade_coupon_cnt_14d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-04' AND '2025-08-02'
THEN trade_coupon_cnt ELSE 0 END) AS trade_coupon_cnt_30d,
SUM(CASE WHEN partition_date BETWEEN '2025-06-04' AND '2025-08-02'
THEN trade_coupon_cnt ELSE 0 END) AS trade_coupon_cnt_60d,
SUM(CASE WHEN partition_date BETWEEN '2025-05-05' AND '2025-08-02'
THEN trade_coupon_cnt ELSE 0 END) AS trade_coupon_cnt_90d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02'
THEN trade_coupon_cnt ELSE 0 END) / 7.0 AS trade_coupon_cnt_7d_average,
SUM(CASE WHEN partition_date = '2025-08-02' THEN trade_order_cnt
ELSE 0 END) AS trade_order_cnt_1d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02'
THEN trade_order_cnt ELSE 0 END) AS trade_order_cnt_7d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-20' AND '2025-08-02'
THEN trade_order_cnt ELSE 0 END) AS trade_order_cnt_14d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-04' AND '2025-08-02'
THEN trade_order_cnt ELSE 0 END) AS trade_order_cnt_30d,
SUM(CASE WHEN partition_date BETWEEN '2025-06-04' AND '2025-08-02'
THEN trade_order_cnt ELSE 0 END) AS trade_order_cnt_60d,
SUM(CASE WHEN partition_date BETWEEN '2025-05-05' AND '2025-08-02'
THEN trade_order_cnt ELSE 0 END) AS trade_order_cnt_90d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02'
THEN trade_order_cnt ELSE 0 END) / 7.0 AS trade_order_cnt_7d_average,
SUM(CASE WHEN partition_date = '2025-08-02' THEN gtv ELSE 0 END) AS
gtv_1d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02'
THEN gtv ELSE 0 END) AS gtv_7d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-20' AND '2025-08-02'
THEN gtv ELSE 0 END) AS gtv_14d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-04' AND '2025-08-02'
THEN gtv ELSE 0 END) AS gtv_30d,
SUM(CASE WHEN partition_date BETWEEN '2025-06-04' AND '2025-08-02'
THEN gtv ELSE 0 END) AS gtv_60d,
SUM(CASE WHEN partition_date BETWEEN '2025-05-05' AND '2025-08-02'
THEN gtv ELSE 0 END) AS gtv_90d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02'
THEN gtv ELSE 0 END) / 7.0 AS gtv_7d_average,
SUM(CASE WHEN partition_date = '2025-08-02' THEN order_fact_pay_amt
ELSE 0 END) AS order_fact_pay_amt_1d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02'
THEN order_fact_pay_amt ELSE 0 END) AS order_fact_pay_amt_7d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-20' AND '2025-08-02'
THEN order_fact_pay_amt ELSE 0 END) AS order_fact_pay_amt_14d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-04' AND '2025-08-02'
THEN order_fact_pay_amt ELSE 0 END) AS order_fact_pay_amt_30d,
SUM(CASE WHEN partition_date BETWEEN '2025-06-04' AND '2025-08-02'
THEN order_fact_pay_amt ELSE 0 END) AS order_fact_pay_amt_60d,
SUM(CASE WHEN partition_date BETWEEN '2025-05-05' AND '2025-08-02'
THEN order_fact_pay_amt ELSE 0 END) AS order_fact_pay_amt_90d,
SUM(CASE WHEN partition_date BETWEEN '2025-07-27' AND '2025-08-02'
THEN order_fact_pay_amt ELSE 0 END) / 7.0 AS order_fact_pay_amt_7d_average,
COUNT(DISTINCT CASE WHEN is_pay_dx = 1 AND partition_date =
'2025-08-02' THEN partition_date ELSE NULL END) AS dx_pay_day_1d,
COUNT(DISTINCT CASE WHEN is_pay_dx = 1 AND partition_date BETWEEN
'2025-07-27' AND '2025-08-02' THEN partition_date ELSE NULL END) AS
dx_pay_day_7d,
COUNT(DISTINCT CASE WHEN is_pay_dx = 1 AND partition_date BETWEEN
'2025-07-20' AND '2025-08-02' THEN partition_date ELSE NULL END) AS
dx_pay_day_14d,
COUNT(DISTINCT CASE WHEN is_pay_dx = 1 AND partition_date BETWEEN
'2025-07-04' AND '2025-08-02' THEN partition_date ELSE NULL END) AS
dx_pay_day_30d,
COUNT(DISTINCT CASE WHEN is_pay_dx = 1 AND partition_date BETWEEN
'2025-06-04' AND '2025-08-02' THEN partition_date ELSE NULL END) AS
dx_pay_day_60d,
COUNT(DISTINCT CASE WHEN is_pay_dx = 1 AND partition_date BETWEEN
'2025-05-05' AND '2025-08-02' THEN partition_date ELSE NULL END) AS
dx_pay_day_90d
FROM
mart_aiosupply.daocan_app_aggr_supply_poi_deal_cube_d_extnl_view
WHERE
partition_date <= '2025-08-02'
AND partition_date >= '2025-05-05'
AND dim_type = 'poi#deal#type'
AND product_type IN (1, 2)
GROUP BY
supply_id,
mt_main_poi_id
</details>
vanilla spark aggregate metrics:
```
HashAggregate
spill size total (min, med, max (stageId: taskId))
0.0 B (0.0 B, 0.0 B, 0.0 B (stage 6.0: task 15017))
time in aggregation build total (min, med, max (stageId: taskId))
192.60 h (6.1 m, 11.6 m, 19.6 m (stage 6.0: task 15015))
peak memory total (min, med, max (stageId: taskId))
965.3 GiB (987.5 MiB, 988.4 MiB, 988.4 MiB (stage 6.0: task 15322))
number of output rows: 3,440,719,610
avg hash probe bucket list iters (min, med, max (stageId: taskId)):
(1.6, 1.6, 1.6 (stage 6.0: task 15017))
```
gluten aggregate metrics that are 3.5x total time compared to vanilla:
```
RegularHashAggregateExecTransformer
time of extraction total (min, med, max (stageId: taskId))
49.0 m (23 ms, 205 ms, 517 ms (stage 2.0: task 966))
number of memory allocations: 614,976,915
number of output vectors: 44,714,272
number of spilled bytes total (min, med, max (stageId: taskId))
1864.2 GiB (0.0 B, 131.6 MiB, 141.5 MiB (stage 2.0: task 491))
number of output rows: 182,915,536,712
number of output bytes total (min, med, max (stageId: taskId))
101.6 TiB (1179.8 MiB, 7.2 GiB, 7.6 GiB (stage 2.0: task 491))
number of spilled files: 612,736
time of aggregation total (min, med, max (stageId: taskId))
675.75 h (5.6 s, 2.8 m, 8.3 m (stage 2.0: task 9949))
peak memory bytes total (min, med, max (stageId: taskId))
53.4 TiB (1624.6 MiB, 3.8 GiB, 4.3 GiB (stage 2.0: task 12750))
number of spilled rows: 182,905,342,963
cpu wall time count: 478,741,203
number of spilled partitions: 115,360
extraction cpu wall time count: 402,803,446
```
### Gluten version
Gluten-1.3
### Spark version
Spark-3.5.x
### Spark configurations
_No response_
### System information
_No response_
### Relevant logs
```bash
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]