[ 
https://issues.apache.org/jira/browse/FLINK-32281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-32281:
-----------------------------------
    Labels: pull-request-available  (was: )

> Enable two-phase HashAgg default when agg function support adaptive local 
> HashAgg
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-32281
>                 URL: https://issues.apache.org/jira/browse/FLINK-32281
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table SQL / Planner
>            Reporter: dalongliu
>            Priority: Major
>              Labels: pull-request-available
>
> For the HashAgg operator, the planner currently prefers a one-phase agg when
> the statistics cannot be accurately estimated. For some production queries,
> a two-phase agg may be the more reasonable choice. In the TPC-DS cases, we
> find that for some patterns, choosing a two-phase agg significantly reduces
> the query runtime. In https://issues.apache.org/jira/browse/FLINK-30542 , we
> introduced the adaptive local HashAgg, which adaptively skips aggregation
> when the degree of aggregation in the local phase is relatively low. This
> can greatly improve the performance of two-phase aggregation in some
> queries. Based on this background, we propose in this issue to turn on
> two-phase agg by default for functions that support adaptive local HashAgg,
> such as sum/count/min/max, so as to exploit the ability of adaptive local
> HashAgg to improve the performance of agg queries. For operator fusion
> codegen (OFCG), if we turn on two-phase agg by default, the local agg
> operator can also be put into the fused operator and so benefit from OFCG.
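
The adaptive skip described above can be illustrated with a small standalone
sketch. It is not Flink's implementation: the class, method, and parameter
names (AdaptiveLocalAggSketch, localSum, samplingThreshold,
distinctRateThreshold) are hypothetical, and the rule "skip when the ratio of
distinct keys to sampled rows exceeds a threshold" is an assumption about what
"aggregation degree is relatively low" means. The local phase aggregates a
sample of rows; if almost every key is distinct, it flushes its partial sums
and forwards the remaining rows unaggregated, leaving the work to the global
phase.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AdaptiveLocalAggSketch {

    /** Output of the local phase: partial (key, sum) records and whether aggregation was skipped. */
    static final class LocalResult {
        final List<long[]> records = new ArrayList<>();
        boolean skipped = false;
    }

    /**
     * Local phase of a two-phase SUM. Each input row is {key, value}.
     * After samplingThreshold rows, the ratio of distinct keys to rows seen is checked once;
     * if it exceeds distinctRateThreshold (hypothetical rule), later rows bypass the hash map.
     */
    static LocalResult localSum(long[][] rows, int samplingThreshold, double distinctRateThreshold) {
        LocalResult result = new LocalResult();
        Map<Long, Long> partialSums = new LinkedHashMap<>();
        int seen = 0;
        for (long[] row : rows) {
            if (result.skipped) {
                // Aggregation was judged ineffective: forward the row as-is.
                result.records.add(new long[] {row[0], row[1]});
                continue;
            }
            partialSums.merge(row[0], row[1], Long::sum);
            seen++;
            if (seen == samplingThreshold) {
                double distinctRate = (double) partialSums.size() / seen;
                if (distinctRate > distinctRateThreshold) {
                    // Few rows share a key, so emit what was aggregated and stop aggregating.
                    flush(partialSums, result.records);
                    result.skipped = true;
                }
            }
        }
        flush(partialSums, result.records); // emit any remaining partial sums
        return result;
    }

    /** Emits every partial sum as a {key, sum} record and clears the map. */
    private static void flush(Map<Long, Long> partialSums, List<long[]> out) {
        for (Map.Entry<Long, Long> e : partialSums.entrySet()) {
            out.add(new long[] {e.getKey(), e.getValue()});
        }
        partialSums.clear();
    }

    /** Global phase: merges the local records into final per-key sums. */
    static Map<Long, Long> globalSum(List<long[]> records) {
        Map<Long, Long> totals = new LinkedHashMap<>();
        for (long[] r : records) {
            totals.merge(r[0], r[1], Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        long[][] mostlyDistinct = {{1, 1}, {2, 2}, {3, 3}, {4, 4}, {5, 5}, {5, 6}};
        LocalResult local = localSum(mostlyDistinct, 4, 0.5);
        System.out.println("skipped=" + local.skipped + ", local records=" + local.records.size());
        System.out.println("final sums=" + globalSum(local.records));
    }
}
```

The final sums are the same whether or not the local phase skips, because the
global phase always merges by key; skipping only avoids hash-map work that
would not have reduced the number of records sent downstream.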



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
