[ https://issues.apache.org/jira/browse/FLINK-32281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated FLINK-32281:
-----------------------------------
    Labels: pull-request-available  (was: )

> Enable two-phase HashAgg default when agg function support adaptive local HashAgg
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-32281
>                 URL: https://issues.apache.org/jira/browse/FLINK-32281
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table SQL / Planner
>            Reporter: dalongliu
>            Priority: Major
>              Labels: pull-request-available
>
> For the HashAgg operator, the planner currently prefers a one-phase agg
> when the statistics cannot be accurately estimated. In some production
> queries, however, a two-phase agg is the better choice: in TPC-DS, we found
> that for some query patterns, choosing a two-phase agg significantly
> reduces query runtime. In https://issues.apache.org/jira/browse/FLINK-30542,
> we introduced adaptive local HashAgg, which adaptively skips aggregation
> when the local-phase aggregation degree is relatively low (that is, when
> most grouping keys are distinct, so pre-aggregating barely reduces the
> data). This can greatly improve the performance of two-phase aggregation
> in some queries. Against this background, this issue proposes enabling
> two-phase agg by default for functions that support adaptive local
> HashAgg, such as sum/count/min/max, so that adaptive local HashAgg can
> improve the performance of agg queries. For OFCG (operator fusion
> codegen), enabling two-phase agg by default also lets the local agg
> operator be put into the fused operator, so that it benefits from OFCG
> as well.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
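The mechanism this proposal relies on can be illustrated with a small, self-contained Java sketch. It is not Flink's actual operator or planner code. It assumes a SUM aggregation over (key, value) rows and models two ideas from the description. First, the local phase samples its first rows, and if the ratio of distinct keys to sampled rows is high (a low aggregation degree), it stops aggregating and forwards rows unchanged to the global phase. Second, the proposed planner default picks a two-phase agg when every aggregate function supports adaptive local HashAgg. The class name, `SAMPLING_THRESHOLD`, `DISTINCT_RATE_THRESHOLD`, `choosePhase`, and the `ADAPTIVE_CAPABLE` function set are all hypothetical, and the tiny threshold values are chosen only to make the behavior visible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; every name and threshold below is hypothetical.
final class AdaptiveLocalHashAggSketch {

    enum AggPhase { ONE_PHASE, TWO_PHASE }

    // Hypothetical: functions assumed to support adaptive local HashAgg
    // (taken from the issue's examples, not an exhaustive list).
    static final Set<String> ADAPTIVE_CAPABLE = Set.of("SUM", "COUNT", "MIN", "MAX");

    // Hypothetical knobs: rows to sample before deciding, and the ratio of
    // distinct keys to sampled rows above which local aggregation is skipped.
    static final int SAMPLING_THRESHOLD = 4;
    static final double DISTINCT_RATE_THRESHOLD = 0.5;

    /** Proposed default: two-phase when every agg function is adaptive-capable. */
    static AggPhase choosePhase(List<String> aggFunctions) {
        boolean allAdaptive = !aggFunctions.isEmpty()
                && aggFunctions.stream()
                        .allMatch(f -> ADAPTIVE_CAPABLE.contains(f.toUpperCase(Locale.ROOT)));
        // ONE_PHASE stands in for the current preference when stats are unreliable.
        return allAdaptive ? AggPhase.TWO_PHASE : AggPhase.ONE_PHASE;
    }

    /** Local phase of SUM: pre-aggregate, or pass rows through if sampling shows little reduction. */
    static List<Map.Entry<String, Long>> localPhase(List<Map.Entry<String, Long>> input) {
        Map<String, Long> buffer = new HashMap<>();
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        boolean passThrough = false;
        int seen = 0;
        for (Map.Entry<String, Long> row : input) {
            if (passThrough) {
                out.add(row); // low aggregation degree: forward the raw row unchanged
                continue;
            }
            buffer.merge(row.getKey(), row.getValue(), Long::sum);
            seen++;
            if (seen == SAMPLING_THRESHOLD
                    && (double) buffer.size() / seen > DISTINCT_RATE_THRESHOLD) {
                passThrough = true;
                flush(buffer, out); // emit the partial sums buffered so far
            }
        }
        flush(buffer, out);
        return out;
    }

    /** Emits every buffered (key, partial sum) pair and empties the buffer. */
    static void flush(Map<String, Long> buffer, List<Map.Entry<String, Long>> out) {
        for (Map.Entry<String, Long> e : buffer.entrySet()) {
            out.add(Map.entry(e.getKey(), e.getValue()));
        }
        buffer.clear();
    }

    /** Global phase: merge partial sums; a raw SUM input is already a valid partial. */
    static Map<String, Long> globalPhase(List<Map.Entry<String, Long>> partials) {
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, Long> p : partials) {
            result.merge(p.getKey(), p.getValue(), Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> skewed = List.of(
                Map.entry("a", 1L), Map.entry("a", 1L), Map.entry("a", 1L),
                Map.entry("a", 1L), Map.entry("b", 1L), Map.entry("b", 1L));
        List<Map.Entry<String, Long>> unique = List.of(
                Map.entry("a", 1L), Map.entry("b", 1L), Map.entry("c", 1L),
                Map.entry("d", 1L), Map.entry("e", 1L), Map.entry("f", 1L));
        System.out.println(choosePhase(List.of("sum", "count"))); // TWO_PHASE
        System.out.println(localPhase(skewed).size()); // 2: local phase aggregated
        System.out.println(localPhase(unique).size()); // 6: local phase passed rows through
        System.out.println(globalPhase(localPhase(skewed)));
    }
}
```

Forwarding raw rows is only correct here because, for SUM, a single input value is already a valid partial result that the global phase can merge. In this sketch, that is the property a function needs in order to let the local phase be skipped; COUNT, for instance, would have to emit 1 per forwarded row rather than the raw value.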