[ https://issues.apache.org/jira/browse/IMPALA-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pooja Nilangekar reassigned IMPALA-7791: ---------------------------------------- Assignee: Pooja Nilangekar > Aggregation Node memory estimates don't account for number of fragment > instances > -------------------------------------------------------------------------------- > > Key: IMPALA-7791 > URL: https://issues.apache.org/jira/browse/IMPALA-7791 > Project: IMPALA > Issue Type: Sub-task > Affects Versions: Impala 3.1.0 > Reporter: Pooja Nilangekar > Assignee: Pooja Nilangekar > Priority: Major > > AggregationNode's memory estimates are calculated based on the input > cardinality of the node, without accounting for the division of input data > across fragment instances. This results in very high memory estimates. In > reality, the nodes often use only a part of this memory. > Example query: > {code:java} > [localhost:21000] default> select distinct * from tpch.lineitem limit 5; > {code} > Summary: > {code:java} > +--------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ > | Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem > | Est. Peak Mem | Detail > > > > > | > +--------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ > | 04:EXCHANGE | 1 | 21.24us | 21.24us | 5 | 5 | 48.00 KB > | 16.00 KB | UNPARTITIONED > > > > > | > | 03:AGGREGATE | 3 | 5.11s | 5.15s | 15 | 5 | 576.21 > MB | 1.62 GB | FINALIZE > > > > > | > | 02:EXCHANGE | 3 | 709.75ms | 728.91ms | 6.00M | 6.00M | 5.46 MB > | 10.78 MB | > HASH(tpch.lineitem.l_orderkey,tpch.lineitem.l_partkey,tpch.lineitem.l_suppkey,tpch.lineitem.l_linenumber,tpch.lineitem.l_quantity,tpch.lineitem.l_extendedprice,tpch.lineitem.l_discount,tpch.lineitem.l_tax,tpch.lineitem.l_returnflag,tpch.lineitem.l_linestatus,tpch.lineitem.l_shipdate,tpch.lineitem.l_commitdate,tpch.lineitem.l_receiptdate,tpch.lineitem.l_shipinstruct,tpch.lineitem.l_shipmode,tpch.lineitem.l_comment) > | > | 01:AGGREGATE | 3 | 4.37s | 4.70s | 6.00M | 6.00M | 36.77 MB > | 1.62 GB | STREAMING > > > > > | > | 00:SCAN HDFS | 3 | 437.14ms | 480.60ms | 6.00M | 6.00M | 65.51 MB > | 264.00 MB | tpch.lineitem > > > > > | > +--------------+--------+----------+----------+-------+------------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ > {code} > The plan estimates 3.50 GB memory per host but the query ends up with a peak > memory usage of 682.07 MB. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org