[ 
https://issues.apache.org/jira/browse/IMPALA-14757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-14757:
-------------------------------------
    Description: 
set num_nodes=1;
with s as  (select l_shipdate, l_orderkey,  max(l_orderkey) over() maxkey from 
tpch_parquet.lineitem) select * from s  where maxkey = l_orderkey;

summary:
{code}
+--------------+--------+-------+----------+----------+-------+------------+-----------+---------------+-----------------------+
| Operator     | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | 
Peak Mem  | Est. Peak Mem | Detail                |
+--------------+--------+-------+----------+----------+-------+------------+-----------+---------------+-----------------------+
| F00:ROOT     | 1      | 1     | 22.67us  | 22.67us  |       |            | 
4.02 MB   | 21.75 MB      |                       |
| 02:SELECT    | 1      | 1     | 24.14ms  | 24.14ms  | 2     | 600.12K    | 
24.00 KB  | 0 B           |                       |
| 01:ANALYTIC  | 1      | 1     | 723.74ms | 723.74ms | 6.00M | 6.00M      | 
178.09 MB | 4.00 MB       |                       |
| 00:SCAN HDFS | 1      | 1     | 10.94ms  | 10.94ms  | 6.00M | 6.00M      | 
29.22 MB  | 160.00 MB     | tpch_parquet.lineitem |
+--------------+--------+-------+----------+----------+-------+------------+-----------+---------------+-----------------------+
{code}

The analytic node consumer 178MB vs the estimated 4MB.

Note that this query would be more efficient if it was rewritten to use scalar 
subquery the get the max.

  was:
set num_nodes=1;
with s as  (select l_shipdate, l_orderkey,  max(l_orderkey) over() maxkey from 
tpch_parquet.lineitem) select * from s  where maxkey = l_orderkey;

summary:
{code}
+--------------+--------+-------+----------+----------+-------+------------+-----------+---------------+-----------------------+
| Operator     | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | 
Peak Mem  | Est. Peak Mem | Detail                |
+--------------+--------+-------+----------+----------+-------+------------+-----------+---------------+-----------------------+
| F00:ROOT     | 1      | 1     | 22.67us  | 22.67us  |       |            | 
4.02 MB   | 21.75 MB      |                       |
| 02:SELECT    | 1      | 1     | 24.14ms  | 24.14ms  | 2     | 600.12K    | 
24.00 KB  | 0 B           |                       |
| 01:ANALYTIC  | 1      | 1     | 723.74ms | 723.74ms | 6.00M | 6.00M      | 
178.09 MB | 4.00 MB       |                       |
| 00:SCAN HDFS | 1      | 1     | 10.94ms  | 10.94ms  | 6.00M | 6.00M      | 
29.22 MB  | 160.00 MB     | tpch_parquet.lineitem |
+--------------+--------+-------+----------+----------+-------+------------+-----------+---------------+-----------------------+
{code}

The analytic node consumer 178MB vs the estimated 4MB.



> Analytic functions' mem usage can be underestimated
> ---------------------------------------------------
>
>                 Key: IMPALA-14757
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14757
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Csaba Ringhofer
>            Priority: Major
>
> set num_nodes=1;
> with s as  (select l_shipdate, l_orderkey,  max(l_orderkey) over() maxkey 
> from tpch_parquet.lineitem) select * from s  where maxkey = l_orderkey;
> summary:
> {code}
> +--------------+--------+-------+----------+----------+-------+------------+-----------+---------------+-----------------------+
> | Operator     | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | 
> Peak Mem  | Est. Peak Mem | Detail                |
> +--------------+--------+-------+----------+----------+-------+------------+-----------+---------------+-----------------------+
> | F00:ROOT     | 1      | 1     | 22.67us  | 22.67us  |       |            | 
> 4.02 MB   | 21.75 MB      |                       |
> | 02:SELECT    | 1      | 1     | 24.14ms  | 24.14ms  | 2     | 600.12K    | 
> 24.00 KB  | 0 B           |                       |
> | 01:ANALYTIC  | 1      | 1     | 723.74ms | 723.74ms | 6.00M | 6.00M      | 
> 178.09 MB | 4.00 MB       |                       |
> | 00:SCAN HDFS | 1      | 1     | 10.94ms  | 10.94ms  | 6.00M | 6.00M      | 
> 29.22 MB  | 160.00 MB     | tpch_parquet.lineitem |
> +--------------+--------+-------+----------+----------+-------+------------+-----------+---------------+-----------------------+
> {code}
> The analytic node consumer 178MB vs the estimated 4MB.
> Note that this query would be more efficient if it was rewritten to use 
> scalar subquery the get the max.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to