[ https://issues.apache.org/jira/browse/IMPALA-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong updated IMPALA-2875:
----------------------------------
    Priority: Major  (was: Critical)

> Optimize subplans when the following plan nodes do not require parent rows.
> ---------------------------------------------------------------------------
>
>                 Key: IMPALA-2875
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2875
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 2.3.0
>            Reporter: Alexander Behm
>            Priority: Major
>              Labels: nested_types, performance, planner
>
> Consider the following query that references nested collections and its plan:
> Query:
> {code}
> select count(*) from tpch_nested_parquet.customer c, c.c_orders.o_lineitems l
> where c.c_mktsegment = "AUTOMOBILE"
> group by l.l_returnflag
> {code}
> Plan:
> {code}
> +------------------------------------------------------------------------------------+
> | Explain String                                                                     |
> +------------------------------------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=304.00MB VCores=2                          |
> | WARNING: The following tables are missing relevant table and/or column statistics. |
> | tpch_nested_parquet.customer                                                       |
> |                                                                                    |
> | 08:EXCHANGE [UNPARTITIONED]                                                        |
> | |                                                                                  |
> | 07:AGGREGATE [FINALIZE]                                                            |
> | |  output: count:merge(*)                                                          |
> | |  group by: l.l_returnflag                                                        |
> | |                                                                                  |
> | 06:EXCHANGE [HASH(l.l_returnflag)]                                                 |
> | |                                                                                  |
> | 05:AGGREGATE                                                                       |
> | |  output: count(*)                                                                |
> | |  group by: l.l_returnflag                                                        |
> | |                                                                                  |
> | 01:SUBPLAN                                                                         |
> | |                                                                                  |
> | |--04:NESTED LOOP JOIN [CROSS JOIN]                                                |
> | |  |                                                                               |
> | |  |--02:SINGULAR ROW SRC                                                          |
> | |  |                                                                               |
> | |  03:UNNEST [c.c_orders.o_lineitems l]                                            |
> | |                                                                                  |
> | 00:SCAN HDFS [tpch_nested_parquet.customer c]                                      |
> |    partitions=1/1 files=4 size=554.13MB                                            |
> |    predicates: c.c_mktsegment = 'AUTOMOBILE'                                       |
> +------------------------------------------------------------------------------------+
> {code}
> During execution, we spend a lot of time evaluating and resetting the
> nested-loop join (node 04) for each parent row.
> However, for this query the plan nodes that follow the subplan node do not
> need the parent rows at all, so we could speed up this query by having only
> an unnest node inside the subplan.
> This optimization is a special case of projection trimming.
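
The equivalence claimed above can be illustrated with a small sketch. It is not Impala code: the `customers` list is a hypothetical in-memory stand-in for `tpch_nested_parquet.customer`, and the function names `unnest`, `current_plan` and `trimmed_plan` are made up for illustration. The sketch assumes only what the plan shows, namely that the subplan cross-joins one parent row with the unnested line items and that the aggregation reads nothing but `l.l_returnflag`.

```python
from collections import Counter

# Hypothetical sample data; field names follow the query in the issue.
customers = [
    {"c_mktsegment": "AUTOMOBILE",
     "c_orders": [
         {"o_lineitems": [{"l_returnflag": "R"}, {"l_returnflag": "N"}]},
         {"o_lineitems": [{"l_returnflag": "N"}]},
     ]},
    {"c_mktsegment": "BUILDING",
     "c_orders": [{"o_lineitems": [{"l_returnflag": "A"}]}]},
    {"c_mktsegment": "AUTOMOBILE", "c_orders": []},
]


def unnest(customer):
    """Yield every line item reachable via c.c_orders.o_lineitems."""
    for order in customer["c_orders"]:
        yield from order["o_lineitems"]


def current_plan(rows):
    """Subplan containing a cross join of the parent row with the unnest."""
    counts = Counter()
    for c in rows:
        if c["c_mktsegment"] != "AUTOMOBILE":  # scan predicate
            continue
        singular_row_src = [c]  # exactly one parent row per subplan iteration
        for parent in singular_row_src:
            for l in unnest(c):
                joined = (parent, l)  # parent half is carried but never read
                counts[joined[1]["l_returnflag"]] += 1
    return counts


def trimmed_plan(rows):
    """Subplan containing only the unnest; no join with the parent row."""
    counts = Counter()
    for c in rows:
        if c["c_mktsegment"] != "AUTOMOBILE":  # scan predicate
            continue
        for l in unnest(c):
            counts[l["l_returnflag"]] += 1
    return counts


if __name__ == "__main__":
    print(current_plan(customers))
    print(trimmed_plan(customers))
```

Both functions produce the same per-flag counts because the join contributes exactly one parent row per iteration and no later step reads it. The sketch shows only that the results match, not how much time the real planner change would save.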



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
