[ https://issues.apache.org/jira/browse/IMPALA-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong updated IMPALA-2875:
----------------------------------
    Priority: Major  (was: Critical)

> Optimize subplans when the following plan nodes do not require parent rows.
> ---------------------------------------------------------------------------
>
>                 Key: IMPALA-2875
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2875
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 2.3.0
>            Reporter: Alexander Behm
>            Priority: Major
>              Labels: nested_types, performance, planner
>
> Consider the following query that references nested collections and its plan:
> Query:
> {code}
> select count(*) from tpch_nested_parquet.customer c, c.c_orders.o_lineitems l
> where c.c_mktsegment = "AUTOMOBILE"
> group by l.l_returnflag
> {code}
> Plan:
> {code}
> +------------------------------------------------------------------------------------+
> | Explain String                                                                     |
> +------------------------------------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=304.00MB VCores=2                          |
> | WARNING: The following tables are missing relevant table and/or column statistics. |
> | tpch_nested_parquet.customer                                                       |
> |                                                                                    |
> | 08:EXCHANGE [UNPARTITIONED]                                                        |
> | |                                                                                  |
> | 07:AGGREGATE [FINALIZE]                                                            |
> | |  output: count:merge(*)                                                          |
> | |  group by: l.l_returnflag                                                        |
> | |                                                                                  |
> | 06:EXCHANGE [HASH(l.l_returnflag)]                                                 |
> | |                                                                                  |
> | 05:AGGREGATE                                                                       |
> | |  output: count(*)                                                                |
> | |  group by: l.l_returnflag                                                        |
> | |                                                                                  |
> | 01:SUBPLAN                                                                         |
> | |                                                                                  |
> | |--04:NESTED LOOP JOIN [CROSS JOIN]                                                |
> | |  |                                                                               |
> | |  |--02:SINGULAR ROW SRC                                                          |
> | |  |                                                                               |
> | |  03:UNNEST [c.c_orders.o_lineitems l]                                            |
> | |                                                                                  |
> | 00:SCAN HDFS [tpch_nested_parquet.customer c]                                      |
> |    partitions=1/1 files=4 size=554.13MB                                            |
> |    predicates: c.c_mktsegment = 'AUTOMOBILE'                                       |
> +------------------------------------------------------------------------------------+
> {code}
> In execution, we spend a lot of time evaluating and resetting the nested-loop
> join.
> However, for this query the plan after the subplan node does not need the
> parent rows at all, so we could improve this query by only having an unnest
> node inside the subplan.
> This optimization is a special case of projection trimming.
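
To make the proposed optimization concrete, here is a minimal Python sketch of the two subplan shapes. It is a toy model only, not Impala's executor or planner: the sample rows, the tuple layout, and the function names (`unnest`, `subplan_with_join`, `subplan_unnest_only`, `run`) are all assumptions made for illustration. It shows that when the operators above the subplan read only `l.l_returnflag`, joining each unnested row back to its parent row does not change the result.

```python
from collections import Counter

# Made-up stand-in for tpch_nested_parquet.customer (illustrative data only).
CUSTOMERS = [
    {"c_mktsegment": "AUTOMOBILE",
     "c_orders": [
         {"o_lineitems": [{"l_returnflag": "R"}, {"l_returnflag": "N"}]},
         {"o_lineitems": [{"l_returnflag": "N"}]},
     ]},
    {"c_mktsegment": "BUILDING",
     "c_orders": [{"o_lineitems": [{"l_returnflag": "A"}]}]},
    {"c_mktsegment": "AUTOMOBILE", "c_orders": []},
]


def unnest(customer):
    """Yield every line item reachable through c.c_orders.o_lineitems."""
    for order in customer["c_orders"]:
        yield from order["o_lineitems"]


def subplan_with_join(customer):
    """Current shape: cross join the singular parent row with the unnest."""
    singular_row_src = [customer]  # always exactly one row: the current parent
    for parent in singular_row_src:
        for item in unnest(customer):
            yield (parent, item)


def subplan_unnest_only(customer):
    """Proposed shape: emit only the unnested rows, with no parent attached."""
    for item in unnest(customer):
        yield (None, item)


def run(subplan):
    """Scan with the predicate, evaluate the subplan per parent, then group."""
    counts = Counter()
    for customer in CUSTOMERS:
        if customer["c_mktsegment"] != "AUTOMOBILE":
            continue
        for _parent, item in subplan(customer):
            # The aggregation reads only the nested column, never the parent.
            counts[item["l_returnflag"]] += 1
    return dict(counts)


if __name__ == "__main__":
    print(run(subplan_with_join))
    print(run(subplan_unnest_only))
```

In this sketch the join variant builds a `(parent, item)` pair for every line item and then discards the parent. The per-parent cost of setting up and resetting that join is what the issue proposes to avoid.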