[ https://issues.apache.org/jira/browse/IMPALA-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong resolved IMPALA-2783. ----------------------------------- Fix Version/s: Impala 4.0 Resolution: Fixed Fixed with Commit b42c64993d46893488a667fb9c425548fdf964ab in impala's branch refs/heads/master from Tim Armstrong [ https://gitbox.apache.org/repos/asf?p=impala.git;h=b42c649 ] IMPALA-9979: part 2: partitioned top-n Planner changes: --------------- The planner now identifies predicates that can be converted into limits in a partitioned or unpartitioned top-n with the following method: Push down predicates that reference analytic tuple into inline view. These will be evaluated after the analytic plan for the inline SelectStmt is generated. Identify predicates that reference the analytic tuple and could be converted to limits. If they can be applied to the last sort group of the analytic plan, and the windows are all compatible, then the lowest limit gets converted into a limit in the top N. Otherwise generate a select node with the conjuncts. We add logic to merge SELECT nodes to avoid generating duplicates from inside and outside the inline view. The pushed predicate is still added to the SELECT node because it is necessary for correctness for predicates like '=' to filter additional rows and also the limit pushdown optimization looks for analytic predicates there, so retaining all predicates simplifies that. The selectivity of the predicate is adjusted so that cardinality estimates remain accurate. The optimization can be disabled by setting ANALYTIC_RANK_PUSHDOWN_THRESHOLD=0. By default it is only enabled for limits of 1000 or less, because the in-memory Top-N may perform significantly worse than a full sort for large heaps (since updating the heap for every input row ends up being more expensive than doing a traditional sort). We could probably optimize this more with better tuning so that it can gracefully fall back to doing the full sort at runtime. rank() and row_number() are handled. rank() needs support in the TopN node to include ties for the last place, which is also added in this patch. If predicates are trivially false, we generate empty nodes. This interacts with the limit pushdwon optimization. The limit pushdown optimization is applied after the partitioned top-n is generated, and can sometimes result in more optimal plans, so it is generalized to handle pushing into partitioned top-n nodes. Backend changes: --------------- The top-n node in the backend is augmented to handle the partitioned case, for which we use a std::map and a comparator based on the partition exprs. The partitioned top-n node has a soft limit of 64MB on the size of the in-memory heaps and can spill with use of an embedded Sorter. The current implementation tries to evict heaps that are less effective at filtering rows. Limitations: ----------- There are several possible extensions to this that we did not do: dense_rank() is not supported because it would require additional backend support - IMPALA-10014. ntile() is not supported because it would require additional backend support - IMPALA-10174. Only one predicate per analytic is pushed. Redundant rank()/row_number() predicates are not merged, only the lowest is chosen. Lower bounds are not converted into OFFSET. The analytic operator cannot be eliminated even if the analytic expression was only used in the predicate. This doesn't push predicates into UNION - IMPALA-10013 Always false predicates don't result in empty plan - IMPALA-10015 Tests: Planner tests - added tests that exercise the interesting code paths added in planning. Predicate ordering in SELECT nodes changed in a couple of cases because some predicates were pushed into the inline views. Modified SORT targeted perf tests to avoid conversion to Top-N Added targeted perf test for partitioned top-n. End-to-end tests Unpartitioned Top-N end-to-end tests Basic partitioning and duplicate handling tests on functional Similar basic tests on larger inputs from TPC-DS and with larger partition counts. I inspected the results and also ran the same tests with analytic_rank_pushdown_threshold=0 to confirm that the results were the same as with the full sort. Fallback to spilling sort. Perf: Added a targeted benchmark that goes from ~2s to ~1s with mt_dop=8 on TPC-H 30 on my desktop. Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5 Reviewed-on: http://gerrit.cloudera.org:8080/16242 Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com> > Push down filters on rank similar to limit > ------------------------------------------ > > Key: IMPALA-2783 > URL: https://issues.apache.org/jira/browse/IMPALA-2783 > Project: IMPALA > Issue Type: Improvement > Components: Frontend > Affects Versions: Impala 2.2 > Reporter: Mostafa Mokhtar > Assignee: Aman Sinha > Priority: Minor > Labels: performance, planner, ramp-up > Fix For: Impala 4.0 > > > Similar to limit push down optimization we should extend the rule to cover > filters on Rank(), dense_rank() etc... as users tend to have explicit filters > on RANK() > Query > {code} > select * > FROM (SELECT Rank() > OVER( > ORDER BY l_orderkey) AS rank > FROM lineitem > WHERE l_shipdate < '1992-05-09') a > WHERE rank < 10 > {code} > Plan > {code} > +--------------------------------------------------------------+ > | Explain String | > +--------------------------------------------------------------+ > | Estimated Per-Host Requirements: Memory=512.00MB VCores=1 | > | | > | 03:SELECT | > | | predicates: rank() < 10 | > | | hosts=9 per-host-mem=unavailable | > | | tuple-ids=6,5 row-size=50B cardinality=17999891 | > | | | > | 02:ANALYTIC | > | | functions: rank() | > | | order by: l_orderkey ASC | > | | window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW | > | | hosts=9 per-host-mem=unavailable | > | | tuple-ids=6,5 row-size=50B cardinality=179998909 | > | | | > | 04:MERGING-EXCHANGE [UNPARTITIONED] | > | | order by: l_orderkey ASC | > | | hosts=9 per-host-mem=unavailable | > | | tuple-ids=6 row-size=38B cardinality=179998909 | > | | | > | 01:SORT | > | | order by: l_orderkey ASC | > | | hosts=9 per-host-mem=336.00MB | > | | tuple-ids=6 row-size=38B cardinality=179998909 | > | | | > | 00:SCAN HDFS [tpch_300_parquet.lineitem, RANDOM] | > | partitions=1/1 files=264 size=64.36GB | > | predicates: l_shipdate < '1992-05-09' | > | table stats: 1799989091 rows total | > | column stats: all | > | hosts=9 per-host-mem=176.00MB | > | tuple-ids=0 row-size=38B cardinality=179998909 | > +--------------------------------------------------------------+ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)