Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16242 )
Change subject: IMPALA-9979: part 2: partitioned top-n ...................................................................... Patch Set 30: (15 comments) This is a complex optimization. Nice work on piecing it together! Sending the comments I have so far. Still have to go through the plans in more details. http://gerrit.cloudera.org:8080/#/c/16242/30//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16242/30//COMMIT_MSG@49 PS30, Line 49: The nit: 'This' http://gerrit.cloudera.org:8080/#/c/16242/30//COMMIT_MSG@57 PS30, Line 57: both : the partitioning nit: ..and non-partitioned case ? http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.h File be/src/exec/topn-node.h: http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.h@133 PS30, Line 133: partition are returned Would be good to add how the returned partitions are ordered among themselves...for instance whether this is a deterministic ascending order of partition keys. (update: I see below you have more details on, so maybe ok to ignore). http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.cc File be/src/exec/topn-node.cc: http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.cc@53 PS30, Line 53: the the nit: repetition 'the' http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.cc@59 PS30, Line 59: the the nit: repetition 'the' http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.cc@398 PS30, Line 398: RETURN_IF_CANCELLED(state); I was wondering if the inner while loop (line 418) should also have a cancellation check or is that an overkill considering the number of times inner loop is executed is expected to be small. http://gerrit.cloudera.org:8080/#/c/16242/30/be/src/exec/topn-node.cc@559 PS30, Line 559: // We evict heaps starting with the heaps that were least effective at filtering The eviction policy is interesting. Considering that a rank()/rownum() can only be computed after a partition has seen all its rows, the filtering would be blocked until then. So it seems all partitions will have equal weightage, no ? http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/analysis/SortInfo.java File fe/src/main/java/org/apache/impala/analysis/SortInfo.java: http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/analysis/SortInfo.java@279 PS30, Line 279: public long estimateTopNMaterializedSize(long totalRows) { Should this estimate account for any per-partition overhead (additional metadata needed per heap) ? http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java File fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java: http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@646 PS30, Line 646: limit on nit: incomplete sentence http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java File fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java: http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java@453 PS30, Line 453: if (limit.limit > 1) { Maybe simplify this if-else to planNodeLimit = Math.max(limit.limit, 1) http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java@461 PS30, Line 461: if (partitionByExprs.isEmpty()) { There was a recent issue with constant partition-by and order-by exprs (https://issues.apache.org/jira/browse/IMPALA-10473). This check could also consider such cases probably after the other issue is fixed such that we don't create partitioned top-n for constant partition-by. http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java@885 PS30, Line 885: /// Whether the source predicate was a simple < or <= inequality. This sounded confusing at first ..I interpreted it to mean true for < (the variable name is also isLessThan) and false for <= . Maybe add the comment that this is false for = and true for both < and <= . http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java@928 PS30, Line 928: to . nit: incomplete http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@1158 PS30, Line 1158: returns nit: 'returns ties' http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/SelectNode.java File fe/src/main/java/org/apache/impala/planner/SelectNode.java: http://gerrit.cloudera.org:8080/#/c/16242/30/fe/src/main/java/org/apache/impala/planner/SelectNode.java@81 PS30, Line 81: selectNode.conjuncts_.addAll(conjuncts); We should check if the child's conjuncts being added are unique compared to what this selectNode already had (unless we can be sure it is getting enforced in the way conjuncts are migrated). -- To view, visit http://gerrit.cloudera.org:8080/16242 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5 Gerrit-Change-Number: 16242 Gerrit-PatchSet: 30 Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: David Rorke <dro...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Fri, 05 Feb 2021 07:48:37 +0000 Gerrit-HasComments: Yes