Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/16242 )

Change subject: IMPALA-9979: part 2: partitioned top-n
......................................................................


Patch Set 28:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/16242/28//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16242/28//COMMIT_MSG@59
PS28, Line 59: and the tie-handling
             : semantics required by rank() predicates
nit: I think this was really implemented in your previous patch?


http://gerrit.cloudera.org:8080/#/c/16242/28/be/src/exec/topn-node.h
File be/src/exec/topn-node.h:

http://gerrit.cloudera.org:8080/#/c/16242/28/be/src/exec/topn-node.h@64
PS28, Line 64: int64_t limit = is_partitioned() ? per_partition_limit() :
What's the relationship between 'include_ties' and 'is_partitioned'? That is, why does 'include_ties' matter here for the unpartitioned case but not for the partitioned case?


http://gerrit.cloudera.org:8080/#/c/16242/28/be/src/exec/topn-node.cc
File be/src/exec/topn-node.cc:

http://gerrit.cloudera.org:8080/#/c/16242/28/be/src/exec/topn-node.cc@244
PS28, Line 244: U
typo

http://gerrit.cloudera.org:8080/#/c/16242/28/be/src/exec/topn-node.cc@399
PS28, Line 399: RETURN_IF_ERROR(QueryMaintenance(state));
This results in two calls to QueryMaintenance() in quick succession, here and in GetNext(); it might be better to avoid that.

http://gerrit.cloudera.org:8080/#/c/16242/28/be/src/exec/topn-node.cc@566
PS28, Line 566: be
typo

http://gerrit.cloudera.org:8080/#/c/16242/28/be/src/exec/topn-node.cc@666
PS28, Line 666: vector<unique_ptr<Heap>> rematerialized_heaps;
              : for (auto& entry : partition_heaps_) {
              :   RETURN_IF_ERROR(entry.second->RematerializeTuples(this, state, temp_pool.get()));
              :   DCHECK(entry.second->DCheckConsistency());
              :   // The key references memory in 'tuple_pool_'. Replace it with a rematerialized tuple.
              :   rematerialized_heaps.push_back(move(entry.second));
              : }
              : partition_heaps_.clear();
              : for (auto& heap_ptr : rematerialized_heaps) {
              :   const Tuple* key_tuple = heap_ptr->top();
              :   partition_heaps_.emplace(key_tuple, move(heap_ptr));
              : }
I think this can be put in an 'else' branch of the 'if (heap_ != nullptr)' above, to make the partitioned vs. unpartitioned handling clearer.


http://gerrit.cloudera.org:8080/#/c/16242/28/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/16242/28/common/thrift/ImpalaService.thrift@625
PS28, Line 625: // If > 0, the rank()/row_number() pushdown into pre-analytic sorts is enabled
Maybe note the default value, and briefly describe the issues with setting it higher.



-- 
To view, visit http://gerrit.cloudera.org:8080/16242
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5
Gerrit-Change-Number: 16242
Gerrit-PatchSet: 28
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Tue, 02 Feb 2021 00:25:09 +0000
Gerrit-HasComments: Yes