Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16373 )
Change subject: IMPALA-4065 Inline comparator calls into TopN::InsertBatch() ...................................................................... Patch Set 9: (24 comments) Had a bunch of style comments, mostly driven by our style guide and existing best practices. I expected to see some changes to the codegen logic or perf numbers to actually replace the indirect comparator calls in the LLVM IR with a direct call to the codegen'd function. Is there going to be a part 2 that does that? http://gerrit.cloudera.org:8080/#/c/16373/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16373/9//COMMIT_MSG@19 PS9, Line 19: Did you do any performance comparison? Would be nice to have a targeted query that stresses the operator. http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/exec/topn-node-ir.cc File be/src/exec/topn-node-ir.cc: http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/exec/topn-node-ir.cc@58 PS9, Line 58: priority_queue_.Heapify(); I'm a little confused by this - isn't heapify generally an O(n) operation? http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/exec/topn-node.h File be/src/exec/topn-node.h: http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/exec/topn-node.h@188 PS9, Line 188: private : nit: bad whitespace change http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/exec/topn-node.cc File be/src/exec/topn-node.cc: http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/exec/topn-node.cc@128 PS9, Line 128: if (codegen_status.ok()) { I would have expected to see some changes here to replace the comparator call with a codegen'd version. http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/exec/topn-node.cc@273 PS9, Line 273: priority_queue_.Pop(tuple); This interface is an improvement! http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/exec/topn-node.cc@312 PS9, Line 312: : capacity_(capacity), priority_queue_(c, capacity) {} Not the biggest deal, but this preallocates the full array whereas previously it relied on the vector to expand. Might be a small difference in memory management for larger values of N. http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/CMakeLists.txt File be/src/util/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/CMakeLists.txt@151 PS9, Line 151: priority-queue-test.cc nit: can you preserve the alphabetic ordering of the files here? http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/CMakeLists.txt@221 PS9, Line 221: ADD_UNIFIED_BE_LSAN_TEST(priority-queue-test "PriorityQueueTest.*") Could preserve order here too http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/comparator-wrapper.h File be/src/util/comparator-wrapper.h: PS9: I think we can maybe just delete this wrapper - it was only required to adapt TupleRowComparator to the C++ STL interface, and I think we can now just call it directly. http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/comparator-wrapper.h@19 PS9, Line 19: #ifndef IMPALA_UTIL_COMPARATOR_WRAPPER_H_ Not a big deal, but we prefer #pragma once in new code instead of traditional include guards. https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536 http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue-test.cc File be/src/util/priority-queue-test.cc: http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue-test.cc@33 PS9, Line 33: testInt nit: TestInt http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h File be/src/util/priority-queue.h: http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@19 PS9, Line 19: #define IMPALA_UTIL_PRIORITY_QUEUE_H We prefer #pragma once http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@81 PS9, Line 81: PriorityQueue(ComparatorWrapper<C> c, std::size_t n = 100) Can't we pass in TupleRowComparator directly? ComparatorWrapper was just used to adapt TupleRowComparator to the c++ stl interface but I don't think we need that here. http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@96 PS9, Line 96: inline T& Top() { return elements_[0]; } nit: not a big deal, but I generally recommend omitting inline from functions defined in the class body cause it's already implied. https://en.cppreference.com/w/cpp/language/inline "A function defined entirely inside a class/struct/union definition, whether it's a member function or a non-member friend function, is implicitly an inline function. " http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@105 PS9, Line 105: capacity_ *= 2; I'd prefer to use a vector<> so that we don't need to reimplement the doubling behaviour. http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@122 PS9, Line 122: void Pop(T& v) { Why not just return the value? It's just a pointer so copy vs move etc doesn't matter. You could also change it to the following if you wanted to avoid a hypothetical copy if T was a different type. size_--; return move(elements_[last]); http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@128 PS9, Line 128: Heapify(0, last); Isn't heapify generally an O(n) operation? Popping off a heap should be O(log n) and it was before. Am I missing something? http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@139 PS9, Line 139: // high_bound-1. I would have expected a function called heapify to take an arbitrary array and turn it into a valid heap, e.g. http://www.cplusplus.com/reference/algorithm/make_heap/ It looks like this is doing something else - sifting down the element at i? http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@146 PS9, Line 146: DCHECK DCHECK_LE since it will print values on failure http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@166 PS9, Line 166: break; else should have braces http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@168 PS9, Line 168: } nit: here and elsewhere, missing blank line between function definitions http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@204 PS9, Line 204: std::size_t capacity_; We use int64_t for sizes. https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536 https://google.github.io/styleguide/cppguide.html#Integer_Types http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@207 PS9, Line 207: // The array holding the elements. We prefer a unique_ptr<T[]> or vector<T> to avoid need for explicit memory management. https://cwiki.apache.org/confluence/display/IMPALA/Resource+Management+Best+Practices+in+Impala has some context about the preferred ways to manage memory. http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@209 PS9, Line 209: compartor comparator -- To view, visit http://gerrit.cloudera.org:8080/16373 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I676b4c05cf10a6946c05e317b0002c1e29e78aa8 Gerrit-Change-Number: 16373 Gerrit-PatchSet: 9 Gerrit-Owner: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Wed, 02 Sep 2020 19:11:47 +0000 Gerrit-HasComments: Yes