Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17592 )
Change subject: IMPALA-7635: Reducing HashTable size by packing it's buckets efficiently. ...................................................................... Patch Set 7: (10 comments) Looks pretty good. My only concern is whether there is a significant performance degradation, and how much. http://gerrit.cloudera.org:8080/#/c/17592/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17592/7//COMMIT_MSG@69 PS7, Line 69: As a part of this patch a new Micro Benchmark for HashTable has : been introduced, which will help in measuring these: : 1. Runtime for Building Hash Table and Probing the table. : 2. Memory consumed after building the Table. : This would help measuring the impact of changes to the HashTable's : data structure and algorithm. nit. Nice addition! May be useful to include some results here. May also include some numbers from TPCDS here. http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/exec/hash-table.h File be/src/exec/hash-table.h: http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/exec/hash-table.h@653 PS7, Line 653: 0 nit. May define and use constant for MATCHED here. http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/exec/hash-table.h@697 PS7, Line 697: 0 nit. same as above. May use constants for FILLED, MATCHED and DUPLICATED, instead of 0, 1 and 2 directly. http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/util/tagged-ptr.h File be/src/util/tagged-ptr.h: http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/util/tagged-ptr.h@33 PS7, Line 33: /// bits. Tag bit 0 - corresponds to bit 63, bit 1 corresponds to 62 and so on. nit: may add: To get the address stored together with extra information in canonical form, one must set/reset all bits from 57 to 63. http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/util/tagged-ptr.h@71 PS7, Line 71: if (i > 6 || i < 0) nit. UNLIKELY()? http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/util/tagged-ptr.h@78 PS7, Line 78: if (i > 6 || i < 0) { same as above http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/util/tagged-ptr.h@91 PS7, Line 91: //return (T*)((data_ & MASK_0_56_BITS) | ~((data_ & MASK_56_BIT) - 1)); nit. delete? http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/util/tagged-ptr.h@101 PS7, Line 101: bool operator==(const TaggedPtr<T> &a) noexcept { return data_ == a.data_; } : : bool operator!=(const TaggedPtr<T> &a) noexcept { return data_ != a.data_; } nit. I found the implementation of these two operators counter intuitive as semantically, a comparison of two pointers means to compare the pointers, excluding any extra bits encoded within the pointers. Maybe add two new methods 1. bool CompareTaggedPtrEqual() 2. bool compareTaggedPtrNotEqual() http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/util/tagged-ptr.h@115 PS7, Line 115: / data_ = (dat nit. remove? http://gerrit.cloudera.org:8080/#/c/17592/7/fe/src/main/java/org/apache/impala/planner/PlannerContext.java File fe/src/main/java/org/apache/impala/planner/PlannerContext.java: http://gerrit.cloudera.org:8080/#/c/17592/7/fe/src/main/java/org/apache/impala/planner/PlannerContext.java@47 PS7, Line 47: public final static double SIZE_OF_HASH = 4; nit. I wonder if this value can be folded back into SIZE_OF_BUCKET and SIZE_OF_DUPLICATENODE. That is to define directly SIZE_OF_BUCKET=12 SIZE_OF_DUPLICATENODE=20 as the separation of hash from bucket is an implementation optimization. -- To view, visit http://gerrit.cloudera.org:8080/17592 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I72912ae9353b0d567a976ca712d2d193e035df9b Gerrit-Change-Number: 17592 Gerrit-PatchSet: 7 Gerrit-Owner: Amogh Margoor <amarg...@gmail.com> Gerrit-Reviewer: Amogh Margoor <amarg...@gmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Wed, 21 Jul 2021 15:55:17 +0000 Gerrit-HasComments: Yes