Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17592 )

Change subject: IMPALA-7635: Reducing HashTable size by packing it's buckets 
efficiently.
......................................................................


Patch Set 7:

(10 comments)

Looks pretty good.

My only concern is whether there is a significant performance degradation, and 
how much.

http://gerrit.cloudera.org:8080/#/c/17592/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17592/7//COMMIT_MSG@69
PS7, Line 69: As a part of this patch a new Micro Benchmark for HashTable has
            : been introduced, which will help in measuring these:
            : 1. Runtime for Building Hash Table and Probing the table.
            : 2. Memory consumed after building the Table.
            : This would help measuring the impact of changes to the HashTable's
            : data structure and algorithm.
nit. Nice addition! May be useful to include some results here.

May also include some numbers from TPCDS here.


http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/exec/hash-table.h@653
PS7, Line 653: 0
nit. May define and use constant for MATCHED here.


http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/exec/hash-table.h@697
PS7, Line 697: 0
nit. same as above. May use constants for FILLED, MATCHED and DUPLICATED, 
instead of 0, 1 and 2 directly.


http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/util/tagged-ptr.h
File be/src/util/tagged-ptr.h:

http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/util/tagged-ptr.h@33
PS7, Line 33: /// bits. Tag bit 0 - corresponds to bit 63, bit 1 corresponds to 
62 and so on.
nit: may add: To get the address stored together with extra information in 
canonical form, one must set/reset all bits from 57 to 63.


http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/util/tagged-ptr.h@71
PS7, Line 71: if (i > 6 || i < 0)
nit. UNLIKELY()?


http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/util/tagged-ptr.h@78
PS7, Line 78: if (i > 6 || i < 0) {
same as above


http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/util/tagged-ptr.h@91
PS7, Line 91:  //return (T*)((data_ & MASK_0_56_BITS) | ~((data_ & MASK_56_BIT) 
- 1));
nit. delete?


http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/util/tagged-ptr.h@101
PS7, Line 101: bool operator==(const TaggedPtr<T> &a) noexcept { return data_ 
== a.data_; }
             :
             :   bool operator!=(const TaggedPtr<T> &a) noexcept { return data_ 
!= a.data_; }
nit. I found the implementation of these two operators counter intuitive as 
semantically, a comparison of two pointers means to compare the pointers, 
excluding any extra bits encoded within the pointers.

Maybe add two new methods
1. bool CompareTaggedPtrEqual()
2. bool compareTaggedPtrNotEqual()


http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/util/tagged-ptr.h@115
PS7, Line 115: / data_ = (dat
nit. remove?


http://gerrit.cloudera.org:8080/#/c/17592/7/fe/src/main/java/org/apache/impala/planner/PlannerContext.java
File fe/src/main/java/org/apache/impala/planner/PlannerContext.java:

http://gerrit.cloudera.org:8080/#/c/17592/7/fe/src/main/java/org/apache/impala/planner/PlannerContext.java@47
PS7, Line 47: public final static double SIZE_OF_HASH = 4;
nit. I wonder if this value can be folded back into SIZE_OF_BUCKET and 
SIZE_OF_DUPLICATENODE. That is to define directly

SIZE_OF_BUCKET=12
SIZE_OF_DUPLICATENODE=20

as the separation of hash from bucket is an implementation optimization.



--
To view, visit http://gerrit.cloudera.org:8080/17592
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72912ae9353b0d567a976ca712d2d193e035df9b
Gerrit-Change-Number: 17592
Gerrit-PatchSet: 7
Gerrit-Owner: Amogh Margoor <amarg...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <amarg...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 21 Jul 2021 15:55:17 +0000
Gerrit-HasComments: Yes

Reply via email to