Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17592 )

Change subject: IMPALA-7635: Reducing HashTable size by packing it's buckets 
efficiently.
......................................................................


Patch Set 10:

(43 comments)

http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@9
PS9, Line 9: HashTable implementation in Impala comprises of contiguous array
> nit. in Impala
Done


http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@10
PS9, Line 10: ns either data or pointer to
            : linked list of du
> nit. "contains either data or a pointer to linked"
Done


http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@15
PS9, Line 15:     bool matched;
            :     DuplicateNode* next;
            :     HtData htdata;
            :   };
            :
            :   struct Bucket {
            :     bool filled;
            :     bool matched;
            :     bool hasDuplicates;
            :     uint32_t hash;
            :     union {
            :       HtData htdata;
            :       DuplicateNode* duplicates;
            :     } bucketData;
            :   };
            :
> nit. The commit message is nice with many details. I wonder if it can be si
It would be easier for reader to look at struct to digest the new and old size 
and where size reduction is coming from.


http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@38
PS9, Line 38: e mos
> nit. Intel
Done


http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@53
PS9, Line 53: sures siz
> separately
Done


http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@58
PS9, Line 58: class to store a pointer and 7-bit tag together in 64 bit integer.
> nit: a template class
Done


http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@59
PS9, Line 59: ownership of the pointer and w
> nit. "tags together in 64-bit integers".
i have instead changed 'pointers' to singular to avoid impression of containing 
multiple pointers at once.


http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@71
PS9, Line 71: med after building the Table.
> nit. use lower cases.
Done


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/benchmarks/hash-table-benchmark.cc
File be/src/benchmarks/hash-table-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/benchmarks/hash-table-benchmark.cc@47
PS9, Line 47: Hash Table Build:          Function  iters/ms   10%ile   50%ile   
90%ile     10%ile     50%ile     90%ile
> nit. Formatting. The space under "HashT able Build" is wasted. Suggest to b
This is common format for all benchmarks and this output is generated by common 
benchmark code for all other benchmarks. Moreover line is not wasted - it is to 
accommodate '(relative)' below.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/benchmarks/hash-table-benchmark.cc@50
PS9, Line 50: 65536_100
> nit. Add a comment on the meaning of these two numbers.
Line 40 outlines it already.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/benchmarks/hash-table-benchmark.cc@254
PS9, Line 254:   fo
> I think we should return bool here to indicate whether there are any troubl
I have converted it into CHECK now.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/benchmarks/hash-table-benchmark.cc@261
PS9, Line 261:
> nit. This can be defined outside the for loop.
Status passed to 'insert' is not overwritten in all cases, hence a new one 
needs to be passed always.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/benchmarks/hash-table-benchmark.cc@267
PS9, Line 267:  // Clear old
> nit. can we use name space htbenchmark here?
They are now nested into htbenchmark namespace itself. There is separate 
namespace within htbenchmark for build and probe benchmarks to organise the 
methods they exclusively need.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/benchmarks/hash-table-benchmark.cc@279
PS9, Line 279:
> nit. We should check the return status from Build().
its not needed now as Build has CHECK.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/grouping-aggregator-ir.cc
File be/src/exec/grouping-aggregator-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/grouping-aggregator-ir.cc@113
PS9, Line 113: e wil
> nit: I wonder if instead of true/false we had an enum, then the code could
Sure, Done.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/grouping-aggregator-ir.cc@240
PS9, Line 240:  This is called from ProcessBatchStreaming() so the rows are not 
aggregated.
             :   Hash
> nit: we could keep the old 'else if' and formatting so it is easier to spot
Done


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/grouping-aggregator-ir.cc@245
PS9, Line 245: se
> nit: +2 indent
Done


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/grouping-aggregator-partition.cc
File be/src/exec/grouping-aggregator-partition.cc:

http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/grouping-aggregator-partition.cc@126
PS9, Line 126: ion =
> nit: can we use an enum here as well?
Done


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/hash-table.h@661
PS9, Line 661:
> nit: this definition is not needed
Done


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/hash-table.h@689
PS9, Line 689:
> nit: its
Done


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/hash-table.h@692
PS9, Line 692: tes whether
> seems like it's not true anymore
right. removed it.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/hash-table.h@694
PS9, Line 694:  it is ef
> nit. this does not match the code at line 704.
right. changed the comment.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/hash-table.h@698
PS9, Line 698:
> Tag bit 1?
yes. changed it.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/hash-table.h@713
PS9, Line 713:      r
> I wonder if why this argument is needed. Is it true that all hash tables ha
No, hash tables don't have this tag. This tag is used to only set or update 
bucket data where client knows if data is tagged or not. it is required to 
reduce time complexity.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/hash-table.h@723
PS9, Line 723: / If struct Bucket is
> nit: this definition is not needed.
Removed.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/hash-table.h@732
PS9, Line 732:       bucket_data.htdata.tuple = bd.GetTuple<TAGGED>();
             :       return buck
> Seems like it's not true anymore
Right, it's removed.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/hash-table.cc@579
PS9, Line 579:   Status hash_allocation_status =
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/benchmark.cc
File be/src/util/benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/benchmark.cc@143
PS9, Line 143: if (fn != NULL) fn(benchmarks_[0].args);
> nit. May run Clang-format. {} is not needed if the entire IF statement is i
Done


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/benchmark.cc@174
PS9, Line 174:  if (fn != NULL) fn(benchmarks_[i].args);
> nit. same as above.
Done


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc
File be/src/util/tagged-ptr-test.cc:

http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc@27
PS9, Line 27:
> nit const std::string&
Done


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc@42
PS9, Line 42:
> nit. const std::string&
Done


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc@59
PS9, Line 59:
> nit. I wonder if we use OWN=true version in BE. If so, you may test it well
We use OWN=false in backend too.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc@92
PS9, Line 92: EXPECT_FALSE(ptr.IsTagBitSet<i>());
            :
> nit. May include setting tag bit 3, 4,5 and 6.
Have added above.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc@97
PS9, Line 97: ch
> nit. Please use 0x60 here to indicate which bits are set better.
Done


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc@102
PS9, Line 102: tS
> nit same as above.
Done


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc@144
PS9, Line 144:
> nit remove the space.
Done


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h
File be/src/util/tagged-ptr.h:

http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h@47
PS9, Line 47: /// allocation/deallocation of the object to which pointer stored 
points to should be
> Instead of macros these could be template variables and template functions:
Makes sense. This simplifies things.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h@54
PS9, Line 54:
> nit. Unset
Done


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h@46
PS9, Line 46: /// Derived class would be needed to define what each tag bit 
would mean or when
            : /// allocation/deallocation of the object to which pointer stored 
points to should be
            : /// the responsibility of client and not 'TaggedPtr'( check 
HashTable::TaggedBucketData).
            :
            : template <class T, bool OWNS = true>
            : class TaggedPtr {
            :  public:
            :   TaggedPtr() = default;
            :
            :   template <class... Args>
            :   static TaggedPtr make_tagptr(Args&&... args) {
            :     T* ptr = new T(std::forward<Args>(args)...);
            :     return TaggedPtr(ptr);
            :   }
            :
> nit. May run ClangFormat on the entire macro to align '\' nicely.
Code Removed.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h@99
PS9, Line 99:     return data_ & DATA_MASK<bit>;
> nit. tag bits 0-6.
Code Removed.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h@100
PS9, Line 100: }
             :
             :   ALWAYS_INLINE bool IsNull() { return GetPtr() == 0; }
             :
> nit. This can be moved to where the macro is defined (e.g. line 46).
Code Removed.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h@104
PS9, Line 104: ALWAYS_INLINE int GetTag() const { return data_ >> 57; }
             :
             :   ALWAYS_INLINE T* GetPtr() const { return (T*)(data_ & 
MASK_0_56_BITS); }
             :
             :   T& operator*() const noexcept { return *GetPtr(); }
             :
> nit. remove?
Code Removed.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h@137
PS9, Line 137:
> This can be made a constant to avoid compute ~ every time SetPtr() is calle
Done



--
To view, visit http://gerrit.cloudera.org:8080/17592
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72912ae9353b0d567a976ca712d2d193e035df9b
Gerrit-Change-Number: 17592
Gerrit-PatchSet: 10
Gerrit-Owner: Amogh Margoor <amarg...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <amarg...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Aug 2021 14:09:18 +0000
Gerrit-HasComments: Yes

Reply via email to