Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15511 )

Change subject: WIP IMPALA-9434: Implement Robin Hood Hash Table.
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15511/3/be/src/exec/hash-table.inline.h
File be/src/exec/hash-table.inline.h:

http://gerrit.cloudera.org:8080/#/c/15511/3/be/src/exec/hash-table.inline.h@374
PS3, Line 374:   table_->PrepareBucketForInsert(bucket_idx_, hash);
> And hash-table suppose to be thread-safe for read access.

Yeah, it definitely needs to be thread-safe if multiple threads are reading 
from it, but mutations (like SetTuple()) do not need to be thread-safe. We can 
also document SetTuple() as invalidating other iterators, because we don't 
depend on other iterators staying valid either.

SetTuple() is only used by the hash aggregator - see 
be/src/exec/grouping-aggregator-ir.cc. The algorithm is basically this:

  it = ht->FindBucket(...);
  if (found in hash table) {
    // Merge into the existing intermediate tuple
    UpdateTuple(it, input_row)
  } else {
    // Try to construct a new intermediate tuple
    new_tuple = TryConstructTuple()
    if (tuple construction failed due to OOM) {
      SpillRow(input_row)
    } else {
      it.SetTuple(new_tuple)
      UpdateTuple(it, input_row)
    }
  }

Instead of FindBucket()/SetTuple() you could do Find()/Insert() but that would 
probe the hash table twice.
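
To make the probe-count difference concrete, here is a minimal sketch. It is 
not Impala's HashTable: ToyHashTable, its linear probing (not Robin Hood), the 
integer keys and values, and the num_probes() counter are all hypothetical 
stand-ins, and the table is assumed never to fill up. Only the names 
FindBucket(), SetTuple(), Find() and Insert() are borrowed from the discussion 
above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy table: fixed capacity, linear probing, int64 key -> int64 sum.
// Assumes the table never becomes full (otherwise Probe() would not terminate).
class ToyHashTable {
 public:
  struct Iterator {
    ToyHashTable* table;
    std::size_t bucket_idx;
    bool found;

    // Fills the empty bucket that FindBucket() already located: no second probe.
    void SetTuple(std::int64_t key, std::int64_t value) {
      assert(!found);
      Bucket& b = table->buckets_[bucket_idx];
      b.filled = true;
      b.key = key;
      b.value = value;
      found = true;
    }
    std::int64_t& value() { return table->buckets_[bucket_idx].value; }
  };

  explicit ToyHashTable(std::size_t capacity) : buckets_(capacity) {}

  // One probe; returns the matching bucket or the empty bucket where key belongs.
  Iterator FindBucket(std::int64_t key) {
    std::size_t idx = Probe(key);
    return Iterator{this, idx, buckets_[idx].filled};
  }

  // One probe; returns nullptr on a miss.
  std::int64_t* Find(std::int64_t key) {
    std::size_t idx = Probe(key);
    return buckets_[idx].filled ? &buckets_[idx].value : nullptr;
  }

  // Another probe, even if the caller has just called Find() for the same key.
  std::int64_t* Insert(std::int64_t key, std::int64_t value) {
    std::size_t idx = Probe(key);
    buckets_[idx].filled = true;
    buckets_[idx].key = key;
    buckets_[idx].value = value;
    return &buckets_[idx].value;
  }

  int num_probes() const { return num_probes_; }

 private:
  struct Bucket {
    bool filled = false;
    std::int64_t key = 0;
    std::int64_t value = 0;
  };

  // Walks forward from the home slot until it hits the key or an empty bucket.
  std::size_t Probe(std::int64_t key) {
    ++num_probes_;
    std::size_t idx = static_cast<std::size_t>(key) % buckets_.size();
    while (buckets_[idx].filled && buckets_[idx].key != key) {
      idx = (idx + 1) % buckets_.size();
    }
    return idx;
  }

  std::vector<Bucket> buckets_;
  int num_probes_ = 0;
};

// FindBucket()/SetTuple() pattern: exactly one probe per input row.
void AggregateSingleProbe(ToyHashTable& ht, std::int64_t key, std::int64_t v) {
  ToyHashTable::Iterator it = ht.FindBucket(key);
  if (!it.found) it.SetTuple(key, 0);
  it.value() += v;
}

// Find()/Insert() pattern: a miss costs two probes.
void AggregateDoubleProbe(ToyHashTable& ht, std::int64_t key, std::int64_t v) {
  std::int64_t* sum = ht.Find(key);
  if (sum == nullptr) sum = ht.Insert(key, 0);
  *sum += v;
}
```

Feeding three rows with two distinct keys through each function costs three 
probes with the first pattern and five with the second, because every new 
group is looked up once by Find() and again by Insert().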



--
To view, visit http://gerrit.cloudera.org:8080/15511
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28eeccd7f9ccae39e31972391f971901bcbfe986
Gerrit-Change-Number: 15511
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Tue, 24 Mar 2020 21:42:28 +0000
Gerrit-HasComments: Yes
