Joe McDonnell created IMPALA-14253:
--------------------------------------
Summary: HashTable's travel_length_ statistic is incorrect
Key: IMPALA-14253
URL: https://issues.apache.org/jira/browse/IMPALA-14253
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 5.0.0
Reporter: Joe McDonnell
The profile has some statistics about the hash table:
{noformat}
Hash Table [120 instances]:
- HashBuckets: 33.55M (33554432)
- HashCollisions: 45.42K (45423)
- Probes: 38.56M (38562545)
- Resizes: 176 (176)
- Travel: 0 (0){noformat}
The "Travel" statistic comes from HashTable's travel_length_ counter, which is
not being counted correctly. If HashCollisions is non-zero, Travel should be
non-zero too, because every hash collision is followed by a move to the next
bucket. The problem is that the code does not update travel_length_ when it
returns early (which is almost always):
{noformat}
int64_t step = 0;
do {
  Bucket* bucket = &buckets[bucket_idx];
  if (LIKELY(!bucket->IsFilled())) return bucket_idx; <--- Doesn't update travel_length_
  if (hash == hash_array[bucket_idx]) {
    if (COMPARE_ROW
        && ht_ctx->Equals<INCLUSIVE_EQUALITY>(
            GetRow<TYPE>(bucket, ht_ctx->scratch_row_, bd))) {
      *found = true;
      return bucket_idx; <--------- Doesn't update travel_length_
    }
    // Row equality failed, or not performed. This is a hash collision. Continue
    // searching.
    ++ht_ctx->num_hash_collisions_;
  }
  // Move to the next bucket.
  ++step;
  ... logic to pick next bucket ...
} while (LIKELY(step < num_buckets));
ht_ctx->travel_length_ += step;{noformat}
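One possible shape of a fix is to add the accumulated step to the counter on
every exit path, not only after the loop. The following is a minimal,
self-contained sketch of that idea and is not the actual Impala code:
ToyHashTableCtx, ToyBucket and Probe are hypothetical stand-ins, and plain
linear probing is assumed in place of the elided logic that picks the next
bucket.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the hash table context; only the two counters
// discussed in this issue are modeled.
struct ToyHashTableCtx {
  int64_t num_hash_collisions = 0;
  int64_t travel_length = 0;
};

// Hypothetical simplified bucket: a filled flag, the cached hash, and a key.
struct ToyBucket {
  bool filled = false;
  uint32_t hash = 0;
  int key = 0;
};

// Probes for 'key' and records the travel on every exit path.
// Returns the index of the matching bucket or of the first empty bucket,
// or -1 if the whole table was scanned without finding either.
// Assumption: linear probing (the real next-bucket logic is not shown above).
int64_t Probe(const std::vector<ToyBucket>& buckets, uint32_t hash, int key,
    ToyHashTableCtx* ctx, bool* found) {
  assert(!buckets.empty());
  const int64_t num_buckets = static_cast<int64_t>(buckets.size());
  int64_t bucket_idx = hash % num_buckets;
  *found = false;
  int64_t step = 0;
  do {
    const ToyBucket& bucket = buckets[bucket_idx];
    if (!bucket.filled) {
      ctx->travel_length += step;  // Early return: empty bucket.
      return bucket_idx;
    }
    if (bucket.hash == hash) {
      if (bucket.key == key) {
        *found = true;
        ctx->travel_length += step;  // Early return: match found.
        return bucket_idx;
      }
      // Same hash but a different key: a hash collision, keep searching.
      ++ctx->num_hash_collisions;
    }
    // Move to the next bucket.
    ++step;
    bucket_idx = (bucket_idx + 1) % num_buckets;
  } while (step < num_buckets);
  ctx->travel_length += step;  // Fell out of the loop: table fully scanned.
  return -1;
}
```

With the counter updated on all three exits, any probe that records a
collision has also taken at least one step, so a profile with non-zero
HashCollisions would no longer report a Travel of 0.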