Joe McDonnell created IMPALA-14253:
--------------------------------------

             Summary: HashTable's travel_length_ statistic is incorrect
                 Key: IMPALA-14253
                 URL: https://issues.apache.org/jira/browse/IMPALA-14253
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 5.0.0
            Reporter: Joe McDonnell


The profile has some statistics about the hash table:
{noformat}
          Hash Table [120 instances]:
             - HashBuckets: 33.55M (33554432)
             - HashCollisions: 45.42K (45423)
             - Probes: 38.56M (38562545)
             - Resizes: 176 (176)
             - Travel: 0 (0){noformat}
The "Travel" statistic comes from HashTable's travel_length_ counter, which is 
not being counted correctly. Every hash collision moves the probe forward by at 
least one step, so if HashCollisions is non-zero, Travel should be non-zero as 
well. The problem is that the code does not update travel_length_ when it 
returns early (which is almost always):
{noformat}
  int64_t step = 0;
  do {
    Bucket* bucket = &buckets[bucket_idx];
    if (LIKELY(!bucket->IsFilled())) return bucket_idx; <--- Doesn't update travel_length_
    if (hash == hash_array[bucket_idx]) {
      if (COMPARE_ROW
          && ht_ctx->Equals<INCLUSIVE_EQUALITY>(
                 GetRow<TYPE>(bucket, ht_ctx->scratch_row_, bd))) {
        *found = true;
        return bucket_idx; <--------- Doesn't update travel_length_
      }
      // Row equality failed, or not performed. This is a hash collision. Continue
      // searching.
      ++ht_ctx->num_hash_collisions_;
    }
    // Move to the next bucket.
    ++step;
    ... logic to pick next bucket ...
  } while (LIKELY(step < num_buckets));

  ht_ctx->travel_length_ += step;{noformat}
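
One possible shape for a fix is to add the accumulated step count to the counter on every exit path, not only after the loop runs out. The following is a minimal standalone sketch of that idea, not Impala's actual code: ProbeStats, Bucket, and Probe are hypothetical stand-ins, the key is a plain int, and linear probing with a power-of-two bucket count is assumed in place of the elided "logic to pick next bucket".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the hash table context; only the two counters
// discussed in this issue are modeled.
struct ProbeStats {
  int64_t num_hash_collisions = 0;
  int64_t travel_length = 0;
};

// Hypothetical simplified bucket: a filled flag, the stored hash, and an int key.
struct Bucket {
  bool filled = false;
  uint32_t hash = 0;
  int key = 0;
};

// Probes for 'key' with linear probing (an assumption for this sketch).
// Returns the index of the matching bucket or of the first empty bucket, or -1
// if every bucket was visited without finding either. The step count is added
// to travel_length on every exit path, including the early returns.
int64_t Probe(const std::vector<Bucket>& buckets, uint32_t hash, int key,
              bool* found, ProbeStats* stats) {
  const int64_t num_buckets = static_cast<int64_t>(buckets.size());
  // The index mask below requires a power-of-two bucket count.
  assert(num_buckets > 0 && (num_buckets & (num_buckets - 1)) == 0);
  *found = false;
  int64_t bucket_idx = hash & (num_buckets - 1);
  int64_t step = 0;
  do {
    const Bucket& bucket = buckets[bucket_idx];
    if (!bucket.filled) {
      stats->travel_length += step;  // Early return: empty bucket.
      return bucket_idx;
    }
    if (bucket.hash == hash) {
      if (bucket.key == key) {
        *found = true;
        stats->travel_length += step;  // Early return: matching row.
        return bucket_idx;
      }
      // Same hash but different key: a hash collision. Continue searching.
      ++stats->num_hash_collisions;
    }
    // Move to the next bucket.
    ++step;
    bucket_idx = (bucket_idx + 1) & (num_buckets - 1);
  } while (step < num_buckets);
  stats->travel_length += step;  // Table exhausted.
  return -1;
}
```

With a 4-bucket table whose only filled bucket sits at index 1 with hash 1, probing for a different key with the same hash records one collision and a travel length of 1 under this sketch, so the two counters stay consistent.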



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
