Bankim Bhavsar created KUDU-3286:
------------------------------------

             Summary: Add special handling for empty strings for Bloom filter predicate push down
                 Key: KUDU-3286
                 URL: https://issues.apache.org/jira/browse/KUDU-3286
             Project: Kudu
          Issue Type: Improvement
    Affects Versions: 1.13.0
            Reporter: Bankim Bhavsar
            Assignee: Bankim Bhavsar

The fast hash used with Bloom filter predicate pushdown has special handling for nullptr:
[https://github.com/apache/kudu/blob/master/src/kudu/util/hash_util.h#L95]

However, there is no special handling for empty objects or strings. The fast hash of an empty string with seed=0 is 0. A hash value of 0 doesn't set any bits in the Bloom filter, so empty strings are reported as not present.

Impala uses the direct Bloom filter approach and includes special handling for empty strings:
[https://github.com/apache/impala/blob/master/be/src/runtime/raw-value.inline.h#L352]

This leads to a discrepancy between Impala and Kudu, and to incorrect join results.
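
To illustrate the zero hash described above, here is a minimal sketch. It assumes a hash modeled on the public fasthash64 algorithm; it is not copied from Kudu's hash_util.h. The names FastHash64Sketch, kEmptySentinel and HashStringForBloomFilter are hypothetical, and the sentinel value is arbitrary. The wrapper shows one possible way to special-case empty input, not necessarily what Impala or an eventual Kudu fix does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Mixing step of the fasthash64-style hash. Note that Mix(0) == 0.
static inline uint64_t Mix(uint64_t h) {
  h ^= h >> 23;
  h *= 0x2127599bf4325c37ULL;
  h ^= h >> 47;
  return h;
}

// fasthash64-style hash, written for illustration only (hypothetical name).
uint64_t FastHash64Sketch(const void* buf, size_t len, uint64_t seed) {
  const uint64_t kMultiplier = 0x880355f21e6d1965ULL;
  const unsigned char* pos = static_cast<const unsigned char*>(buf);
  // For len == 0 and seed == 0 the initial state is already 0.
  uint64_t h = seed ^ (len * kMultiplier);

  // Consume full 8-byte words.
  size_t remaining = len;
  while (remaining >= 8) {
    uint64_t v;
    std::memcpy(&v, pos, sizeof(v));
    h ^= Mix(v);
    h *= kMultiplier;
    pos += 8;
    remaining -= 8;
  }

  // Consume the 1..7 trailing bytes, if any.
  if (remaining > 0) {
    uint64_t v = 0;
    for (size_t i = 0; i < remaining; ++i) {
      v ^= static_cast<uint64_t>(pos[i]) << (8 * i);
    }
    h ^= Mix(v);
    h *= kMultiplier;
  }

  // With empty input nothing above changed h, so this returns Mix(seed).
  return Mix(h);
}

// Hypothetical sentinel hashed in place of an empty string (arbitrary value).
constexpr uint32_t kEmptySentinel = 1;

// Hypothetical wrapper: empty strings are hashed via the sentinel so that
// they never take the zero-length path through the hash.
uint64_t HashStringForBloomFilter(const std::string& s, uint64_t seed) {
  if (s.empty()) {
    return FastHash64Sketch(&kEmptySentinel, sizeof(kEmptySentinel), seed);
  }
  return FastHash64Sketch(s.data(), s.size(), seed);
}
```

In this sketch, FastHash64Sketch("", 0, 0) returns 0, while HashStringForBloomFilter("", 0) returns a non-zero value. For join results to agree, both sides of the pushdown would have to apply the same special handling, so any sentinel has to match between the system that builds the filter and the one that probes it.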