Bankim Bhavsar created KUDU-3286:
------------------------------------

             Summary: Add special handling for empty strings for Bloom filter 
predicate push down
                 Key: KUDU-3286
                 URL: https://issues.apache.org/jira/browse/KUDU-3286
             Project: Kudu
          Issue Type: Improvement
    Affects Versions: 1.13.0
            Reporter: Bankim Bhavsar
            Assignee: Bankim Bhavsar


The fast hash used for Bloom filter predicate pushdown has special handling 
for nullptr.

[https://github.com/apache/kudu/blob/master/src/kudu/util/hash_util.h#L95]

However, there is no special handling for empty objects/strings. Fast hash 
of an empty string with seed=0 produces a hash value of 0. This doesn't set 
any bits in the Bloom filter, and as a result empty strings are reported as 
not present.

Impala uses the direct Bloom filter approach and includes special handling for 
empty strings.
[https://github.com/apache/impala/blob/master/be/src/runtime/raw-value.inline.h#L352]



This leads to a discrepancy between Impala and Kudu, and queries return 
incorrect join results.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
