Bankim Bhavsar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15913 )


Change subject: [perf] Check range predicate first while evaluating Bloom filter predicate
......................................................................

[perf] Check range predicate first while evaluating Bloom filter predicate

A range predicate can be specified along with a Bloom filter predicate
on the same column. Checking a value against the range bounds only takes
comparisons, so it's cheaper to evaluate the range predicate first and
exit early when the value is out of bounds than to compute a hash and
look the value up in the Bloom filter.

This case is common when Impala pushes down a Bloom filter predicate,
as it is likely to be accompanied by a min-max filter (i.e. a range
predicate) on the same column.

Tests:
Added a test case that scans 100M column values.
Across iterations, observed a 20-30% improvement when the range predicate
check avoids the hash computation and Bloom filter lookup.
No noticeable regression was observed when values fall within the
range bounds.

Timings from TestKuduBloomFilterPredicateBenchmark, counting rows in
each case. Values are wall-clock ("real") time for three runs; user and
sys time were at most 0.001s in every run.

                                       Run 1    Run 2    Run 3
Without perf change:
  No rows expected                     0.953s   0.899s   0.983s
  Range predicate doesn't prune        0.767s   0.775s   0.832s

With perf change:
  No rows expected                     0.725s   0.664s   0.706s
  Range predicate doesn't prune        0.847s   0.794s   0.774s

Change-Id: I8451d6ddfe1fbdf307b3e9f2cc23a8d06e655ba3
---
M src/kudu/client/predicate-test.cc
M src/kudu/common/column_predicate.h
2 files changed, 69 insertions(+), 42 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/13/15913/1
--
To view, visit http://gerrit.cloudera.org:8080/15913

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8451d6ddfe1fbdf307b3e9f2cc23a8d06e655ba3
Gerrit-Change-Number: 15913
Gerrit-PatchSet: 1
Gerrit-Owner: Bankim Bhavsar <ban...@cloudera.com>
