Hello Aman Sinha, David Rorke, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20366

to look at the new patch set (#11).

Change subject: IMPALA-12357: Skip scheduling bloom filter from full-build scan
......................................................................

IMPALA-12357: Skip scheduling bloom filter from full-build scan

A PK-FK join between a dimension table and a fact table is a common
occurrence in queries. Such a join often does not involve any predicate
filter on the dimension table. Thus, a bloom filter built from this kind
of dimension table scan (PK) will most likely contain all values of
the fact table column (FK). Generating this filter is ineffective
because it is unlikely to reject any rows, especially if the bloom
filter is large and has a high false positive probability (fpp)
estimate.
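
For intuition on how the fpp estimate grows with the number of distinct
build-side values, here is a minimal sketch using the textbook bloom
filter formula fpp = (1 - e^(-k*n/m))^k. The class name, method name,
and the choice of k = 8 hash functions are assumptions made for
illustration; this is not necessarily the exact estimate the planner
computes.

```java
public class BloomFppSketch {
  /**
   * Textbook false positive probability of a bloom filter.
   * ndv: number of distinct values inserted (n)
   * filterSizeBytes: filter size in bytes (m = 8 * filterSizeBytes bits)
   * numHashes: number of hash functions (k); hypothetical parameter
   */
  static double estimateFpp(long ndv, long filterSizeBytes, int numHashes) {
    double bits = filterSizeBytes * 8.0;
    double fractionSet = 1.0 - Math.exp(-numHashes * (double) ndv / bits);
    return Math.pow(fractionSet, numHashes);
  }

  public static void main(String[] args) {
    // A small filter fed many distinct values saturates, so fpp approaches 1.
    System.out.println(estimateFpp(1_000_000L, 8192L, 8));
    // A large filter fed few distinct values keeps fpp close to 0.
    System.out.println(estimateFpp(1_000L, 2_097_152L, 8));
  }
}
```

A filter whose estimate is close to 1 passes almost every probe row,
which is the case the patch targets.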

This patch skips scheduling a bloom filter from a join node that has
all of these characteristics:

1. The build side is a full table scan.
2. The build scan neither has any predicate filter nor consumes any
   runtime filter.
3. The join node is assumed to have a PK-FK relationship.
4. The planned bloom filter has a resulting fpp estimate higher than
   the max_filter_error_rate_from_full_scan flag (defaults to 0.9).

The fourth criterion is an additional control that limits elimination
to filters above the fpp threshold, because a low-fpp filter is
sometimes still effective in eliminating rows (i.e., rows with NULL
value). Non-bloom filters remain unchanged as they are relatively
lighter to build and evaluate than bloom filters.
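
The four criteria can be read as a single conjunction. The sketch below
is only an illustration of that reading: the class, method, and
parameter names are hypothetical and do not mirror the actual code in
RuntimeFilterGenerator.java. Only the flag name and its default of 0.9
come from the description above.

```java
public class SkipBloomFilterSketch {
  // Default of the max_filter_error_rate_from_full_scan flag.
  static final double DEFAULT_MAX_ERROR_RATE = 0.9;

  /**
   * Returns true if the bloom filter should not be scheduled.
   * All parameters are hypothetical inputs standing in for planner state.
   */
  static boolean shouldSkipBloomFilter(
      boolean buildIsFullScan,            // criterion 1
      boolean buildHasPredicate,          // criterion 2 (must be false)
      boolean buildConsumesRuntimeFilter, // criterion 2 (must be false)
      boolean assumedPkFk,                // criterion 3
      double estimatedFpp,                // criterion 4
      double maxErrorRate) {
    return buildIsFullScan
        && !buildHasPredicate
        && !buildConsumesRuntimeFilter
        && assumedPkFk
        && estimatedFpp > maxErrorRate;
  }

  public static void main(String[] args) {
    // Unfiltered PK scan with a saturated filter: skipped.
    System.out.println(shouldSkipBloomFilter(
        true, false, false, true, 0.95, DEFAULT_MAX_ERROR_RATE));
    // Same scan, but the filter has a low fpp estimate: kept.
    System.out.println(shouldSkipBloomFilter(
        true, false, false, true, 0.50, DEFAULT_MAX_ERROR_RATE));
  }
}
```

Because every condition must hold, a filter with a low fpp estimate is
still scheduled even when the build side is an unfiltered PK scan.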

Testing:
- Add testcase in testBloomFilterAssignment
- Pass core tests
- Ran TPC-DS 3TB with the following query options:
  * RUNTIME_FILTER_MIN_SIZE=8192
  * RUNTIME_FILTER_MAX_SIZE=2097152
  * MAX_NUM_RUNTIME_FILTERS=50
  * RUNTIME_FILTER_WAIT_TIME_MS=10000
  22 out of 103 queries show a reduction in the number of runtime bloom
  filters without any notable performance regression.

Change-Id: I494533bc06da84e606cbd1ae1619083333089a5e
---
M be/src/service/fe-support.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/common/TreeNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
8 files changed, 702 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/20366/11
--
To view, visit http://gerrit.cloudera.org:8080/20366
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I494533bc06da84e606cbd1ae1619083333089a5e
Gerrit-Change-Number: 20366
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
