Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20366 )
Change subject: IMPALA-12357: Skip scheduling bloom filter from full-build scan
......................................................................

IMPALA-12357: Skip scheduling bloom filter from full-build scan

A PK-FK join between a dimension table and a fact table is a common
occurrence in queries. Such a join often does not involve any predicate
filter on the dimension table. Thus, a bloom filter built from this kind
of dimension table scan (PK) will most likely contain all values of the
fact table column (FK). It is ineffective to generate this filter because
it is unlikely to reject any rows, especially if the bloom filter is
large and has a high false positive probability (fpp) estimate.

This patch skips scheduling a bloom filter from a join node that has all
of these characteristics:

1. The build side is a full table scan (has hard estimates).
2. The build scan neither has any predicate filter nor consumes any
   runtime filter.
3. The join node is assumed to have a PK-FK relationship.
4. The planned bloom filter has a resulting fpp estimate higher than the
   max_filter_error_rate_from_full_scan flag (defaults to 0.9).

The fourth criterion is an additional control that restricts elimination
to filters above an fpp threshold, because a low-fpp filter is sometimes
still effective in eliminating rows (i.e., rows with NULL values).

Non-bloom filters remain unchanged, as they are relatively lighter to
build and evaluate than bloom filters.

Testing:
- Add testcase in testBloomFilterAssignment
- Pass core tests
- Ran TPC-DS 3TB with the following query options:
  * RUNTIME_FILTER_MIN_SIZE=8192
  * RUNTIME_FILTER_MAX_SIZE=2097152
  * MAX_NUM_RUNTIME_FILTERS=50
  * RUNTIME_FILTER_WAIT_TIME_MS=10000
  19 out of 103 queries show a reduction in the number of runtime bloom
  filters without any notable performance regression.
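The four skip criteria in the commit message can be read as a single conjunction. The sketch below illustrates that reading in Python for brevity; it is not the code from this change, which lives in Impala's Java planner. The names BuildScan, estimate_fpp and should_skip_bloom_filter are hypothetical. The fpp estimate uses the textbook Bloom filter formula with an assumed 8 hash functions, which may differ from the estimate Impala actually computes. Only the 0.9 default and the filter sizes are taken from the message above.

```python
import math
from dataclasses import dataclass


@dataclass
class BuildScan:
    """Hypothetical summary of the build-side scan of a join."""
    is_full_scan: bool             # criterion 1: full table scan with hard estimates
    has_predicates: bool           # criterion 2: scan carries a predicate filter
    consumes_runtime_filter: bool  # criterion 2: scan is a runtime filter target


def estimate_fpp(ndv: int, filter_bytes: int, num_hashes: int = 8) -> float:
    """Textbook Bloom filter estimate (1 - e^(-k*n/m))^k.

    Assumption: this is the classic formula, not necessarily Impala's own.
    """
    bits = filter_bytes * 8
    return (1.0 - math.exp(-num_hashes * ndv / bits)) ** num_hashes


def should_skip_bloom_filter(scan: BuildScan, is_pk_fk_join: bool, fpp: float,
                             max_error_rate: float = 0.9) -> bool:
    """Return True only when all four criteria from the commit message hold."""
    return (scan.is_full_scan
            and not scan.has_predicates
            and not scan.consumes_runtime_filter
            and is_pk_fk_join
            and fpp > max_error_rate)


# Unfiltered dimension scan feeding an assumed PK-FK join.
full_scan = BuildScan(is_full_scan=True, has_predicates=False,
                      consumes_runtime_filter=False)

# 100M distinct keys squeezed into a 2 MB filter: the estimate saturates.
saturated = estimate_fpp(ndv=100_000_000, filter_bytes=2097152)
# 1000 distinct keys in an 8 KB filter: the estimate stays very low.
selective = estimate_fpp(ndv=1000, filter_bytes=8192)

skip_saturated = should_skip_bloom_filter(full_scan, True, saturated)
skip_selective = should_skip_bloom_filter(full_scan, True, selective)
```

Under these assumptions the saturated filter is skipped and the selective one is kept, which matches the intent of the fourth criterion: a low-fpp filter from a full scan is still scheduled.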

Change-Id: I494533bc06da84e606cbd1ae1619083333089a5e
Reviewed-on: http://gerrit.cloudera.org:8080/20366
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
---
M be/src/service/fe-support.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/common/TreeNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
12 files changed, 1,433 insertions(+), 37 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/20366
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I494533bc06da84e606cbd1ae1619083333089a5e
Gerrit-Change-Number: 20366
Gerrit-PatchSet: 19
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>