Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20366 )
Change subject: IMPALA-12357: Skip scheduling bloom filter from full-build scan
......................................................................

IMPALA-12357: Skip scheduling bloom filter from full-build scan

PK-FK joins between a dimension table and a fact table are common in queries. Such a join often has no predicate filter on the dimension table. Thus, a bloom filter built from this kind of dimension table scan (PK) will most likely contain all values of the fact table column (FK). Generating this filter is ineffective because it is unlikely to reject any rows, especially if the bloom filter is large and has a high false positive probability (fpp) estimate.

This patch skips scheduling a bloom filter from a join node that has these characteristics:
1. The build side is a full table scan.
2. The build scan has no predicate filter and does not consume any runtime filter.
3. The planned bloom filter has an fpp estimate higher than 0.9.

A PK-FK relationship is rarely defined and enforced in the table schema definition. Therefore, the third criterion replaces a check for the PK-FK relationship. It is also a narrower criterion that only targets large bloom filters, thus reducing the bloom filter aggregation overhead in the coordinator.

Non-bloom filters remain unchanged, as they are relatively cheaper to build and evaluate than bloom filters.

Testing:
- Add a test case in testBloomFilterAssignment.
- PENDING performance evaluation.
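The three criteria above can be sketched as a single predicate. This is a minimal, hypothetical Java sketch, not the code in RuntimeFilterGenerator.java: the names BloomFilterSkipSketch, BuildScanInfo, estimateFpp and shouldSkipBloomFilter are illustrative only, "full table scan" is modeled as a plain boolean, and the fpp estimate uses the classic Bloom filter approximation (1 - e^(-kn/m))^k with an assumed k = 8 hash functions, which may differ from the estimate Impala actually uses.

```java
/**
 * Hypothetical sketch of the IMPALA-12357 skip heuristic. Not Impala code:
 * all class, field and method names here are illustrative assumptions.
 */
public final class BloomFilterSkipSketch {
  // Criterion 3 from the commit message: skip when the estimated fpp exceeds 0.9.
  static final double FPP_SKIP_THRESHOLD = 0.9;
  // Assumption: each key sets k = 8 bits in the filter.
  static final int NUM_HASHES = 8;

  /** Hypothetical summary of the scan node on the join's build side. */
  static final class BuildScanInfo {
    final boolean isFullTableScan;        // criterion 1
    final int numConjuncts;               // predicates evaluated by the scan (criterion 2)
    final int numConsumedRuntimeFilters;  // runtime filters applied to the scan (criterion 2)

    BuildScanInfo(boolean isFullTableScan, int numConjuncts, int numConsumedRuntimeFilters) {
      this.isFullTableScan = isFullTableScan;
      this.numConjuncts = numConjuncts;
      this.numConsumedRuntimeFilters = numConsumedRuntimeFilters;
    }
  }

  /**
   * Classic Bloom filter approximation (1 - e^(-k*n/m))^k for n distinct keys,
   * m bits and k hash functions.
   */
  static double estimateFpp(long ndv, long filterSizeBytes) {
    double bits = filterSizeBytes * 8.0;
    return Math.pow(1.0 - Math.exp(-NUM_HASHES * (double) ndv / bits), NUM_HASHES);
  }

  /** Returns true only when all three criteria hold. */
  static boolean shouldSkipBloomFilter(BuildScanInfo scan, long buildNdv, long filterSizeBytes) {
    boolean unfilteredFullScan = scan.isFullTableScan
        && scan.numConjuncts == 0
        && scan.numConsumedRuntimeFilters == 0;
    return unfilteredFullScan && estimateFpp(buildNdv, filterSizeBytes) > FPP_SKIP_THRESHOLD;
  }

  public static void main(String[] args) {
    long sixteenMiB = 16L << 20;
    BuildScanInfo dimScan = new BuildScanInfo(true, 0, 0);
    BuildScanInfo filteredScan = new BuildScanInfo(true, 1, 0);

    System.out.printf("fpp(100M keys, 16 MiB) = %.4f%n", estimateFpp(100_000_000L, sixteenMiB));
    System.out.println(shouldSkipBloomFilter(dimScan, 100_000_000L, sixteenMiB));      // true
    System.out.println(shouldSkipBloomFilter(dimScan, 1_000_000L, sixteenMiB));        // false: low fpp
    System.out.println(shouldSkipBloomFilter(filteredScan, 100_000_000L, sixteenMiB)); // false: has a predicate
  }
}
```

Under these assumptions, 100 million distinct build keys in a 16 MiB filter give an estimate of about 0.98, so the filter would be skipped, while 1 million keys give an estimate close to zero and the filter is kept. Checking the cheap structural criteria first means the fpp estimate is only computed for unfiltered full scans.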
Change-Id: I494533bc06da84e606cbd1ae1619083333089a5e
---
M be/src/service/fe-support.cc
M fe/src/main/java/org/apache/impala/common/TreeNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
5 files changed, 275 insertions(+), 38 deletions(-)


git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/20366/1
--
To view, visit http://gerrit.cloudera.org:8080/20366

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I494533bc06da84e606cbd1ae1619083333089a5e
Gerrit-Change-Number: 20366
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>