Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim Armstrong, Todd Lipcon, Mostafa Mokhtar, Alex Behm, Dan Hecht, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7793 to look at the new patch set (#16). Change subject: IMPALA-4252: Min-max runtime filters for Kudu ...................................................................... IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter or a min-max filter, depending on if it has HDFS or Kudu targets, respectively. In RuntimeFilterGenerator in the planner, each hash join node generates a bloom and min-max filter for each equi-join predicate, but only those filters that end up being assigned to a target make it into the final plan. Min-max filters are only assigned to Kudu scans if the target expr is a column, as Kudu doesn't support bounds on general exprs, and only if the join op is '=' and not 'is distinct from', as Kudu doesn't support returning NULLs if a bound is set. Min-max filters are inserted into by the PartitionedHashJoinBuilder. Codegen is used to eliminate branching on the type of filter. String min-max filters truncate their bounds at 1024 chars, so that the max amount of memory used by min-max filters is negligible. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Functional Testing: - Added new planner tests and updated the old ones. (in old tests, a lot of runtime filters are renumbered as we always generate min-max filters even if they don't end up getting assigned and they take up some of the RF ids). - Updated existing runtime filter tests to work with Kudu. - Added e2e tests for min-max filter specific functionality. Perf Testing: - All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu, timings are averages of 3 runs. - Ran a contrived query with a filter that does not eliminate any rows (full self join of lineitem). The difference in running time was negligible - 24.46s with filters on, 24.15s with filters off for a ~1% slowdown. - Ran a contrived query with a filter that elimiates all rows (self join on lineitem with a join condition that never matches). The filters resulted in a significant speedup - 0.26s with filters on, 1.46s with filters off for a ~5.6x speedup. This query is added to targeted-perf. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node-mt.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/runtime/timestamp-value.h M be/src/service/impala-internal-service.cc M be/src/util/CMakeLists.txt A be/src/util/min-max-filter-ir.cc A be/src/util/min-max-filter-test.cc A be/src/util/min-max-filter.cc A be/src/util/min-max-filter.h M common/thrift/Data.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-delete.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test A testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/order.test M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test M testdata/workloads/functional-planner/queries/PlannerTest/union.test M testdata/workloads/functional-planner/queries/PlannerTest/views.test M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test A testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test A testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test A testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters_wait.test A testdata/workloads/targeted-perf/queries/primitive_min_max_runtime_filter.test M tests/common/impala_test_suite.py M tests/query_test/test_kudu.py M tests/query_test/test_runtime_filters.py M tests/util/test_file_parser.py 87 files changed, 3,855 insertions(+), 1,649 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/7793/16 -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 16 Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Anonymous Coward #345 Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <mjac...@apache.org> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org>