[Impala-ASF-CR] IMPALA-6173: Fix SHOW CREATE TABLE for unpartitioned Kudu tables
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8506 ) Change subject: IMPALA-6173: Fix SHOW CREATE TABLE for unpartitioned Kudu tables .. Patch Set 1: GVO failure was unrelated. -- To view, visit http://gerrit.cloudera.org:8080/8506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc327266cfb8b5c05efec97348528cea6904bb20 Gerrit-Change-Number: 8506 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Thu, 09 Nov 2017 20:40:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 14: (7 comments) http://gerrit.cloudera.org:8080/#/c/7793/13/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: http://gerrit.cloudera.org:8080/#/c/7793/13/be/src/exec/kudu-scanner.cc@200 PS13, Line 200: if (!filter->GetCastIntMinMax(col_type, &int_min, &int_max)) { > Unfortunate that that every Kudu client has to do this. This is something that's been discussed before (see KUDU-931), I'm not sure why they decided not to do it. http://gerrit.cloudera.org:8080/#/c/7793/13/be/src/util/min-max-filter.cc File be/src/util/min-max-filter.cc: http://gerrit.cloudera.org:8080/#/c/7793/13/be/src/util/min-max-filter.cc@139 PS13, Line 139: case TYPE_BIGINT: > using std::numeric_limits; at the top if this file Done http://gerrit.cloudera.org:8080/#/c/7793/13/be/src/util/min-max-filter.cc@165 PS13, Line 165: > Brief comment especially about the return value would be good. Done http://gerrit.cloudera.org:8080/#/c/7793/13/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/7793/13/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@604 PS13, Line 604: // must be a SlotRef pointing to a column. We can allow implicit integer casts > typo: We can allow implicit integer casts Done http://gerrit.cloudera.org:8080/#/c/7793/13/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@605 PS13, Line 605: // by casting the min/max values before sending them to Kudu. > min/max values Done http://gerrit.cloudera.org:8080/#/c/7793/13/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test File testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test: http://gerrit.cloudera.org:8080/#/c/7793/13/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test@98 PS13, Line 98: where a.tinyint_col = b.int_col and b.int_col in (0, 1) > Let's make the min/max filter selective, e.g. by adding where b.int_col in Done http://gerrit.cloudera.org:8080/#/c/7793/13/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test@103 PS13, Line 103: # The min/max values in the filter are both above the range of the target col so all rows > Let's also add a non-selective case where the min/max values fall outside t Done -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 14 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Anonymous Coward #345 Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 09 Nov 2017 18:53:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim Armstrong, Todd Lipcon, Mostafa Mokhtar, Alex Behm, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7793 to look at the new patch set (#14). Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter or a min-max filter, depending on if it has HDFS or Kudu targets, respectively. In RuntimeFilterGenerator in the planner, each hash join node generates a bloom and min-max filter for each equi-join predicate, but only those filters that end up being assigned to a target make it into the final plan. Min-max filters are only assigned to Kudu scans if the target expr is a column, as Kudu doesn't support bounds on general exprs, and only if the join op is '=' and not 'is distinct from', as Kudu doesn't support returning NULLs if a bound is set. Min-max filters are inserted into by the PartitionedHashJoinBuilder. Codegen is used to eliminate branching on the type of filter. String min-max filters truncate their bounds at 1024 chars, so that the max amount of memory used by min-max filters is negligible. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Functional Testing: - Added new planner tests and updated the old ones. (in old tests, a lot of runtime filters are renumbered as we always generate min-max filters even if they don't end up getting assigned and they take up some of the RF ids). - Updated existing runtime filter tests to work with Kudu. - Added e2e tests for min-max filter specific functionality. Perf Testing: - All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu, timings are averages of 3 runs. - Ran a contrived query with a filter that does not eliminate any rows (full self join of lineitem). The difference in running time was negligible - 24.46s with filters on, 24.15s with filters off for a ~1% slowdown. - Ran a contrived query with a filter that elimiates all rows (self join on lineitem with a join condition that never matches). The filters resulted in a significant speedup - 0.26s with filters on, 1.46s with filters off for a ~5.6x speedup. This query is added to targeted-perf. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node-mt.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/runtime/timestamp-value.h M be/src/service/impala-internal-service.cc M be/src/util/CMakeLists.txt A be/src/util/min-max-filter-ir.cc A be/src/util/min-max-filter-test.cc A be/src/util/min-max-filter.cc A be/src/util/min-max-filter.h M common/thrift/Data.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functi
[Impala-ASF-CR] IMPALA-6173: Fix SHOW CREATE TABLE for unpartitioned Kudu tables
Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8506 Change subject: IMPALA-6173: Fix SHOW CREATE TABLE for unpartitioned Kudu tables .. IMPALA-6173: Fix SHOW CREATE TABLE for unpartitioned Kudu tables IMPALA-5546 added the ability to create unpartitioned Kudu tables, but when SHOW CREATE TABLE is run on it still prints 'PARTITION BY' just without a partition clause. This patch removes the 'PARTITION BY' from the output. Testing: - Added test that runs SHOW CREATE on an unpartitioned Kudu table. Change-Id: Icc327266cfb8b5c05efec97348528cea6904bb20 --- M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M tests/query_test/test_kudu.py 2 files changed, 13 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/8506/1 -- To view, visit http://gerrit.cloudera.org:8080/8506 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Icc327266cfb8b5c05efec97348528cea6904bb20 Gerrit-Change-Number: 8506 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim Armstrong, Todd Lipcon, Mostafa Mokhtar, Alex Behm, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7793 to look at the new patch set (#13). Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter or a min-max filter, depending on if it has HDFS or Kudu targets, respectively. In RuntimeFilterGenerator in the planner, each hash join node generates a bloom and min-max filter for each equi-join predicate, but only those filters that end up being assigned to a target make it into the final plan. Min-max filters are only assigned to Kudu scans if the target expr is a column, as Kudu doesn't support bounds on general exprs, and only if the join op is '=' and not 'is distinct from', as Kudu doesn't support returning NULLs if a bound is set. Min-max filters are inserted into by the PartitionedHashJoinBuilder. Codegen is used to eliminate branching on the type of filter. String min-max filters truncate their bounds at 1024 chars, so that the max amount of memory used by min-max filters is negligible. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Functional Testing: - Added new planner tests and updated the old ones. (in old tests, a lot of runtime filters are renumbered as we always generate min-max filters even if they don't end up getting assigned and they take up some of the RF ids). - Updated existing runtime filter tests to work with Kudu. - Added e2e tests for min-max filter specific functionality. Perf Testing: - All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu, timings are averages of 3 runs. - Ran a contrived query with a filter that does not eliminate any rows (full self join of lineitem). The difference in running time was negligible - 24.46s with filters on, 24.15s with filters off for a ~1% slowdown. - Ran a contrived query with a filter that elimiates all rows (self join on lineitem with a join condition that never matches). The filters resulted in a significant speedup - 0.26s with filters on, 1.46s with filters off for a ~5.6x speedup. This query is added to targeted-perf. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node-mt.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/runtime/timestamp-value.h M be/src/service/impala-internal-service.cc M be/src/util/CMakeLists.txt A be/src/util/min-max-filter-ir.cc A be/src/util/min-max-filter-test.cc A be/src/util/min-max-filter.cc A be/src/util/min-max-filter.h M common/thrift/Data.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functi
[Impala-ASF-CR] IMPALA-4591: Bound Kudu client error mem usage
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8464 ) Change subject: IMPALA-4591: Bound Kudu client error mem usage .. Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h File be/src/exec/kudu-table-sink.h: http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h@105 PS1, Line 105: consumed > nit: "consumed" to be consistent with the memtracker terminology. Done http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h@106 PS1, Line 106: client_tracked > Maybe client_tracked_bytes_ to make it clearer that the unit is bytes and i Done http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc File be/src/exec/kudu-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@a52 PS1, Line 52: > Was this flag documented? Just wondering if we should consider what happens No, its not documented, and of course it specifically says that it may be changed/removed. Certainly happy to consider other options if you think removing the flag is too disruptive, eg. the error buffer size could be calculated as the difference between "kudu_sink_mem_required" and "kudu_mutation_buffer_size", that just seemed a little complicated. http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@39 PS1, Line 39: DEFINE_int32(kudu_mutation_buffer_size, DEFAULT_KUDU_MUTATION_BUFFER_SIZE, > Just throwing out ideas here, but did we think about pros/cons of making th I think that's a reasonable idea, though like you say these probably don't need to be modified very often. At the least, I think it makes sense to get this in and file a JIRA for followup. http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@124 PS1, Line 124: int64_t required_mem = FLAGS_kudu_mutation_buffer_size + error_buffer_size; > Is this equivalent to the following? Done http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@132 PS1, Line 132: state->exec_env()->GetKuduClient(table_desc_->kudu_master_addresses(), &client_)); > nit: long line. Done http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py File tests/custom_cluster/test_kudu.py: http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py@66 PS1, Line 66: @CustomClusterTestSuite.with_args(impalad_args="-kudu_error_buffer_size=1024") > It might be faster to make this a regular query test but insert more data s It takes a very large number of errors to hit the default limit (>10m "Key already present" errors), so I don't think it ends up being any faster. http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py@74 PS1, Line 74: presen > present Done -- To view, visit http://gerrit.cloudera.org:8080/8464 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14 Gerrit-Change-Number: 8464 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 08 Nov 2017 19:40:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4591: Bound Kudu client error mem usage
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8464 to look at the new patch set (#2). Change subject: IMPALA-4591: Bound Kudu client error mem usage .. IMPALA-4591: Bound Kudu client error mem usage Previously, Kudu client errors could grow in size unbounded, potentially causing the process to be killed. This patch sets a bound on the mem that can be used for these error messages, with the size determined by the flag 'kudu_error_buffer_size'. If the errors for a Kudu client exceed this size, the query will fail, as some errors will be dropped and we won't be able to tell if all of the errors can be safely ignored. Testing: - Added a custom cluster test that verifies that a query that exceeds the limit fails. Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14 --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M tests/custom_cluster/test_kudu.py 3 files changed, 41 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/8464/2 -- To view, visit http://gerrit.cloudera.org:8080/8464 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14 Gerrit-Change-Number: 8464 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6171: Revert "IMPALA-1575: part 2: yield admission control resources"
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8499 ) Change subject: IMPALA-6171: Revert "IMPALA-1575: part 2: yield admission control resources" .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8499 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3eec4b5a6ff350933ffda0bb80949c5960ecdf25 Gerrit-Change-Number: 8499 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Wed, 08 Nov 2017 19:04:39 + Gerrit-HasComments: No
[Impala-ASF-CR] Bump Kudu version to 1520b39
Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8492 Change subject: Bump Kudu version to 1520b39 .. Bump Kudu version to 1520b39 Change-Id: Ib3d5d88bd4d3347447a64fac75ca3e0427b11ec6 --- M bin/impala-config.sh 1 file changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/8492/1 -- To view, visit http://gerrit.cloudera.org:8080/8492 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ib3d5d88bd4d3347447a64fac75ca3e0427b11ec6 Gerrit-Change-Number: 8492 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall
[native-toolchain-CR] Bump Kudu version to 1520b39
Thomas Tauber-Marshall has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8491 ) Change subject: Bump Kudu version to 1520b39 .. Bump Kudu version to 1520b39 Change-Id: Ib5e0dd9db01972d518e4944cf2a476f1d9e7cf08 --- M buildall.sh 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Tim Armstrong: Looks good to me, approved Thomas Tauber-Marshall: Verified -- To view, visit http://gerrit.cloudera.org:8080/8491 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ib5e0dd9db01972d518e4944cf2a476f1d9e7cf08 Gerrit-Change-Number: 8491 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[native-toolchain-CR] Bump Kudu version to 1520b39
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8491 ) Change subject: Bump Kudu version to 1520b39 .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8491 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib5e0dd9db01972d518e4944cf2a476f1d9e7cf08 Gerrit-Change-Number: 8491 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 07 Nov 2017 18:38:30 + Gerrit-HasComments: No
[native-toolchain-CR] Bump Kudu version to 1520b39
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8491 ) Change subject: Bump Kudu version to 1520b39 .. Patch Set 1: http://unittest.jenkins.cloudera.com/job/verify-impala-toolchain-package-build/482/ -- To view, visit http://gerrit.cloudera.org:8080/8491 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib5e0dd9db01972d518e4944cf2a476f1d9e7cf08 Gerrit-Change-Number: 8491 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 07 Nov 2017 17:32:26 + Gerrit-HasComments: No
[native-toolchain-CR] Bump Kudu version to 1520b39
Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8491 Change subject: Bump Kudu version to 1520b39 .. Bump Kudu version to 1520b39 Change-Id: Ib5e0dd9db01972d518e4944cf2a476f1d9e7cf08 --- M buildall.sh 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/91/8491/1 -- To view, visit http://gerrit.cloudera.org:8080/8491 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ib5e0dd9db01972d518e4944cf2a476f1d9e7cf08 Gerrit-Change-Number: 8491 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-2248: Make idle session timeout a query option
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8490 ) Change subject: IMPALA-2248: Make idle_session_timeout a query option .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/client-request-state.cc@204 PS3, Line 204: if (iequals(key, "idle_session_timeout")) { Its unfortunate this is special cased here. Could you add a comment explaining why that's necessary. http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/client-request-state.cc@213 PS3, Line 213: key, value, : &session_->set_query_options, single line http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8490/3/be/src/service/impala-server.cc@189 PS3, Line 189: DEFINE_int32(idle_session_timeout, 0, "The time, in seconds, that a session may be idle" Note how this interacts with the query option. http://gerrit.cloudera.org:8080/#/c/8490/3/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/8490/3/common/thrift/ImpalaService.thrift@292 PS3, Line 292: // The time, in seconds, that a session may be idle for before it is closed (and all Note how this interacts with the flag. -- To view, visit http://gerrit.cloudera.org:8080/8490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e Gerrit-Change-Number: 8490 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 07 Nov 2017 17:03:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 11: (8 comments) http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@27 PS9, Line 27: > Thanks for explaining. https://docs.google.com/document/d/1G-SPZelateebNxzVb67urEVtjc5Itw-B_jjfS85bSCE/edit?usp=sharing http://gerrit.cloudera.org:8080/#/c/7793/11/be/src/exec/filter-context.h File be/src/exec/filter-context.h: http://gerrit.cloudera.org:8080/#/c/7793/11/be/src/exec/filter-context.h@118 PS11, Line 118: Materialize filter values. > what does that mean? Done http://gerrit.cloudera.org:8080/#/c/7793/11/be/src/runtime/runtime-filter-bank.h File be/src/runtime/runtime-filter-bank.h: http://gerrit.cloudera.org:8080/#/c/7793/11/be/src/runtime/runtime-filter-bank.h@78 PS11, Line 78: /// may both be NULL, representing a filter that allows all rows to pass. > is it the case that at most one of bloom_filter and min_max_filter should b Done http://gerrit.cloudera.org:8080/#/c/7793/11/be/src/util/min-max-filter.h File be/src/util/min-max-filter.h: http://gerrit.cloudera.org:8080/#/c/7793/11/be/src/util/min-max-filter.h@62 PS11, Line 62: Materialize filter values > what does that mean? Done http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/PlanNodes.thrift@103 PS9, Line 103: 12: optional string kudu_col_name > case sensitive? Done http://gerrit.cloudera.org:8080/#/c/7793/11/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/7793/11/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@359 PS11, Line 359: public Operator getJoinOp() { return exprCmpOp_; } > getExprCmpOp() Done http://gerrit.cloudera.org:8080/#/c/7793/11/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@602 PS11, Line 602: if (!(targetExpr instanceof SlotRef) > I think only explicit casts are problematic. Implicit casts should be ok, o Right, we should be able to support integer implicit casts. The complication is that if, say, the calculated max is outside of the range for the type of the targeted column, we can't just pass that value into Kudu as it will complain. In that case, we would need to convert the max we send to be the max for the type of the targeted column. I've added some code to the BE to do this conversion. http://gerrit.cloudera.org:8080/#/c/7793/11/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test File testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test: http://gerrit.cloudera.org:8080/#/c/7793/11/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test@144 PS11, Line 144: on a.month = cast(b.month + 1 as int); > Was this your change? Why the change? This was necessary if we didn't support implicit casts on the target. Removed now. -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 11 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Anonymous Coward #345 Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 07 Nov 2017 15:06:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim Armstrong, Todd Lipcon, Mostafa Mokhtar, Alex Behm, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7793 to look at the new patch set (#12). Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter or a min-max filter, depending on if it has HDFS or Kudu targets, respectively. In RuntimeFilterGenerator in the planner, each hash join node generates a bloom and min-max filter for each equi-join predicate, but only those filters that end up being assigned to a target make it into the final plan. Min-max filters are only assigned to Kudu scans if the target expr is a column, as Kudu doesn't support bounds on general exprs, and only if the join op is '=' and not 'is distinct from', as Kudu doesn't support returning NULLs if a bound is set. Min-max filters are inserted into by the PartitionedHashJoinBuilder. Codegen is used to eliminate branching on the type of filter. String min-max filters truncate their bounds at 1024 chars, so that the max amount of memory used by min-max filters is negligible. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Functional Testing: - Added new planner tests and updated the old ones. (in old tests, a lot of runtime filters are renumbered as we always generate min-max filters even if they don't end up getting assigned and they take up some of the RF ids). - Updated existing runtime filter tests to work with Kudu. - Added e2e tests for min-max filter specific functionality. Perf Testing: - All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu, timings are averages of 3 runs. - Ran a contrived query with a filter that does not eliminate any rows (full self join of lineitem). The difference in running time was negligible - 24.46s with filters on, 24.15s with filters off for a ~1% slowdown. - Ran a contrived query with a filter that elimiates all rows (self join on lineitem with a join condition that never matches). The filters resulted in a significant speedup - 0.26s with filters on, 1.46s with filters off for a ~5.6x speedup. This query is added to targeted-perf. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node-mt.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/runtime/timestamp-value.h M be/src/service/impala-internal-service.cc M be/src/util/CMakeLists.txt A be/src/util/min-max-filter-ir.cc A be/src/util/min-max-filter-test.cc A be/src/util/min-max-filter.cc A be/src/util/min-max-filter.h M common/thrift/Data.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functi
[Impala-ASF-CR] IMPALA-4591: Bound Kudu client error mem usage
Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8464 Change subject: IMPALA-4591: Bound Kudu client error mem usage .. IMPALA-4591: Bound Kudu client error mem usage Previously, Kudu client errors could grow in size unbounded, potentially causing the process to be killed. This patch sets a bound on the mem that can be used for these error messages, with the size determined by the flag 'kudu_error_buffer_size'. If the errors for a Kudu client exceed this size, the query will fail, as some errors will be dropped and we won't be able to tell if all of the errors can be safely ignored. Testing: - Added a custom cluster test that verifies that a query that exceeds the limit fails. Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14 --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M tests/custom_cluster/test_kudu.py 3 files changed, 39 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/8464/1 -- To view, visit http://gerrit.cloudera.org:8080/8464 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14 Gerrit-Change-Number: 8464 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8461 ) Change subject: IMPALA-6151: add query-level fragment/backend counters .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9 Gerrit-Change-Number: 8461 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Fri, 03 Nov 2017 19:07:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 11: (19 comments) http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@13 PS9, Line 13: In RuntimeFilterGenerator in the planner, each hash join node > ... each hash join node generates a bloom and min-max filter for each equi- Done http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@27 PS9, Line 27: > Not specific to the code changes, and I don't expect a response here (proba RUNTIME_FILTER_WAIT_TIME_MS applies the same to bloom and min-max - its the time a scan node will wait for any filter of either type to arrive. Similarly, RUNTIME_FILTER_MODE works the same for min-max as its worked for bloom. The size related params - RUNTIME_BLOOM_FILTER_SIZE, RUNTIME_FILTER_MIN/MAX_SIZE only apply to bloom filters as the mem used by min-max is small and bounded. I've updated some comments to make the above clearer. One tricky case is DISABLE_ROW_RUNTIME_FILTERING. Kudu applies the filters on both a per-partition and per-row basis. At the moment, there's no way to change this - the filters are applied through the same interface as eg. predicates that are pushed down, so Kudu assumes that they have to be applied for correctness, not just for performance. So, I could see the argument either way - we could disable Kudu filters if DISABLE_ROW_RUNTIME_FILTERING is true, though it would also disable the partition filtering, or we could just not consider DISABLE_ROW_RUNTIME_FILTERING for Kudu filters. The remaining filter related option is MAX_NUM_RUNTIME_FILTERS, which I've addressed in another comment. http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@41 PS9, Line 41: > Contrived extreme queries are good data points, but how about running the T Those results are posted in the review comments. I can include them here as well, I just felt it made the commit message excessively large. http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@44 PS9, Line 44: timings are averages of 3 runs. > does not eliminate Done http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/runtime/timestamp-value.h@164 PS8, Line 164: tvalue.ti > I'm confused by this variable name. Done http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter-test.cc File be/src/util/min-max-filter-test.cc: http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter-test.cc@27 PS8, Line 27: > Can you maybe briefly describe what is tested for each filter type? There s Done http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter-test.cc@252 PS8, Line 252: MinMaxFilter::Create(tFilter, string_type, &obj_pool, &mem_pool); > Would TestEnv work? That is the semi-standard way to create an ExecEnv for Done http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter-test.cc@353 PS8, Line 353: t3.ToTColumnValue(&tFilter2.max); > Remove? Done http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter.cc File be/src/util/min-max-filter.cc: http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter.cc@246 PS8, Line 246: null > nullptr? Done http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift@202 PS9, Line 202: // Maximum number of bloom runtime filters allowed per query > I think I understand why you did this, but it seems confusing from a user's There's basically two reasons for this: - We don't want to regress queries by eliminating bloom filters that used to be generated. - The purpose of this param in the first place was to prevent the coordinator from being overwhelmed. Min-max filters are less heavy-weight than bloom filters so they don't put as much stress on the coordinator anyways. Given that, I think that it would be fine to maintain this behavior even once HDFS can do min-max and Kudu can do bloom. Another concern though is what to do if we add in-list filters, which are probably similarly heavy-weight to bloom filters. One option would be to deprecate this and have separate max_num_bloom and max_num_in_list params, but that may be confusing and requires user action to keep things working as expected anyways. Another option at that point would be to keep this param and make it "Maximum number of expensive filters (bloom or in-list)", and then there's just once we have to worry about regressing queries. Or maybe we should just apply this to min-max filters as well, since its the least confusing option in the long run, possibly bumping
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim Armstrong, Todd Lipcon, Mostafa Mokhtar, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7793 to look at the new patch set (#11). Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter or a min-max filter, depending on if it has HDFS or Kudu targets, respectively. In RuntimeFilterGenerator in the planner, each hash join node generates a bloom and min-max filter for each equi-join predicate, but only those filters that end up being assigned to a target make it into the final plan. Min-max filters are only assigned to Kudu scans if the target expr is a column, as Kudu doesn't support bounds on general exprs, and only if the join op is '=' and not 'is distinct from', as Kudu doesn't support returning NULLs if a bound is set. Min-max filters are inserted into by the PartitionedHashJoinBuilder. Codegen is used to eliminate branching on the type of filter. String min-max filters truncate their bounds at 1024 chars, so that the max amount of memory used by min-max filters is negligible. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Functional Testing: - Added new planner tests and updated the old ones. (in old tests, a lot of runtime filters are renumbered as we always generate min-max filters even if they don't end up getting assigned and they take up some of the RF ids). - Updated existing runtime filter tests to work with Kudu. - Added e2e tests for min-max filter specific functionality. Perf Testing: - All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu, timings are averages of 3 runs. - Ran a contrived query with a filter that does not eliminate any rows (full self join of lineitem). The difference in running time was negligible - 24.46s with filters on, 24.15s with filters off for a ~1% slowdown. - Ran a contrived query with a filter that elimiates all rows (self join on lineitem with a join condition that never matches). The filters resulted in a significant speedup - 0.26s with filters on, 1.46s with filters off for a ~5.6x speedup. This query is added to targeted-perf. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node-mt.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/runtime/timestamp-value.h M be/src/service/impala-internal-service.cc M be/src/util/CMakeLists.txt A be/src/util/min-max-filter-ir.cc A be/src/util/min-max-filter-test.cc A be/src/util/min-max-filter.cc A be/src/util/min-max-filter.h M common/thrift/Data.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planne
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim Armstrong, Todd Lipcon, Mostafa Mokhtar, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7793 to look at the new patch set (#10). Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter or a min-max filter, depending on if it has HDFS or Kudu targets, respectively. In RuntimeFilterGenerator in the planner, each partitioned hash join generates a bloom and min-max filter, but only those filters that end up being assigned to a target make it into the final plan. Min-max filters are only assigned to Kudu scans if the target expr is a column, as Kudu doesn't support bounds on general exprs, and only if the join op is '=' and not 'is distinct from', as Kudu doesn't support returning NULLs if a bound is set. Min-max filters are inserted into by the PartitionedHashJoinBuilder. Codegen is used to eliminate branching on the type of filter. String min-max filters truncate their bounds at 1024 chars, so that the max amount of memory used by min-max filters is negligible. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Functional Testing: - Added new planner tests and updated the old ones. (in old tests, a lot of runtime filters are renumbered as we always generate min-max filters even if they don't end up getting assigned and they take up some of the RF ids). - Updated existing runtime filter tests to work with Kudu. - Added e2e tests for min-max filter specific functionality. Perf Testing: - All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu, timings are averages of 3 runs. - Ran a contrived query with a filter that does eliminate any rows (full self join of lineitem). The difference in running time was negligible - 24.46s with filters on, 24.15s with filters off for a ~1% slowdown. - Ran a contrived query with a filter that elimiates all rows (self join on lineitem with a join condition that never matches). The filters resulted in a significant speedup - 0.26s with filters on, 1.46s with filters off for a ~5.6x speedup. This query is added to targeted-perf. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node-mt.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/runtime/timestamp-value.h M be/src/service/impala-internal-service.cc M be/src/util/CMakeLists.txt A be/src/util/min-max-filter-ir.cc A be/src/util/min-max-filter-test.cc A be/src/util/min-max-filter.cc A be/src/util/min-max-filter.h M common/thrift/Data.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim Armstrong, Todd Lipcon, Mostafa Mokhtar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7793 to look at the new patch set (#9). Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter or a min-max filter, depending on if it has HDFS or Kudu targets, respectively. In RuntimeFilterGenerator in the planner, each partitioned hash join generates a bloom and min-max filter, but only those filters that end up being assigned to a target make it into the final plan. Min-max filters are only assigned to Kudu scans if the target expr is a column, as Kudu doesn't support bounds on general exprs, and only if the join op is '=' and not 'is distinct from', as Kudu doesn't support returning NULLs if a bound is set. Min-max filters are inserted into by the PartitionedHashJoinBuilder. Codegen is used to eliminate branching on the type of filter. String min-max filters truncate their bounds at 1024 chars, so that the max amount of memory used by min-max filters is negligible. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Functional Testing: - Added new planner tests and updated the old ones. (in old tests, a lot of runtime filters are renumbered as we always generate min-max filters even if they don't end up getting assigned and they take up some of the RF ids). - Updated existing runtime filter tests to work with Kudu. - Added e2e tests for min-max filter specific functionality. Perf Testing: - All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu, timings are averages of 3 runs. - Ran a contrived query with a filter that does eliminate any rows (full self join of lineitem). The difference in running time was negligible - 24.46s with filters on, 24.15s with filters off for a ~1% slowdown. - Ran a contrived query with a filter that elimiates all rows (self join on lineitem with a join condition that never matches). The filters resulted in a significant speedup - 0.26s with filters on, 1.46s with filters off for a ~5.6x speedup. This query is added to targeted-perf. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node-mt.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/runtime/timestamp-value.h M be/src/service/impala-internal-service.cc M be/src/util/CMakeLists.txt A be/src/util/min-max-filter-ir.cc A be/src/util/min-max-filter-test.cc A be/src/util/min-max-filter.cc A be/src/util/min-max-filter.h M common/thrift/Data.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-delete.test
[Impala-ASF-CR] IMPALA-6127: Fix timeout in TestRuntimeFilter.test wait time
Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8427 Change subject: IMPALA-6127: Fix timeout in TestRuntimeFilter.test_wait_time .. IMPALA-6127: Fix timeout in TestRuntimeFilter.test_wait_time test_wait_time has been flaky recently on ASAN due to hitting a timeout. The fix is to increase the timeout for ASAN builds. Change-Id: Iee005bee8e0a535ce59d2e23e56be6004f2eb9de --- M tests/query_test/test_runtime_filters.py 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/8427/1 -- To view, visit http://gerrit.cloudera.org:8080/8427 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iee005bee8e0a535ce59d2e23e56be6004f2eb9de Gerrit-Change-Number: 8427 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 8: TPCDS results: +-++++-++++-+---+ | Workload| Query | File Format| Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | +-++++-++++-+---+ | TPCDS(_300) | TPCDS-Q73 | kudu / none / none | 12.01 | 7.99| R +50.42% | 0.31%| 1.04%| 1 | 4 | | TPCDS(_300) | TPCDS-Q68 | kudu / none / none | 17.02 | 13.00 | R +30.97% | 0.82%| 0.26%| 1 | 4 | | TPCDS(_300) | TPCDS-Q79 | kudu / none / none | 16.61 | 12.95 | R +28.31% | 0.56%| 0.54%| 1 | 4 | | TPCDS(_300) | TPCDS-Q1 | kudu / none / none | 2.36 | 1.99| +18.19% | 2.86%| 1.38%| 1 | 4 | | TPCDS(_300) | TPCDS-Q53 | kudu / none / none | 2.18 | 2.08| +4.96% | 0.32%| 1.98%| 1 | 4 | | TPCDS(_300) | TPCDS-Q8 | kudu / none / none | 4.03 | 3.93| +2.38% | 0.43%| 0.38%| 1 | 4 | | TPCDS(_300) | TPCDS-Q98 | kudu / none / none | 7.80 | 7.67| +1.72% | 0.66%| 0.81%| 1 | 4 | | TPCDS(_300) | TPCDS-Q7 | kudu / none / none | 3.75 | 3.71| +1.15% | 1.14%| 1.17%| 1 | 4 | | TPCDS(_300) | TPCDS-Q4 | kudu / none / none | 29.59 | 29.27 | +1.11% | 4.13%| 3.67%| 1 | 4 | | TPCDS(_300) | TPCDS-Q65 | kudu / none / none | 13.97 | 14.01 | -0.28% | 2.70%| 1.43%| 1 | 4 | | TPCDS(_300) | TPCDS-Q28 | kudu / none / none | 4.06 | 4.08| -0.35% | 0.84%| 0.48%| 1 | 4 | | TPCDS(_300) | TPCDS-Q89 | kudu / none / none | 2.46 | 2.48| -0.62% | 2.60%| 1.81%| 1 | 4 | | TPCDS(_300) | TPCDS-Q55 | kudu / none / none | 2.44 | 2.46| -0.83% | 1.08%| 5.41%| 1 | 4 | | TPCDS(_300) | TPCDS-Q42 | kudu / none / none | 2.42 | 2.47| -1.89% | 0.26%| 4.50%| 1 | 4 | | TPCDS(_300) | TPCDS-Q23 | kudu / none / none | 90.96 | 95.30 | -4.55% | 1.06%| 2.21%| 1 | 4 | | TPCDS(_300) | TPCDS-Q43 | kudu / none / none | 5.05 | 5.34| -5.49% | 0.89%| 4.17%| 1 | 4 | | TPCDS(_300) | TPCDS-Q3 | kudu / none / none | 4.11 | 4.36| -5.84% | 2.48%| 1.15%| 1 | 4 | | TPCDS(_300) | TPCDS-Q61 | kudu / none / none | 8.56 | 9.93| -13.83% | 0.78%| 1.84%| 1 | 4 | | TPCDS(_300) | TPCDS-Q2 | kudu / none / none | 12.19 | 15.68 | I -22.28% | * 27.57% * | 1.11%| 1 | 4 | | TPCDS(_300) | TPCDS-Q47 | kudu / none / none | 16.82 | 21.82 | I -22.91% | 1.16%| 1.25%| 1 | 4 | | TPCDS(_300) | TPCDS-Q19 | kudu / none / none | 4.78 | 6.32| I -24.29% | 1.14%| 0.97%| 1 | 4 | | TPCDS(_300) | TPCDS-Q46 | kudu / none / none | 6.58 | 8.77| I -24.88% | 0.86%| 1.01%| 1 | 4 | | TPCDS(_300) | TPCDS-Q88 | kudu / none / none | 14.89 | 19.84 | I -24.95% | 0.17%| 3.79%| 1 | 4 | | TPCDS(_300) | TPCDS-Q59 | kudu / none / none | 24.05 | 35.61 | I -32.45% | * 11.97% * | * 20.61% * | 1 | 4 | | TPCDS(_300) | TPCDS-Q34 | kudu / none / none | 3.94 | 6.82| I -42.24% | 2.37%| 4.12%| 1 | 4 | | TPCDS(_300) | TPCDS-Q27 | kudu / none / none | 3.97 | 7.91| I -49.85% |
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 8: > > Patch Set 7: > > > > > > Patch Set 7: > > > > > > > > Perf results: > > > > ... > > > > > > I'm surprised that only a few queries saw significant > speedups. Is > > > this in line with what you saw with Parquet runtime filters on > > > TPC-H? Or are we losing a lot by using min/max instead of > bloom or > > > in-list style filters? > > > > Not sure about bloom filters perf, though I can run those numbers > for comparison. > > I haven't looked at this patch, but had a question about the > design: > > Are we still pushing blooms across a join to prevent shuffling of > data? Or are we now pushing _only_ min/max? > > It seems there is value in pushing both: the bloom for evaluation > on the other side of the join to prevent shuffling, and the min/max > to push all the way to the scanner to reduce I/O. > > Not sure if the patch is already doing this. Impala only evaluates runtime filters in the scan. Even prior to this patch, the Kudu scanner was not evaluating bloom filters (and hash joins with Kudu scan targets don't build bloom filters). It certainly could be useful to evaluate bloom filters on the Impala side of a Kudu scan, but I believe our thinking was that it wasn't worth it to implement that - better to just wait until bloom filters can be pushed all the way down into Kudu. If bloom filters in Kudu are a long way off, though, we should maybe reevaluate that. -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 8 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Anonymous Coward #345 Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 26 Oct 2017 20:54:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 8: (10 comments) > I noticed that - BloomFilterBytes is always 0 in the query > profiles. > Can you please veirfy? Yes. This line is always printed to the profile, even if there are no bloom filters being built. We don't expose the mem used by min-max filters in the profile as its negligible. http://gerrit.cloudera.org:8080/#/c/7793/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/7793/7//COMMIT_MSG@26 PS7, Line 26: > Might be worth mentioning why the runtime filters were renumbered in all th Done http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/exec/filter-context.h File be/src/exec/filter-context.h: http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/exec/filter-context.h@106 PS7, Line 106: Status CloneFrom(const FilterContext& from, ObjectPool* pool, RuntimeState* state, > Long lines Done http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.h File be/src/util/min-max-filter.h: http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.h@140 PS7, Line 140: virtual void* GetMax() override { return &max_; } > Nit: not sure why this is using underscores while AlwaysTrue() is using cam Done http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.h@176 PS7, Line 176: StringValue min_; > Can you briefly mention what it means when these are empty - that the value Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc File be/src/util/min-max-filter.cc: http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@291 PS6, Line 291: out->__isset.min = true; > Oh ok, thanks for clarifying. It looked superficially like a last line copy Done http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc File be/src/util/min-max-filter.cc: http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@137 PS7, Line 137: DCHECK(thrift.__isset.max); > Do we have end-to-end tests for these code paths? I think we could generall I've now added e2e and unit tests for all of the truncation scenarios. http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@143 PS7, Line 143: CopyToBuffer(&max_buffer_, &max_, max_.len); > This has a fairly large number of edge cases - it would be good to unit tes Done http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@143 PS7, Line 143: uff > static_cast instead of int()? Done http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@150 PS7, Line 150: > I would have expected that we would modify the trailing values that overflo Done http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@207 PS7, Line 207: out->min.__set_string_val(in.min.string_val); > It seems tricky to test these various error paths in end-to-end tests. Coul As you say, its difficult to test the oom scenario in an e2e test, since the max amount of mem being used here is so small, but its covered with a unit test now. -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 8 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Anonymous Coward #345 Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 26 Oct 2017 20:52:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim Armstrong, Todd Lipcon, Mostafa Mokhtar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7793 to look at the new patch set (#8). Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter or a min-max filter, depending on if it has HDFS or Kudu targets, respectively. In RuntimeFilterGenerator in the planner, each partitioned hash join generates a bloom and min-max filter, but only those filters that end up being assigned to a target make it into the final plan. Min-max filters are only assigned to Kudu scans if the target expr is a column, as Kudu doesn't support bounds on general exprs, and only if the join op is '=' and not 'is distinct from', as Kudu doesn't support returning NULLs if a bound is set. Min-max filters are inserted into by the PartitionedHashJoinBuilder. Codegen is used to eliminate branching on the type of filter. String min-max filters truncate their bounds at 1024 chars, so that the max amount of memory used by min-max filters is negligible. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Functional Testing: - Added new planner tests and updated the old ones. (in old tests, a lot of runtime filters are renumbered as we always generate min-max filters even if they don't end up getting assigned and they take up some of the RF ids). - Updated existing runtime filter tests to work with Kudu. - Added e2e tests for min-max filter specific functionality. Perf Testing: - All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu, timings are averages of 3 runs. - Ran a contrived query with a filter that does eliminate any rows (full self join of lineitem). The difference in running time was negligible - 24.46s with filters on, 24.15s with filters off for a ~1% slowdown. - Ran a contrived query with a filter that elimiates all rows (self join on lineitem with a join condition that never matches). The filters resulted in a significant speedup - 0.26s with filters on, 1.46s with filters off for a ~5.6x speedup. This query is added to targeted-perf. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node-mt.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/runtime/timestamp-value.h M be/src/service/impala-internal-service.cc M be/src/util/CMakeLists.txt A be/src/util/min-max-filter-ir.cc A be/src/util/min-max-filter-test.cc A be/src/util/min-max-filter.cc A be/src/util/min-max-filter.h M common/thrift/Data.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-delete.test
[Impala-ASF-CR] IMPALA-6099: Fix crash in CheckForAlwaysFalse()
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8369 ) Change subject: IMPALA-6099: Fix crash in CheckForAlwaysFalse() .. Patch Set 1: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/8369/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8369/1//COMMIT_MSG@14 PS1, Line 14: brief note about testing, eg. "ran existing test ... which previously repro-ed this" -- To view, visit http://gerrit.cloudera.org:8080/8369 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3eda43845b78516c1e76e07e0d2dd9d862168e1d Gerrit-Change-Number: 8369 Gerrit-PatchSet: 1 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 24 Oct 2017 17:22:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 7: > > Patch Set 7: > > > > Perf results: > > ... > > I'm surprised that only a few queries saw significant speedups. Is > this in line with what you saw with Parquet runtime filters on > TPC-H? Or are we losing a lot by using min/max instead of bloom or > in-list style filters? Not sure about bloom filters perf, though I can run those numbers for comparison. One issue here is that tpch_100 isn't actually a very large dataset for a 9 node cluster. I've been having some trouble getting larger datasets loaded - making progress, but I'll reach out to some Kudu people if I can't get that working. Another issue is that I haven't tried playing around with our RUNTIME_FILTER_WAIT_TIME_MS tuning param yet (maybe Mostafa can speak to the advice we usually give customers here?) There were also some improvements to the handling of strings in the latest iteration of the review that aren't reflected here. I'll have updated numbers probably tomorrow. -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 7 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Anonymous Coward #345 Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 23 Oct 2017 21:35:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6004: Fix test row filters failure on ASAN
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8358 ) Change subject: IMPALA-6004: Fix test_row_filters failure on ASAN .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8358/1/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test File testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test: http://gerrit.cloudera.org:8080/#/c/8358/1/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test@350 PS1, Line 350: SET RUNTIME_FILTER_WAIT_TIME_MS=$RUNTIME_FILTER_WAIT_TIME_MS; > Do you think it makes sense to have different timeouts for each different t Its not really necessary. There's one case (14) that seems to legitimately need more time than the others, but I don't see why its a problem to just have all the tests run with enough time for the slowest. (14 is currently set to 100s, but testing locally its only about 30% slower than the tests that are set to 30s, so I sort of split the difference with the new version of the patch). -- To view, visit http://gerrit.cloudera.org:8080/8358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia098735594b36a72f02bf7edd051171689618051 Gerrit-Change-Number: 8358 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 23 Oct 2017 20:48:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6004: Fix test row filters failure on ASAN
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8358 to look at the new patch set (#2). Change subject: IMPALA-6004: Fix test_row_filters failure on ASAN .. IMPALA-6004: Fix test_row_filters failure on ASAN 'Test case 16' in test_row_filters has been failing occasionaly on ASAN as the runtime filters are not generated within the specified RUNTIME_FILTER_WAIT_TIME_MS. The fix is to increase RUNTIME_FILTER_WAIT_TIME_MS. This patch updates all of the tests in test_row_filters to use the same timeout, which is set to a higher value for ASAN builds. Change-Id: Ia098735594b36a72f02bf7edd051171689618051 --- M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test M tests/query_test/test_runtime_filters.py 2 files changed, 28 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/8358/2 -- To view, visit http://gerrit.cloudera.org:8080/8358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia098735594b36a72f02bf7edd051171689618051 Gerrit-Change-Number: 8358 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm
[Impala-ASF-CR] IMPALA-6004: Fix test row filters failure on ASAN
Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8358 Change subject: IMPALA-6004: Fix test_row_filters failure on ASAN .. IMPALA-6004: Fix test_row_filters failure on ASAN "Test case 16" in test_row_filters has been failing occasionaly on ASAN as the runtime filters are not generated within the specified RUNTIME_FILTER_WAIT_TIME_MS. The fix is to increase RUNTIME_FILTER_WAIT_TIME_MS. Change-Id: Ia098735594b36a72f02bf7edd051171689618051 --- M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/8358/1 -- To view, visit http://gerrit.cloudera.org:8080/8358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ia098735594b36a72f02bf7edd051171689618051 Gerrit-Change-Number: 8358 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 7: Perf results: ++--+++-++---++-+---+ | Workload | Query| File Format| Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | ++--+++-++---++-+---+ | TPCH(_100) | TPCH-Q20 | kudu / none / none | 7.04 | 6.10| +15.45% | 3.51% | 3.28%| 1 | 4 | | TPCH(_100) | TPCH-Q8 | kudu / none / none | 8.09 | 7.38| +9.67% | 1.11% | 2.57%| 1 | 4 | | TPCH(_100) | TPCH-Q17 | kudu / none / none | 19.13 | 17.50 | +9.33% | 3.91% | 4.34%| 1 | 4 | | TPCH(_100) | TPCH-Q22 | kudu / none / none | 2.71 | 2.56| +6.00% | 0.90% | 4.11%| 1 | 4 | | TPCH(_100) | TPCH-Q16 | kudu / none / none | 4.23 | 4.01| +5.45% | 4.13% | 0.53%| 1 | 4 | | TPCH(_100) | TPCH-Q1 | kudu / none / none | 10.87 | 10.55 | +3.05% | 1.06% | 4.61%| 1 | 4 | | TPCH(_100) | TPCH-Q18 | kudu / none / none | 24.93 | 24.64 | +1.18% | 3.44% | 1.29%| 1 | 4 | | TPCH(_100) | TPCH-Q6 | kudu / none / none | 2.27 | 2.26| +0.77% | 0.87% | 1.35%| 1 | 4 | | TPCH(_100) | TPCH-Q7 | kudu / none / none | 18.55 | 18.43 | +0.66% | 3.49% | 1.40%| 1 | 4 | | TPCH(_100) | TPCH-Q3 | kudu / none / none | 10.48 | 10.43 | +0.47% | 0.98% | 1.43%| 1 | 4 | | TPCH(_100) | TPCH-Q4 | kudu / none / none | 15.94 | 15.90 | +0.27% | 2.07% | 1.26%| 1 | 4 | | TPCH(_100) | TPCH-Q15 | kudu / none / none | 6.01 | 6.00| +0.13% | 2.57% | 1.76%| 1 | 4 | | TPCH(_100) | TPCH-Q19 | kudu / none / none | 8.02 | 8.07| -0.63% | 3.79% | 2.33%| 1 | 4 | | TPCH(_100) | TPCH-Q2 | kudu / none / none | 4.07 | 4.10| -0.66% | 5.74% | 1.90%| 1 | 4 | | TPCH(_100) | TPCH-Q9 | kudu / none / none | 18.00 | 18.36 | -1.96% | 5.78% | 1.17%| 1 | 4 | | TPCH(_100) | TPCH-Q13 | kudu / none / none | 11.44 | 11.67 | -1.99% | 3.58% | 1.12%| 1 | 4 | | TPCH(_100) | TPCH-Q12 | kudu / none / none | 6.40 | 6.56| -2.37% | 2.20% | 2.28%| 1 | 4 | | TPCH(_100) | TPCH-Q10 | kudu / none / none | 5.42 | 5.60| -3.17% | 1.90% | 2.60%| 1 | 4 | | TPCH(_100) | TPCH-Q5 | kudu / none / none | 10.53 | 11.07 | -4.87% | 5.59% | 1.44%| 1 | 4 | | TPCH(_100) | TPCH-Q21 | kudu / none / none | 51.31 | 54.29 | -5.48% | 2.18% | 2.39%| 1 | 4 | | TPCH(_100) | TPCH-Q14 | kudu / none / none | 4.76 | 7.27| I -34.49% | 0.87% | 0.54%| 1 | 4 | | TPCH(_100) | TPCH-Q11 | kudu / none / none | 1.27 | 2.56| I -50.18% | 2.62% | 4.17%| 1 | 4 | ++--+++-++---++-+---+ For the queries that regressed somewhat - Q20,8,17 - the filters simply aren't very selective. We might consider a path to disable a filter that we see in not selective the way we do with bloom filters once Kudu exposes stats are filter effectiveness. -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 7 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Anonymous Coward #345 Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 20 Oct 2017 21:20:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5307: Part 2: copy out strings in uncompressed Avro
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8146 ) Change subject: IMPALA-5307: Part 2: copy out strings in uncompressed Avro .. Patch Set 10: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8146 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If1fc78790d778c874f5aafa5958c3c045a88d233 Gerrit-Change-Number: 8146 Gerrit-PatchSet: 10 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 20 Oct 2017 17:13:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5307: Part 2: copy out strings in uncompressed Avro
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8146 ) Change subject: IMPALA-5307: Part 2: copy out strings in uncompressed Avro .. Patch Set 9: (4 comments) Looks good http://gerrit.cloudera.org:8080/#/c/8146/9/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8146/9/be/src/exec/hdfs-scanner.cc@543 PS9, Line 543: *init_tuple_fn = : codegen->GetFunction(IRFunction::HDFS_SCANNER_INIT_TUPLE, true); single line? http://gerrit.cloudera.org:8080/#/c/8146/9/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: http://gerrit.cloudera.org:8080/#/c/8146/9/be/src/runtime/descriptors.cc@86 PS9, Line 86: ConstantAggregateZero* zeroes = : ConstantAggregateZero::get(null_indicator_offset_type); single line? http://gerrit.cloudera.org:8080/#/c/8146/9/be/src/runtime/tuple.h File be/src/runtime/tuple.h: http://gerrit.cloudera.org:8080/#/c/8146/9/be/src/runtime/tuple.h@182 PS9, Line 182: Status* status Its unusual to have s status as an out parameter instead of returning it. Could you document why you did this? http://gerrit.cloudera.org:8080/#/c/8146/9/be/src/runtime/tuple.h@184 PS9, Line 184: Helper for MaterializeStrings() If this isn't intended to be called outside this class, make it private? -- To view, visit http://gerrit.cloudera.org:8080/8146 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If1fc78790d778c874f5aafa5958c3c045a88d233 Gerrit-Change-Number: 8146 Gerrit-PatchSet: 9 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 19 Oct 2017 17:09:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim Armstrong, Mostafa Mokhtar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7793 to look at the new patch set (#7). Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter or a min-max filter, depending on if it has HDFS or Kudu targets, respectively. Min-max filters are generated by the PartitionedHashJoinBuilder. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Because the Kudu client doesn't provide a way to specify generic filter exprs, min-max filters are only generated when the target expr is a bare Kudu column ref. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Codegen is used to eliminate branching on the type of the min-max filter. Functional Testing: - Added new planner tests and updated the old ones. - Updated existing runtime filter tests to work with Kudu. - Added e2e tests for min-max filter specific functionality. Perf Testing: - All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu, timings are averages of 3 runs. - Ran a contrived query with a filter that does eliminate any rows (full self join of lineitem). The difference in running time was negligible - 24.46s with filters on, 24.15s with filters off for a ~1% slowdown. - Ran a contrived query with a filter that elimiates all rows (self join on lineitem with a join condition that never matches). The filters resulted in a significant speedup - 0.26s with filters on, 1.46s with filters off for a ~5.6x speedup. This query is added to targeted-perf. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node-mt.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/service/impala-internal-service.cc M be/src/util/CMakeLists.txt A be/src/util/min-max-filter-ir.cc A be/src/util/min-max-filter.cc A be/src/util/min-max-filter.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-delete.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test A testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/order.test M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queri
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 6: (18 comments) http://gerrit.cloudera.org:8080/#/c/7793/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/7793/6//COMMIT_MSG@15 PS6, Line 15: them into the Kudu client. Because the Kudu client doesn't provide a > Is there any documentation for Kudu about what ordering Kudu uses for compa I asked the Kudu team, and its not documented anywhere, but they confirmed they use the same ordering as Impala does. http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/exec/filter-context.h File be/src/exec/filter-context.h: http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/exec/filter-context.h@106 PS6, Line 106: bloom_filter > has_bloom_filter() ? At callsites it reads like this should return a filter Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/exec/filter-context.h@109 PS6, Line 109: min_max_filter > has_min_max_filter? Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/coordinator.cc@1220 PS6, Line 1220: MinMaxFilter::Or(src_type(), params.min_max_filter, min_max_filter_.get()); > I mentioned this in a later comment, but we should consider what happens if Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/runtime-filter-bank.cc File be/src/runtime/runtime-filter-bank.cc: http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/runtime-filter-bank.cc@235 PS6, Line 235: MinMaxFilter* min_max_filter = MinMaxFilter::Create(type, &obj_pool_); > (nit) - could combine into one line. Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/runtime-filter-ir.cc File be/src/runtime/runtime-filter-ir.cc: http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/runtime-filter-ir.cc@27 PS6, Line 27: // to a) the atomicity of / pointer assignments and b) the x86 TSO memory model. > Comment not relevant any more since we're using actual atomics? Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter-ir.cc File be/src/util/min-max-filter-ir.cc: http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter-ir.cc@53 PS6, Line 53: min_str = string(value->ptr, value->len); > We should think through the behaviour with very large strings. It may make Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.h File be/src/util/min-max-filter.h: http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.h@83 PS6, Line 83: #define NUMERIC_MIN_MAX_FILTER(NAME, TYPE) \ > Yeah, the macro seems like it may be the least of the evils. Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc File be/src/util/min-max-filter.cc: http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@24 PS6, Line 24: " > #include "runtime/timestamp-value.inline.h" Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@77 PS6, Line 77: std::string NAME##MinMaxFilter::DebugString() const { \ > nit: don't need std:: prefix in .cc files. Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@139 PS6, Line 139: out->min.__set_string_val(std::min(in.min.string_val, out->min.string_val)); > I feel like we should use our StringValue::Compare() function instead of re Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@291 PS6, Line 291: BigIntMinMaxFilter::Or(in, out); > Shouldn't this be calling the timestamp method? TColumnData doesn't have a timestamp field, so I convert timestamps with UtcToUnixTimeMicros and pass them around in thrift as longs. (noted in a comment now) Unless I'm missing a reason that comparing timestamps with way won't lead to the same results as comparing the unconverted TimestampValues. http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@327 PS6, Line 327: } > Timestamp? Same as above. http://gerrit.cloudera.org:8080/#/c/7793/6/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/7793/6/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@379 PS6, Line 379: if (slotRef == null || slotRef.getDesc().getColumn() == null > Are there planner tests for all these cases? Done http://gerrit.cloudera.org:8080/#/c/7793/6/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test File testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test: http://gerrit.cloudera.org:8080/#/c/7793/6/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.tes
[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8170 ) Change subject: IMPALA-5789: Add always_false flag in bloom filter .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d Gerrit-Change-Number: 8170 Gerrit-PatchSet: 4 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 11 Oct 2017 19:22:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6027: Retry downloading toolchain components.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8258 ) Change subject: IMPALA-6027: Retry downloading toolchain components. .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7669c7d41240aa0eb43c30d5bf2bd5c01b66180b Gerrit-Change-Number: 8258 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Wed, 11 Oct 2017 16:48:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8170 ) Change subject: IMPALA-5789: Add always_false flag in bloom filter .. Patch Set 3: (6 comments) This is definitely going to conflict with my patch, but you should be able to get this in first so I'll probably have to do the painful rebasing. http://gerrit.cloudera.org:8080/#/c/8170/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8170/3//COMMIT_MSG@13 PS3, Line 13: Testing I assume you ran the existing runtime filter tests? http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/common/global-flags.cc File be/src/common/global-flags.cc: http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/common/global-flags.cc@129 PS3, Line 129: order to provide a regression test for IMPALA-3798 and a test for IMPALA-5789. This is a little cluttered. I think its okay to just say "for testing purposes" and anyone who wants to know the exact JIRAs can easily grep for uses of it. http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/exec/filter-context.h File be/src/exec/filter-context.h: http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/exec/filter-context.h@129 PS3, Line 129: HasAlwaysFalseInList I think this needs to be renamed for two reasons: - We're eventually going to be adding 'in-list' filters (in addition to the existing "bloom" and soon "min-max" filters), so the "InList" here is potentially confusing. - Its unusual for a function starting with "Has" to have side effects (incrementing the stats). Maybe "CheckForAlwaysFalse"? Or something better you come up with. http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/runtime/runtime-filter.inline.h File be/src/runtime/runtime-filter.inline.h: http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/runtime/runtime-filter.inline.h@53 PS3, Line 53: return bloom_filter_ != BloomFilter::ALWAYS_TRUE_FILTER && : bloom_filter_->AlwaysFalse(); single line? http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/util/bloom-filter.h File be/src/util/bloom-filter.h: http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/util/bloom-filter.h@111 PS3, Line 111: hasn't been inserted any elements hasn't had any elements inserted http://gerrit.cloudera.org:8080/#/c/8170/3/tests/custom_cluster/test_always_false_filter.py File tests/custom_cluster/test_always_false_filter.py: http://gerrit.cloudera.org:8080/#/c/8170/3/tests/custom_cluster/test_always_false_filter.py@34 PS3, Line 34: impalad = self.cluster.get_any_impalad() : client = impalad.service.create_beeswax_client() I think you can just add 'cursor' as a parameter to the 'test_' functions. -- To view, visit http://gerrit.cloudera.org:8080/8170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d Gerrit-Change-Number: 8170 Gerrit-PatchSet: 3 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 10 Oct 2017 18:11:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5529: [DOCS] New trunc() signatures
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8189 ) Change subject: IMPALA-5529: [DOCS] New trunc() signatures .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8189 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ice4753dee4f7b8e09c35508a9cad1e36f4ab2826 Gerrit-Change-Number: 8189 Gerrit-PatchSet: 4 Gerrit-Owner: John Russell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Fri, 06 Oct 2017 21:59:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2758: Change BufferedTupleStream::GetRows to returning multi batches
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8226 ) Change subject: IMPALA-2758: Change BufferedTupleStream::GetRows to returning multi batches .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/8226/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8226/1//COMMIT_MSG@14 PS1, Line 14: Any testing? http://gerrit.cloudera.org:8080/#/c/8226/1/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: http://gerrit.cloudera.org:8080/#/c/8226/1/be/src/exec/partitioned-hash-join-node.cc@950 PS1, Line 950: break; I think this changes the behavior here - this will only break out of the FOREACH_ROW loop, but what we want is to stop iterating over the entire set of build rows (that was previously one batch, but is now the same set of rows split into multiple batches). http://gerrit.cloudera.org:8080/#/c/8226/1/be/src/runtime/buffered-tuple-stream.h File be/src/runtime/buffered-tuple-stream.h: http://gerrit.cloudera.org:8080/#/c/8226/1/be/src/runtime/buffered-tuple-stream.h@339 PS1, Line 339: vector>* batch Can you move this to be after 'batch_size'? We generally keep out parameters as the trailing parameters for clarity. -- To view, visit http://gerrit.cloudera.org:8080/8226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414 Gerrit-Change-Number: 8226 Gerrit-PatchSet: 1 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Fri, 06 Oct 2017 18:24:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 6: (32 comments) http://gerrit.cloudera.org:8080/#/c/7793/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/7793/5//COMMIT_MSG@9 PS5, Line 9: This patch implements min-max filters for runtime filters. Each > I had a general terminology question that popped up as I read through. Does My intention (which I probably wasn't super consistent about, I'll try to clean it up) was that a "runtime filter" corresponds to a single filter id and might contain and bloom and/or min-max (eventually and/or in-list) filters. http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.h File be/src/exec/filter-context.h: http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.h@80 PS5, Line 80: /// FilterContext contains all metadata for a single runtime filter, and allows the filter > This seems to be an example of a place where "runtime filter" could potenti Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.h@107 PS5, Line 107: > Can you add a blank line between methods? Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.cc File be/src/exec/filter-context.cc: http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.cc@422 PS5, Line 422: col_type, filter_expr->type().ToIR(codegen), "expr_type_arg"); > Why is decimal special here? Decimal isn't supported for Kudu, so I didn't implement min-max filters for it since they wouldn't be testable. I special cased it here because you might have a decimal bloom filter and you would hit this code path, but a better solution is to just check if the filter desc says to build a min-max filter, which will never be the case if the type is decimal. http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/filter-context.cc@434 PS5, Line 434: // Call Insert() on the bloom filter. > Can we make this a lookup table? Seems like the only difference between the Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/hdfs-parquet-scanner.cc@401 PS5, Line 401: if (filter->BloomAlwaysTrue() > Can we phrase the method in a way that doesn't depend on the assumption tha I changed this to highlight the fact that the hdfs scanner only deals with bloom filters for now. The runtime filter here might have a min-max filter, if it also has a kudu scan target, and since min-max filters don't really have a concept of "AlwaysTrue" we would have to say the filter isn't always true. That would make sense if we were actually applying the min-max filter here, but its sort of weird since we're only really applying the bloom filter. It can then be changed to not be bloom specific when we add consideration of min-max filters here. http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-scanner.cc@174 PS5, Line 174: for (const FilterContext& ctx : scan_node_->filter_ctxs_) { > How important is this and is it blocked by anything else? If it's impactful Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-scanner.cc@175 PS5, Line 175: filt > nullptr Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-scanner.cc@177 PS5, Line 177: se()) { > dynamic_cast seems unnecessary since the type should always be correct - ca Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-scanner.cc@179 PS5, Line 179: eCurre > const string&? Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-util.cc File be/src/exec/kudu-util.cc: http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/kudu-util.cc@231 PS5, Line 231: int64_t ts_micros; > This timestamp code is a bit subtle. Is there a sane way to combine the con Done http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/7793/5/be/src/exec/partitioned-hash-join-builder.cc@548 PS5, Line 548: IMER(repartition_ti > Is this meant to double-count the filters if there is a bloom and min-max f You're right, this needs to be counting "runtime filters" (corresponding to a single filter id/FilterContext). This gets a little confusing, since you might have a bloom filter that gets disabled but its not counted as disabled here because the min-max filter isn't disabled, but it'll make a lot more sense in the long run goal here of having all scanners support all filter types (and of course the fact that the bloom filter is disabled in that situation will still show up in the runtime profile inf
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Hello Michael Ho, Lars Volker, Matthew Jacobs, Tim Armstrong, Mostafa Mokhtar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7793 to look at the new patch set (#6). Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter and/or a min-max filter, depending on if it has HDFS and/or Kudu targets, respectively. Min-max filters are generated by the PartitionedHashJoinBuilder. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Because the Kudu client doesn't provide a way to specify generic filter exprs, min-max filters are only generated when the target expr is a bare Kudu column ref. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Codegen is used to eliminate branching on the type of the min-max filter. Testing: - Updated planner tests. - Ran existing runtime filter tests. - Ran preliminary perf tests to demonstrate that it works. Will update with more specific results. - Still needs more e2e tests. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node-mt.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/service/impala-internal-service.cc M be/src/util/CMakeLists.txt A be/src/util/min-max-filter-ir.cc A be/src/util/min-max-filter.cc A be/src/util/min-max-filter.h M common/thrift/ImpalaInternalService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test A testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test A testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test M testdata/workloads/functional-query/queries/QueryTest/runtime_filters_wait.test M tests/common/impala_test_suite.py M tests/query_test/test_runtime_filters.py M tests/util/test_file_parser.py 47 files changed, 1,581 insertions(+), 377 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/7793/6 -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 6 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8191 ) Change subject: IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax .. Patch Set 4: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/8191/4/docs/topics/impala_alter_table.xml File docs/topics/impala_alter_table.xml: http://gerrit.cloudera.org:8080/#/c/8191/4/docs/topics/impala_alter_table.xml@66 PS4, Line 66: ALTER TABLE name ALTER [COLUMN] column_name > Just to confirm, no doc change needed as a result of this exchange? Correct, no change needed. -- To view, visit http://gerrit.cloudera.org:8080/8191 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7647dfe8acfdb598caf561d41d74e38a8b66ce9d Gerrit-Change-Number: 8191 Gerrit-PatchSet: 4 Gerrit-Owner: John Russell Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 06 Oct 2017 14:56:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8191 ) Change subject: IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/8191/4/docs/topics/impala_alter_table.xml File docs/topics/impala_alter_table.xml: http://gerrit.cloudera.org:8080/#/c/8191/4/docs/topics/impala_alter_table.xml@66 PS4, Line 66: ALTER TABLE name ALTER [COLUMN] column_name > does Impala support multiple alter steps at once? No, that requires some significant changes to how Impala does analysis of ALTER TABLE statements, so we opted to go for the simpler solution of just a single alter op for now. -- To view, visit http://gerrit.cloudera.org:8080/8191 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7647dfe8acfdb598caf561d41d74e38a8b66ce9d Gerrit-Change-Number: 8191 Gerrit-PatchSet: 4 Gerrit-Owner: John Russell Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 05 Oct 2017 21:54:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Bump Kudu version to bec2a24
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8207 ) Change subject: Bump Kudu version to bec2a24 .. Patch Set 2: Code-Review+2 carrying forward -- To view, visit http://gerrit.cloudera.org:8080/8207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaeafb27412892112f59f10a70f57eb2275e02fd5 Gerrit-Change-Number: 8207 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Thu, 05 Oct 2017 15:46:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8191 ) Change subject: IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8191/3/docs/topics/impala_alter_table.xml File docs/topics/impala_alter_table.xml: http://gerrit.cloudera.org:8080/#/c/8191/3/docs/topics/impala_alter_table.xml@1014 PS3, Line 1014: or compacted This is not correct. Only storage attribute changes get applied to compacted rows, not default value changes. -- To view, visit http://gerrit.cloudera.org:8080/8191 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7647dfe8acfdb598caf561d41d74e38a8b66ce9d Gerrit-Change-Number: 8191 Gerrit-PatchSet: 3 Gerrit-Owner: John Russell Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 04 Oct 2017 18:38:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Bump Kudu version to bec2a24
Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8207 Change subject: Bump Kudu version to bec2a24 .. Bump Kudu version to bec2a24 Change-Id: Iaeafb27412892112f59f10a70f57eb2275e02fd5 --- M bin/impala-config.sh 1 file changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/8207/1 -- To view, visit http://gerrit.cloudera.org:8080/8207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iaeafb27412892112f59f10a70f57eb2275e02fd5 Gerrit-Change-Number: 8207 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8191 ) Change subject: IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/8191/2/docs/topics/impala_alter_table.xml File docs/topics/impala_alter_table.xml: http://gerrit.cloudera.org:8080/#/c/8191/2/docs/topics/impala_alter_table.xml@67 PS2, Line 67: ] extraneous? http://gerrit.cloudera.org:8080/#/c/8191/2/docs/topics/impala_alter_table.xml@69 PS2, Line 69: COMMENT 'comment_text' This does parse, but it will always give an error because Kudu doesn't support column comments. http://gerrit.cloudera.org:8080/#/c/8191/2/docs/topics/impala_alter_table.xml@71 PS2, Line 71: { missing the closing '}' http://gerrit.cloudera.org:8080/#/c/8191/2/docs/topics/impala_alter_table.xml@1038 PS2, Line 1038: any newly inserted rows These attributes may in some cases end up applied to existing rows as Kudu performs compactions. Not sure if that should be noted here. -- To view, visit http://gerrit.cloudera.org:8080/8191 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7647dfe8acfdb598caf561d41d74e38a8b66ce9d Gerrit-Change-Number: 8191 Gerrit-PatchSet: 2 Gerrit-Owner: John Russell Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 03 Oct 2017 21:57:32 + Gerrit-HasComments: Yes
[native-toolchain-CR] Bump Kudu version to bec2a24
Thomas Tauber-Marshall has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8192 ) Change subject: Bump Kudu version to bec2a24 .. Bump Kudu version to bec2a24 Change-Id: I1d1416e475488db580a4f9cb8ed2882440d9abcd --- M buildall.sh 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Alex Behm: Looks good to me, approved Thomas Tauber-Marshall: Verified -- To view, visit http://gerrit.cloudera.org:8080/8192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I1d1416e475488db580a4f9cb8ed2882440d9abcd Gerrit-Change-Number: 8192 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Thomas Tauber-Marshall
[native-toolchain-CR] Bump Kudu version to bec2a24
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8192 ) Change subject: Bump Kudu version to bec2a24 .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1d1416e475488db580a4f9cb8ed2882440d9abcd Gerrit-Change-Number: 8192 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 03 Oct 2017 18:30:55 + Gerrit-HasComments: No
[native-toolchain-CR] Bump Kudu version to bec2a24
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8192 ) Change subject: Bump Kudu version to bec2a24 .. Patch Set 1: http://unittest.jenkins.cloudera.com/job/verify-impala-toolchain-package-build/473/ -- To view, visit http://gerrit.cloudera.org:8080/8192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1d1416e475488db580a4f9cb8ed2882440d9abcd Gerrit-Change-Number: 8192 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 03 Oct 2017 01:38:33 + Gerrit-HasComments: No
[native-toolchain-CR] Bump Kudu version to bec2a24
Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8192 Change subject: Bump Kudu version to bec2a24 .. Bump Kudu version to bec2a24 Change-Id: I1d1416e475488db580a4f9cb8ed2882440d9abcd --- M buildall.sh 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/92/8192/1 -- To view, visit http://gerrit.cloudera.org:8080/8192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I1d1416e475488db580a4f9cb8ed2882440d9abcd Gerrit-Change-Number: 8192 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5383: [DOCS] Document unpartitioned Kudu tables
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8180 ) Change subject: IMPALA-5383: [DOCS] Document unpartitioned Kudu tables .. Patch Set 3: (1 comment) > Thomas, can I tag you in for a few Kudu-related reviews for 5.13, > in place of MJ? Yep, happy to take a look http://gerrit.cloudera.org:8080/#/c/8180/3/docs/topics/impala_create_table.xml File docs/topics/impala_create_table.xml: http://gerrit.cloudera.org:8080/#/c/8180/3/docs/topics/impala_create_table.xml@483 PS3, Line 483: -- Single partition. Only for When I built this locally, these tags were present in the output. -- To view, visit http://gerrit.cloudera.org:8080/8180 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia2b466e1e482d62de84253c0cb406668fd5ad5eb Gerrit-Change-Number: 8180 Gerrit-PatchSet: 3 Gerrit-Owner: John Russell Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: John Russell Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 02 Oct 2017 15:54:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4252: Move runtime filters to ScanNode
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8148 ) Change subject: IMPALA-4252: Move runtime filters to ScanNode .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/8148/1/be/src/exec/scan-node.h File be/src/exec/scan-node.h: http://gerrit.cloudera.org:8080/#/c/8148/1/be/src/exec/scan-node.h@195 PS1, Line 195: Scanner > Comment needs an update - this is no longer an argument. Done http://gerrit.cloudera.org:8080/#/c/8148/1/be/src/exec/scan-node.h@198 PS1, Line 198: rs to arrive, check > Both HdfsScanNodeBase and KuduScanNodeBase have a "RuntimeState* runtime_st Done http://gerrit.cloudera.org:8080/#/c/8148/1/be/src/exec/scan-node.cc File be/src/exec/scan-node.cc: http://gerrit.cloudera.org:8080/#/c/8148/1/be/src/exec/scan-node.cc@79 PS1, Line 79: // TODO: Move this to Prepare() > Does this comment still make sense? I wish whoever added it had mentioned w Not sure, it was left by Michael (I'll ping him about it) in the review "Disentangle Expr and ExprContext" You asked about it during that review too, and his response was "I am thinking of moving it when PlanNode is introduced." but I'm not sure what that means. -- To view, visit http://gerrit.cloudera.org:8080/8148 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I17bdc869046dc2cd837d02f333441fa6324ff086 Gerrit-Change-Number: 8148 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 29 Sep 2017 22:34:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4252: Move runtime filters to ScanNode
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8148 to look at the new patch set (#2). Change subject: IMPALA-4252: Move runtime filters to ScanNode .. IMPALA-4252: Move runtime filters to ScanNode As a preliminary step towards adding runtime filters for Kudu scans, this patch moves runtime filter related code from HdfsScanNodeBase to ScanNode so that it's available to KuduScanNodeBase. The change was mechanical with no logic changes, except for moving the calculation of the max wait time into WaitForRuntimeFilters(). Testing: - Ran existing runtime filter tests. Change-Id: I17bdc869046dc2cd837d02f333441fa6324ff086 --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node-base.h M be/src/exec/scan-node.cc M be/src/exec/scan-node.h 6 files changed, 131 insertions(+), 114 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/8148/2 -- To view, visit http://gerrit.cloudera.org:8080/8148 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I17bdc869046dc2cd837d02f333441fa6324ff086 Gerrit-Change-Number: 8148 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5994: Lower case struct-field names
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8169 ) Change subject: IMPALA-5994: Lower case struct-field names .. Patch Set 4: Code-Review+2 Carrying forward +2. Gvo failed due to trivial test issue. -- To view, visit http://gerrit.cloudera.org:8080/8169 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e Gerrit-Change-Number: 8169 Gerrit-PatchSet: 4 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Fri, 29 Sep 2017 21:14:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5994: Lower case struct-field names
Hello Lars Volker, Alex Behm, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8169 to look at the new patch set (#4). Change subject: IMPALA-5994: Lower case struct-field names .. IMPALA-5994: Lower case struct-field names Impala tries to always store column names in lower case. As part of a cleanup of issues related to upper case Kudu column names, a check was added in Analyzer to enforce this. The check fails when doing star expansion on a struct to select all fields in the case where a table was created in Hive with upper case letters in a struct field name. This happens because Hive does not covert struct field names to all lower case in HMS. The solution is to force StructField names to lower case. Testing: - Added a test in test_nested_types.py - Fixed FE test that expected struct field to be output in upper case. Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e --- M fe/src/main/java/org/apache/impala/catalog/StructField.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/query_test/test_nested_types.py 3 files changed, 19 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/8169/4 -- To view, visit http://gerrit.cloudera.org:8080/8169 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e Gerrit-Change-Number: 8169 Gerrit-PatchSet: 4 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5951: Remove flaky test catalogd timeout
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8154 ) Change subject: IMPALA-5951: Remove flaky test_catalogd_timeout .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8154/1/testdata/workloads/functional-query/queries/QueryTest/kudu-timeouts-catalogd.test File testdata/workloads/functional-query/queries/QueryTest/kudu-timeouts-catalogd.test: http://gerrit.cloudera.org:8080/#/c/8154/1/testdata/workloads/functional-query/queries/QueryTest/kudu-timeouts-catalogd.test@24 PS1, Line 24: > Isn't this test also susceptible to the same timing issue? I don't see much Done -- To view, visit http://gerrit.cloudera.org:8080/8154 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I29fd67d0acc0ee15943c416f2179ad716d2cac05 Gerrit-Change-Number: 8154 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Fri, 29 Sep 2017 18:55:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5951: Remove flaky test catalogd timeout
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8154 to look at the new patch set (#2). Change subject: IMPALA-5951: Remove flaky test_catalogd_timeout .. IMPALA-5951: Remove flaky test_catalogd_timeout test_catalogd_timeout sets a Kudu operation timeout of 1ms and then performs various Kudu operations which it expects to fail due to a timeout. Since the test was written, things have sped up - for example, Impala used to create a new Kudu client for each operation, but that was changed in IMPALA-5167, such that the operations now occasionally complete quickly enough that they don't timeout. There's not really any way to rewrite this test to ensure that it won't be flaky, so the patch removes it. Change-Id: I29fd67d0acc0ee15943c416f2179ad716d2cac05 --- D testdata/workloads/functional-query/queries/QueryTest/kudu-timeouts-catalogd.test M tests/custom_cluster/test_kudu.py 2 files changed, 0 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/8154/2 -- To view, visit http://gerrit.cloudera.org:8080/8154 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I29fd67d0acc0ee15943c416f2179ad716d2cac05 Gerrit-Change-Number: 8154 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-5994: Lower case struct-field names
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8169 ) Change subject: IMPALA-5994: Lower case struct-field names .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/8169/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8169/1//COMMIT_MSG@7 PS1, Line 7: IMPALA-5994: Lower case struct-field names > Commit msg should describe the change/fix, not the symptom of the bug. For Done http://gerrit.cloudera.org:8080/#/c/8169/2/fe/src/main/java/org/apache/impala/catalog/StructField.java File fe/src/main/java/org/apache/impala/catalog/StructField.java: http://gerrit.cloudera.org:8080/#/c/8169/2/fe/src/main/java/org/apache/impala/catalog/StructField.java@37 PS2, Line 37: // Impala expects field names to be in lower case, but type strings stored in the HMS > // Impala expects field names to be in lower case, but type strings stored Done -- To view, visit http://gerrit.cloudera.org:8080/8169 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e Gerrit-Change-Number: 8169 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Fri, 29 Sep 2017 16:16:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5994: Lower case struct-field names
Hello Lars Volker, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8169 to look at the new patch set (#3). Change subject: IMPALA-5994: Lower case struct-field names .. IMPALA-5994: Lower case struct-field names Impala tries to always store column names in lower case. As part of a cleanup of issues related to upper case Kudu column names, a check was added in Analyzer to enforce this. The check fails when doing star expansion on a struct to select all fields in the case where a table was created in Hive with upper case letters in a struct field name. This happens because Hive does not covert struct field names to all lower case in HMS. The solution is to force StructField names to lower case. Testing: - Added a test in test_nested_types.py Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e --- M fe/src/main/java/org/apache/impala/catalog/StructField.java M tests/query_test/test_nested_types.py 2 files changed, 18 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/8169/3 -- To view, visit http://gerrit.cloudera.org:8080/8169 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e Gerrit-Change-Number: 8169 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5994: Failure in star expansion on struct fields
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8169 ) Change subject: IMPALA-5994: Failure in star expansion on struct fields .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/8169/1/fe/src/main/java/org/apache/impala/catalog/StructField.java File fe/src/main/java/org/apache/impala/catalog/StructField.java: http://gerrit.cloudera.org:8080/#/c/8169/1/fe/src/main/java/org/apache/impala/catalog/StructField.java@37 PS1, Line 37: // Impala expects column names to be lower cased, but this is not always true for > Can you add a brief comment why we need to do this? Done http://gerrit.cloudera.org:8080/#/c/8169/1/tests/query_test/test_nested_types.py File tests/query_test/test_nested_types.py: http://gerrit.cloudera.org:8080/#/c/8169/1/tests/query_test/test_nested_types.py@96 PS1, Line 96: rCase > nit: This seems easy to miss. Can you pick something more obvious, like cam Done http://gerrit.cloudera.org:8080/#/c/8169/1/tests/query_test/test_nested_types.py@100 PS1, Line 100: U > Does the uppercase F matter here? Not really. This query passed even before the fix, just adding a little extra coverage while I'm here. -- To view, visit http://gerrit.cloudera.org:8080/8169 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e Gerrit-Change-Number: 8169 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Thu, 28 Sep 2017 22:27:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5994: Failure in star expansion on struct fields
Hello Lars Volker, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8169 to look at the new patch set (#2). Change subject: IMPALA-5994: Failure in star expansion on struct fields .. IMPALA-5994: Failure in star expansion on struct fields Impala tries to always store column names in lower case. As part of a cleanup of issues related to upper case Kudu column names, a check was added in Analyzer to enforce this. The check fails when doing star expansion on a struct to select all fields in the case where a table was created in Hive with upper case letters in a struct field name. This happens because Hive does not covert struct field names to all lower case in HMS. The solution is to force StructField names to lower case. Testing: - Added a test in test_nested_types.py Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e --- M fe/src/main/java/org/apache/impala/catalog/StructField.java M tests/query_test/test_nested_types.py 2 files changed, 18 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/8169/2 -- To view, visit http://gerrit.cloudera.org:8080/8169 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e Gerrit-Change-Number: 8169 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5994: Failure in star expansion on struct fields
Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8169 Change subject: IMPALA-5994: Failure in star expansion on struct fields .. IMPALA-5994: Failure in star expansion on struct fields Impala tries to always store column names in lower case. As part of a cleanup of issues related to mixed case Kudu column names, a check was added in Analyzer to enforce this. The check fails when doing star expansion on a struct to select all fields in the case where a table was created in Hive with upper case letters in a struct field name. This happens because Hive does not covert struct field names to all lower case in HMS. The solution is to force StructField names to lower case. Testing: - Added a test in test_nested_types.py Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e --- M fe/src/main/java/org/apache/impala/catalog/StructField.java M tests/query_test/test_nested_types.py 2 files changed, 16 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/8169/1 -- To view, visit http://gerrit.cloudera.org:8080/8169 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iacd9714ac2301a55ee8b64f0102f6f156fb0370e Gerrit-Change-Number: 8169 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5951: test catalogd timeout fails to cause expected exception
Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8154 Change subject: IMPALA-5951: test_catalogd_timeout fails to cause expected exception .. IMPALA-5951: test_catalogd_timeout fails to cause expected exception test_catalogd_timeout sets a Kudu operation timeout of 1ms and then performs various Kudu operations which it expects to fail due to a timeout. Since the test was written, things have sped up - for example, Impala used to create a new Kudu client for each operation, but that was changed in IMPALA-5167, such that the operations now occasionally complete quickly enough that they don't timeout. This patch disables all but the first test. The remaining tests can likely be reactivated once we have a way of invalidating clients (IMPALA-5685). Change-Id: I29fd67d0acc0ee15943c416f2179ad716d2cac05 --- M testdata/workloads/functional-query/queries/QueryTest/kudu-timeouts-catalogd.test 1 file changed, 13 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/8154/1 -- To view, visit http://gerrit.cloudera.org:8080/8154 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I29fd67d0acc0ee15943c416f2179ad716d2cac05 Gerrit-Change-Number: 8154 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 5: (1 comment) > The Parquet writer does something similar for the Parquet > statistics, can any of that code be reused with regards to codegen? This was discussed in the original design thread, and we felt that there was too much parquet specific stuff current in/going to be added soon in the parquet statistics code, so we decided to just keep it separate. http://gerrit.cloudera.org:8080/#/c/7793/2/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/7793/2/common/thrift/ImpalaInternalService.thrift@740 PS2, Line 740: 4: optional i64 queue_timeout_ms; : : // Default query options that are applied to requests mapped to this pool. : > This could be a reasonable approach, or another idea is: Done -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 5 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Wed, 27 Sep 2017 03:08:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Hello Michael Ho, Lars Volker, Matthew Jacobs, Mostafa Mokhtar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7793 to look at the new patch set (#5). Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter and/or a min-max filter, depending on if it has HDFS and/or Kudu targets, respectively. Min-max filters are generated by the PartitionedHashJoinBuilder. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Because the Kudu client doesn't provide a way to specify generic filter exprs, min-max filters are only generated when the target expr is a bare Kudu column ref. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Codegen is used to eliminate branching on the type of the min-max filter. Testing: - Updated planner tests. - Ran existing runtime filter tests. - Ran preliminary perf tests to demonstrate that it works. Will update with more specific results. - Still needs more e2e tests. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/service/impala-internal-service.cc M be/src/util/CMakeLists.txt A be/src/util/min-max-filter-ir.cc A be/src/util/min-max-filter.cc A be/src/util/min-max-filter.h M common/thrift/ImpalaInternalService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test 37 files changed, 1,464 insertions(+), 146 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/7793/5 -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 5 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar
[Impala-ASF-CR] DRAFT - IMPALA-4252: Min-max runtime filters for Kudu
Hello Michael Ho, Lars Volker, Matthew Jacobs, Mostafa Mokhtar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7793 to look at the new patch set (#4). Change subject: DRAFT - IMPALA-4252: Min-max runtime filters for Kudu .. DRAFT - IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter and/or a min-max filter, depending on if it has HDFS and/or Kudu targets, respectively. Min-max filters are generated by the PartitionedHashJoinBuilder. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Because the Kudu client doesn't provide a way to specify generic filter exprs, min-max filters are only generated when the target expr is a bare Kudu column ref. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Codegen is used to eliminate branching on the type of the min-max filter. Testing: - Updated planner tests. - Ran existing runtime filter tests. - Ran preliminary perf tests to demonstrate that it works. Will update with more specific results. - Still needs more e2e tests. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/service/impala-internal-service.cc M be/src/util/CMakeLists.txt A be/src/util/min-max-filter-ir.cc A be/src/util/min-max-filter.cc A be/src/util/min-max-filter.h M common/thrift/ImpalaInternalService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test 37 files changed, 1,464 insertions(+), 146 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/7793/4 -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 4 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar
[Impala-ASF-CR] IMPALA-4252: Move runtime filters to ScanNode
Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8148 Change subject: IMPALA-4252: Move runtime filters to ScanNode .. IMPALA-4252: Move runtime filters to ScanNode As a preliminary step towards adding runtime filters for Kudu scans, this patch moves runtime filter related code from HdfsScanNodeBase to ScanNode so that it's available to KuduScanNodeBase. Testing: - Ran existing runtime filter tests. Change-Id: I17bdc869046dc2cd837d02f333441fa6324ff086 --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/scan-node.cc M be/src/exec/scan-node.h 4 files changed, 126 insertions(+), 107 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/8148/1 -- To view, visit http://gerrit.cloudera.org:8080/8148 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I17bdc869046dc2cd837d02f333441fa6324ff086 Gerrit-Change-Number: 8148 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5870: Improve runtime profile for partial sort
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8123 ) Change subject: IMPALA-5870: Improve runtime profile for partial sort .. Patch Set 3: The GVO failed because there was a kudu insert test that checked the profile for the 'SpilledRuns: 0' line. Since that is now gone, the test checks for 'SortType: Partial' (the real point of the test was that the query wouldn't have run previously due to the mem limit, which still applies, so we're not particularly losing any coverage). -- To view, visit http://gerrit.cloudera.org:8080/8123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b Gerrit-Change-Number: 8123 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 26 Sep 2017 19:46:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5870: Improve runtime profile for partial sort
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8123 to look at the new patch set (#3). Change subject: IMPALA-5870: Improve runtime profile for partial sort .. IMPALA-5870: Improve runtime profile for partial sort A recent change (IMPALA-5498) added the ability to do partial sorts, which divide their input up into runs each of which is sorted individually, avoiding the need to spill. Some of the debug output wasn't updated vs. regular sorts, leading to confusion. This patch removes the counters 'SpilledRuns' and 'MergesPerformed' since they will always be 0, and it renames the 'IntialRunsCreated' counter to 'RunsCreated' since the 'Initial' refers to the fact that in a regular sort those runs may be spilled or merged. It also adds a profile info string 'SortType' that can take the values 'Total', 'TopN', or 'Partial' to reflect the type of exec node being used. Example profile snippet for a partial sort: SORT_NODE (id=2):(Total: 403.261us, non-child: 382.029us, % non-child: 94.73%) SortType: Partial ExecOption: Codegen Enabled - NumRowsPerRun: (Avg: 44 (44) ; Min: 44 (44) ; Max: 44 (44) ; Number of samples: 1) - InMemorySortTime: 34.201us - PeakMemoryUsage: 2.02 MB (2117632) - RowsReturned: 44 (44) - RowsReturnedRate: 109.11 K/sec - RunsCreated: 1 (1) - SortDataSize: 572.00 B (572) Testing: - Manually ran several sorting queries and inspected their profiles - Updated a kudu_insert test that relied on the 'SpilledRuns' counter to be 0 for a partial sort. Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b --- M be/src/exec/partial-sort-node.cc M be/src/exec/sort-node.cc M be/src/exec/topn-node.cc M be/src/runtime/sorter.cc M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test 5 files changed, 11 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8123/3 -- To view, visit http://gerrit.cloudera.org:8080/8123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b Gerrit-Change-Number: 8123 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8029 ) Change subject: IMPALA-3360: Codegen inserting into runtime filters .. Patch Set 7: Code-Review+2 GVO failed because of WARN_UNUSED_RESULT not being followed. Added a RETURN_IF_ERROR -- To view, visit http://gerrit.cloudera.org:8080/8029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb Gerrit-Change-Number: 8029 Gerrit-PatchSet: 7 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 25 Sep 2017 15:26:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters
Hello Michael Ho, Matthew Jacobs, Jim Apple, Philip Zeyliger, Tim Armstrong, Dan Hecht, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8029 to look at the new patch set (#7). Change subject: IMPALA-3360: Codegen inserting into runtime filters .. IMPALA-3360: Codegen inserting into runtime filters This patch codegens PhjBuilder::InsertRuntimeFilters() and FilterContext::Insert(). This allows us to unroll the loop over all the filters in PhjBuilder::ProcessBuildBatch(), eliminate the branch on type that happens in RawValue::GetHashValue(), and eliminate the AVX check that happens in BloomFilter::Insert(). Testing: - Ran existing runtime filter tests. - Ran perf tests locally (all avg. over three runs): - Four way self join on tpch_parquet.lineitem. Should be a good case for this as there's several large hash join build sides that will benefit from the codegen. Total query running time improved ~7% (from 16.07s to 14.91s). - Single join of tpch_parquet.lineitem against a selectively filtered tpch_parquet.lineitem. Should be a bad case for this patch, as the build side of the join is very small. Total query running time regressed by about ~2% (from 0.73s to 0.75s) due to an increase in codegen time (from 295ms to 309ms for the fragment containing the hash join). Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/util/CMakeLists.txt A be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter.h 10 files changed, 311 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/7 -- To view, visit http://gerrit.cloudera.org:8080/8029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb Gerrit-Change-Number: 8029 Gerrit-PatchSet: 7 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5870: Improve runtime profile for partial sort
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8123 ) Change subject: IMPALA-5870: Improve runtime profile for partial sort .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/8123/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8123/1//COMMIT_MSG@14 PS1, Line 14: This patch removes the counters 'SpilledRuns' and 'MergesPerformed' > I'm skeptical about this part of the change. It definitely a wart but there Sounds good to me. I removed the EXPLAIN related stuff from this patch, and filed IMPALA-5972 http://gerrit.cloudera.org:8080/#/c/8123/1/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/8123/1/be/src/runtime/sorter.cc@1520 PS1, Line 1520: initial_runs_ > "runs_counter_" is a bit inaccurate for spilling sorts since it doesn't inc Done -- To view, visit http://gerrit.cloudera.org:8080/8123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b Gerrit-Change-Number: 8123 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 23 Sep 2017 00:05:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5870: Improve runtime profile for partial sort
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8123 to look at the new patch set (#2). Change subject: IMPALA-5870: Improve runtime profile for partial sort .. IMPALA-5870: Improve runtime profile for partial sort A recent change (IMPALA-5498) added the ability to do partial sorts, which divide their input up into runs each of which is sorted individually, avoiding the need to spill. Some of the debug output wasn't updated vs. regular sorts, leading to confusion. This patch removes the counters 'SpilledRuns' and 'MergesPerformed' since they will always be 0, and it renames the 'IntialRunsCreated' counter to 'RunsCreated' since the 'Initial' refers to the fact that in a regular sort those runs may be spilled or merged. It also adds a profile info string 'SortType' that can take the values 'Total', 'TopN', or 'Partial' to reflect the type of exec node being used. Example profile snippet for a partial sort: SORT_NODE (id=2):(Total: 403.261us, non-child: 382.029us, % non-child: 94.73%) SortType: Partial ExecOption: Codegen Enabled - NumRowsPerRun: (Avg: 44 (44) ; Min: 44 (44) ; Max: 44 (44) ; Number of samples: 1) - InMemorySortTime: 34.201us - PeakMemoryUsage: 2.02 MB (2117632) - RowsReturned: 44 (44) - RowsReturnedRate: 109.11 K/sec - RunsCreated: 1 (1) - SortDataSize: 572.00 B (572) Testing: - Manually ran several sorting queries and inspected their profiles Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b --- M be/src/exec/partial-sort-node.cc M be/src/exec/sort-node.cc M be/src/exec/topn-node.cc M be/src/runtime/sorter.cc 4 files changed, 10 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8123/2 -- To view, visit http://gerrit.cloudera.org:8080/8123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b Gerrit-Change-Number: 8123 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8029 ) Change subject: IMPALA-3360: Codegen inserting into runtime filters .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/filter-context.cc File be/src/exec/filter-context.cc: http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/filter-context.cc@216 PS2, Line 216: > Forgot in the previous pass: can you include an example of the IR it genera Done http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/filter-context.cc@217 PS2, Line 217: Status FilterContext::CodegenInsert( > So I've been working on this, but it turns out to be tricky. I uploaded a n After much discussion with Tim, we've decided not to bother trying to do the cross-compilation and to just stick with the ir builder version. The reason basically is that the Function returned by ScalarExpr::GetCodegendComputeFn() can't quite be used as a drop-in replacement for ScalarExprEvaluator::GetValue() (the first returns the value directly while the second returns a pointer) and the work necessary to fix this without losing efficiency is beyond the scope of this patch. http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/partitioned-hash-join-builder.cc@935 PS2, Line 935: Status PhjBuilder::CodegenInsertRuntimeFilters( > It would be good to have an example of the IR here too. Done http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/util/bloom-filter-ir.cc File be/src/util/bloom-filter-ir.cc: http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/util/bloom-filter-ir.cc@24 PS2, Line 24: IR_ALWAYS_INLINE > Is the always_inline needed here? Done -- To view, visit http://gerrit.cloudera.org:8080/8029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb Gerrit-Change-Number: 8029 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 22 Sep 2017 21:35:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters
Hello Michael Ho, Matthew Jacobs, Jim Apple, Philip Zeyliger, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8029 to look at the new patch set (#6). Change subject: IMPALA-3360: Codegen inserting into runtime filters .. IMPALA-3360: Codegen inserting into runtime filters This patch codegens PhjBuilder::InsertRuntimeFilters() and FilterContext::Insert(). This allows us to unroll the loop over all the filters in PhjBuilder::ProcessBuildBatch(), eliminate the branch on type that happens in RawValue::GetHashValue(), and eliminate the AVX check that happens in BloomFilter::Insert(). Testing: - Ran existing runtime filter tests. - Ran perf tests locally (all avg. over three runs): - Four way self join on tpch_parquet.lineitem. Should be a good case for this as there's several large hash join build sides that will benefit from the codegen. Total query running time improved ~7% (from 16.07s to 14.91s). - Single join of tpch_parquet.lineitem against a selectively filtered tpch_parquet.lineitem. Should be a bad case for this patch, as the build side of the join is very small. Total query running time regressed by about ~2% (from 0.73s to 0.75s) due to an increase in codegen time (from 295ms to 309ms for the fragment containing the hash join). Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/util/CMakeLists.txt A be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter.h 10 files changed, 311 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/6 -- To view, visit http://gerrit.cloudera.org:8080/8029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb Gerrit-Change-Number: 8029 Gerrit-PatchSet: 6 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5870: Improve explain/profile output for partial sort
Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8123 Change subject: IMPALA-5870: Improve explain/profile output for partial sort .. IMPALA-5870: Improve explain/profile output for partial sort A recent change (IMPALA-5498) added the ability to do partial sorts, which divide their input up into runs each of which is sorted individually, avoiding the need to spill. Some of the debug output wasn't updated vs. regular sorts, leading to confusion. For EXPLAIN, this patch removes the 'spill-buffer' mem-estimate for partial sorts, since they can't spill. It does this by setting the spillable buffer size in the resource profile to -1. Since the BE relied on that number to determine the page size for sorts, it now calculates the page size from the min reservation, which gives an equivalent value. For the runtime profile, it removes the counters 'SpilledRuns' and 'MergesPerformed' since they will always be 0, and it renames the 'IntialRunsCreated' counter to 'RunsCreated' since the 'Initial' refers to the fact that in a regular sort those runs may be spilled or merged. It also adds a profile info string 'SortType' that can take the values 'Total', 'TopN', or 'Partial' to reflect the type of exec node being used. Example profile snippet for a partial sort: SORT_NODE (id=2):(Total: 403.261us, non-child: 382.029us, % non-child: 94.73%) SortType: Partial ExecOption: Codegen Enabled - NumRowsPerRun: (Avg: 44 (44) ; Min: 44 (44) ; Max: 44 (44) ; Number of samples: 1) - InMemorySortTime: 34.201us - PeakMemoryUsage: 2.02 MB (2117632) - RowsReturned: 44 (44) - RowsReturnedRate: 109.11 K/sec - RunsCreated: 1 (1) - SortDataSize: 572.00 B (572) Testing: - Manually ran several sorting queries and inspected their explain plans and profile Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b --- M be/src/exec/partial-sort-node.cc M be/src/exec/sort-node.cc M be/src/exec/topn-node.cc M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M fe/src/main/java/org/apache/impala/planner/SortNode.java 6 files changed, 22 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8123/1 -- To view, visit http://gerrit.cloudera.org:8080/8123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b Gerrit-Change-Number: 8123 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-3360: Codegen inserting into runtime filters .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/8029/3/be/src/exec/filter-context.cc File be/src/exec/filter-context.cc: Line 225: Constant* expr_type_arg = codegen->ConstantToGVPtr( > Yeah, I have no idea why it doesn't work. The types definitely line up or i So I've been playing around with this more, and I have a theory. I noticed that it works if I only replace 'GetType' and not 'GetValue', so the problem may actually be with 'GetValue'. I think that the value that's being returned by the function generated by GetCodegendComputePtrFn() is stack allocated (because ToNativePtr() is called on 'result' which then calls CreateEntryBlockAlloca, which allocates stack space). Basically, GetCodegendComputePtrFn() is trying to create a codegend copy of ScalarExprEvaluator::GetValue() but its not currently doing the part where the value is stored in 'result_' so that a pointer to it can be passed back. One solution would be to figure out how to write IRBuilder for storing the value in 'result_' (or else cross compile ScalarExprEvaluator::GetValue() and sub things in), but that's potentially complicated. It may also be worth just going back to the original version of this patch to eliminate the perf cost from having to store the value in 'result_'. Or maybe you had something in mind with your other comment about not needing to create a whole new function that would solve this problem as well. -- To view, visit http://gerrit.cloudera.org:8080/8029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5949: fix test exchange small delay failure
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-5949: fix test_exchange_small_delay failure .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8111 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia75c42be2de600344de7af5a917d7843880ea6de Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-3360: Codegen inserting into runtime filters .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/8029/3/be/src/exec/CMakeLists.txt File be/src/exec/CMakeLists.txt: Line 40: filter-context-ir.cc > Missing this file? Done http://gerrit.cloudera.org:8080/#/c/8029/3/be/src/exec/filter-context.cc File be/src/exec/filter-context.cc: Line 225: Constant* expr_type_arg = codegen->ConstantToGVPtr( > This does look like it should work to me. I think the types should line up Yeah, I have no idea why it doesn't work. The types definitely line up or it wouldn't pass verification, and as far as I can tell the type is being passed correctly, its just somehow corrupting/overwriting the actual value that's getting hashed. I've been playing around with it, and the only thing I can figure out that makes it work is to change the cross-complied function to assign the ColumnType to a local variable rather than just handling a ref to it, but of course that somewhat defeats the point here. Is there someone who knows more about llvm that can help me with this? I can post the generated IR somewhere for someone to look at. I've already spent quite a lot of time trying to get this to work, and I'm not sure what else to do since I'm finding llvm to be very poorly documented. http://gerrit.cloudera.org:8080/#/c/8029/3/be/src/exprs/scalar-expr.cc File be/src/exprs/scalar-expr.cc: Line 368: // Set the pointer to NULL in case it evaluates to NULL. > It seems like the main way this differs from other places is returning a nu I don't know what you mean by this. If we're going to replace ScalarExprEval::GetValue() with ReplaceCallSites, then we need another function to replace it with. -- To view, visit http://gerrit.cloudera.org:8080/8029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8029 to look at the new patch set (#5). Change subject: IMPALA-3360: Codegen inserting into runtime filters .. IMPALA-3360: Codegen inserting into runtime filters This patch codegens PhjBuilder::InsertRuntimeFilters() and FilterContext::Insert(). This allows us to unroll the loop over all the filters in PhjBuilder::ProcessBuildBatch(), eliminate the branch on type that happens in RawValue::GetHashValue(), and eliminate the AVX check that happens in BloomFilter::Insert(). Testing: - Ran existing runtime filter tests. - Ran perf tests locally (all avg. over three runs): - Four way self join on tpch_parquet.lineitem. Should be a good case for this as there's several large hash join build sides that will benefit from the codegen. Total query running time improved ~7% (from 16.07s to 14.91s). - Single join of tpch_parquet.lineitem against a selectively filtered tpch_parquet.lineitem. Should be a bad case for this patch, as the build side of the join is very small. Total query running time regressed by about ~2% (from 0.73s to 0.75s) due to an increase in codegen time (from 295ms to 309ms for the fragment containing the hash join). Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/CMakeLists.txt A be/src/exec/filter-context-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/scalar-expr-evaluator.h M be/src/exprs/scalar-expr.cc M be/src/exprs/scalar-expr.h M be/src/util/CMakeLists.txt A be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter.h 16 files changed, 278 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/5 -- To view, visit http://gerrit.cloudera.org:8080/8029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8029 to look at the new patch set (#4). Change subject: IMPALA-3360: Codegen inserting into runtime filters .. IMPALA-3360: Codegen inserting into runtime filters This patch codegens PhjBuilder::InsertRuntimeFilters() and FilterContext::Insert(). This allows us to unroll the loop over all the filters in PhjBuilder::ProcessBuildBatch(), eliminate the branch on type that happens in RawValue::GetHashValue(), and eliminate the AVX check that happens in BloomFilter::Insert(). Testing: - Ran existing runtime filter tests. - Ran perf tests locally (all avg. over three runs): - Four way self join on tpch_parquet.lineitem. Should be a good case for this as there's several large hash join build sides that will benefit from the codegen. Total query running time improved ~7% (from 16.07s to 14.91s). - Single join of tpch_parquet.lineitem against a selectively filtered tpch_parquet.lineitem. Should be a bad case for this patch, as the build side of the join is very small. Total query running time regressed by about ~2% (from 0.73s to 0.75s) due to an increase in codegen time (from 295ms to 309ms for the fragment containing the hash join). Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/CMakeLists.txt A be/src/exec/filter-context-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/scalar-expr-evaluator.h M be/src/exprs/scalar-expr.cc M be/src/exprs/scalar-expr.h M be/src/util/CMakeLists.txt A be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter.h 16 files changed, 279 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/4 -- To view, visit http://gerrit.cloudera.org:8080/8029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-3360: Codegen inserting into runtime filters .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/filter-context.cc File be/src/exec/filter-context.cc: Line 217: > You're right, we should be able to cross-compile Insert() and substitute in So I've been working on this, but it turns out to be tricky. I uploaded a new patch, but it doesn't fully work. One issue is that the function returned by ScalarExpr::GetCodegendComputeFn() can't be used as a drop-in replacement for ScalarExprEvaluator::GetValue() as they have different return types (AnyVal and void* respectively). I solved this with ScalarExpr::GetCodegendComputePtrFn(), but it means keeping a lot of the IRBuilder code that's here. The other issue is how to deal with making the type argument to GetHashValue() a constant. As far as I can tell, we don't already have any tools for directly replacing arguments to functions. The latest version of the patch defines a function GetType(), called in FilterContext::Insert(), that I replace with ReplaceCallSitesWithValue(), but this doesn't work (in particular, the values to be hashed that are passed to GetHashValue() appear to be corrupted, and from dumping the IR the branching isn't eliminated anyways), and I'm not sure if its even reasonable to expect it to work since we don't use ReplaceCallSitesWithValue() in that way anywhere else. I also considered making something like RawValue::CodegenGetHashValue() which would take the type and return a codegen'd function that can be used to replace GetHashValue() with ReplaceCallSites() (ie. the codegen'd function would still also take a type argument but would ignore it), but this would be a bunch more IRBuilder (more even than the original version of the patch, I think), so I wanted to see if that approach makes sense before doing it. http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/util/bloom-filter.h File be/src/util/bloom-filter.h: Line 135: // Same as Insert(), but skips the CPU check and assumes that AVX is not available. > naming nit: We actually need AVX2 -- AVX won't cut it Done -- To view, visit http://gerrit.cloudera.org:8080/8029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8029 to look at the new patch set (#3). Change subject: IMPALA-3360: Codegen inserting into runtime filters .. IMPALA-3360: Codegen inserting into runtime filters This patch codegens PhjBuilder::InsertRuntimeFilters() and FilterContext::Insert(). This allows us to unroll the loop over all the filters in PhjBuilder::ProcessBuildBatch(), eliminate the branch on type that happens in RawValue::GetHashValue(), and eliminate the AVX check that happens in BloomFilter::Insert(). Testing: - Ran existing runtime filter tests. - Ran perf tests locally (all avg. over three runs): - Four way self join on tpch_parquet.lineitem. Should be a good case for this as there's several large hash join build sides that will benefit from the codegen. Total query running time improved ~7% (from 16.07s to 14.91s). - Single join of tpch_parquet.lineitem against a selectively filtered tpch_parquet.lineitem. Should be a bad case for this patch, as the build side of the join is very small. Total query running time regressed by about ~2% (from 0.73s to 0.75s) due to an increase in codegen time (from 295ms to 309ms for the fragment containing the hash join). Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/CMakeLists.txt M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/scalar-expr-evaluator.h M be/src/exprs/scalar-expr.cc M be/src/exprs/scalar-expr.h M be/src/util/CMakeLists.txt A be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter.h 15 files changed, 244 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/3 -- To view, visit http://gerrit.cloudera.org:8080/8029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters
Thomas Tauber-Marshall has uploaded a new patch set (#2). Change subject: IMPALA-3360: Codegen inserting into runtime filters .. IMPALA-3360: Codegen inserting into runtime filters This patch codegens PhjBuilder::InsertRuntimeFilters() and FilterContext::Insert(). This allows us to unroll the loop over all the filters in PhjBuilder::ProcessBuildBatch(), eliminate the branch on type that happens in RawValue::GetHashValue(), and eliminate the AVX check that happens in BloomFilter::Insert(). Testing: - Ran existing runtime filter tests. - Ran perf tests locally (all avg. over three runs): - Four way self join on tpch_parquet.lineitem. Should be a good case for this as there's several large hash join build sides that will benefit from the codegen. Total query running time improved ~7% (from 16.07s to 14.91s). - Single join of tpch_parquet.lineitem against a selectively filtered tpch_parquet.lineitem. Should be a bad case for this patch, as the build side of the join is very small. Total query running time regressed by about ~2% (from 0.73s to 0.75s) due to an increase in codegen time (from 295ms to 309ms for the fragment containing the hash join). Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/util/CMakeLists.txt A be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter.h 10 files changed, 246 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/2 -- To view, visit http://gerrit.cloudera.org:8080/8029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-3360: Codegen inserting into runtime filters .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/8029/1/be/src/exec/filter-context.h File be/src/exec/filter-context.h: Line 124: static Status CodegenInsert(LlvmCodeGen* codegen, ScalarExpr* filter_expr, > This is a bit tricky because it doesn't actually do exactly the same thing Done http://gerrit.cloudera.org:8080/#/c/8029/1/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: Line 953: for (int i = 0; i < num_filters; ++i) { > Nit: it looks like the two branches are identical except for adding the NoI Done Line 958: Value* filter_context_ptr = > Going forward, we want to avoid have this dependence between the codegen co Done http://gerrit.cloudera.org:8080/#/c/8029/1/be/src/util/bloom-filter-ir.cc File be/src/util/bloom-filter-ir.cc: Line 32: } > It would be nice to keep this in header file to avoid regressing the non-co Done -- To view, visit http://gerrit.cloudera.org:8080/8029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5416: Fix an impala-shell command recursion bug
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-5416: Fix an impala-shell command recursion bug .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I453af2d4694d47e184031cb07ecd2af259ba20f3 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tianyi Wang Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5416: Fix an impala-shell command recursion bug
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-5416: Fix an impala-shell command recursion bug .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8063/2//COMMIT_MSG Commit Message: PS2, Line 12: The cause is that there is a "cmdqueue" member in cmd library, which is : used to execute commands not directly from user input. When impala-shell : reads a line with multiple commands, it splits the line into multiple : queries and insert them into the queue, and then gives control back to : the eventloop in cmd library. The problem is that a source command : calls execute_query_list(), which executes queries in the cmdqueue as : well. So any query in the cmdqueue will be executed twice. And if there : is unfortunately a source command in the cmdqueue, it will call : execute_query_list() again, and there will be an infinite recursion. : The original purpose of running queued queries in execute_query_list() : is that in non-interactive mode, there is no event loop. And still, : there are queries like "use database" queued by connection setup : procedures, which need to be run before the user query. : This patch avoids running queries from the queue in : execute_query_list(), and for non-interactive mode, runs queued queries : in execute_queries_non_interactive_mode() instead. Thanks for the explanation. Hate to be picky, but this is more technical detail than I was looking for. The idea here is to give the reader a high level idea of what's going on while keeping it simple. Maybe something like (assuming I understand completely): The cmdqueue member of cmd.Cmd is used to execute commands not directly from user input in an event loop. When a 'source' is run, execute_query_list() is called which also executes the commands in cmdqueue, causing them to be executed twice. The fix is for execute_query_list() to not run the commands in cmdqueue. For the non-interactive case, where the event loop won't be run, we call execute_query_list() with cmdqueue so that the commands get run. -- To view, visit http://gerrit.cloudera.org:8080/8063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I453af2d4694d47e184031cb07ecd2af259ba20f3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tianyi Wang Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5416: Fix an impala-shell command recursion bug
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-5416: Fix an impala-shell command recursion bug .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8063/1//COMMIT_MSG Commit Message: PS1, Line 11: This patch fixes these bugs. Can you describe briefly what the problem with the existing code was? Looking at the code change, my intuition is that the old and new code should be equivalent, but apparently that's not the case. -- To view, visit http://gerrit.cloudera.org:8080/8063 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I453af2d4694d47e184031cb07ecd2af259ba20f3 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] Bump Kudu version to 3f49724
Thomas Tauber-Marshall has uploaded a new change for review. http://gerrit.cloudera.org:8080/8040 Change subject: Bump Kudu version to 3f49724 .. Bump Kudu version to 3f49724 Change-Id: I21aff562e2bca90436a8d0206ffca44b712bc1f1 --- M bin/impala-config.sh 1 file changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/8040/1 -- To view, visit http://gerrit.cloudera.org:8080/8040 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I21aff562e2bca90436a8d0206ffca44b712bc1f1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-1767 Adds predicate to test boolean values true, false, unknown.
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-1767 Adds predicate to test boolean values true, false, unknown. .. Patch Set 3: (10 comments) http://gerrit.cloudera.org:8080/#/c/8014/3/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS3, Line 1583: whitespace PS3, Line 3034: whitespace, here and below http://gerrit.cloudera.org:8080/#/c/8014/3/fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java File fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java: PS3, Line 26: Class describing a predicate that tests a boolean values using IS. : * Example: (1 < 1) IS TRUE You should explain more thoroughly what this expr actually does, eg. the handling of NULLs. PS3, Line 28: a : * CompoundPredicate weird line wrapping here PS3, Line 33: _ We don't use trailing underscores on constant values. PS3, Line 94: assert Preconditions.checkState http://gerrit.cloudera.org:8080/#/c/8014/3/fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java File fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java: PS3, Line 39: if (!(expr instanceof BoolTestPredicate)) : return expr; single line PS3, Line 64: private BoolTestToCompoundRule() { : } single line http://gerrit.cloudera.org:8080/#/c/8014/3/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java: PS3, Line 164: whitespace http://gerrit.cloudera.org:8080/#/c/8014/3/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java: Line 1257: // BoolTestPredicate. modify one of these tests to use "NOT" -- To view, visit http://gerrit.cloudera.org:8080/8014 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[native-toolchain-CR] Bump Kudu version to 3f49724
Thomas Tauber-Marshall has posted comments on this change. Change subject: Bump Kudu version to 3f49724 .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8028 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I38334bdbaa1e7a6c6f5d595c26efb76efa062305 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[native-toolchain-CR] Bump Kudu version to 3f49724
Thomas Tauber-Marshall has submitted this change and it was merged. Change subject: Bump Kudu version to 3f49724 .. Bump Kudu version to 3f49724 Change-Id: I38334bdbaa1e7a6c6f5d595c26efb76efa062305 --- M buildall.sh 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Matthew Jacobs: Looks good to me, approved Thomas Tauber-Marshall: Verified -- To view, visit http://gerrit.cloudera.org:8080/8028 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I38334bdbaa1e7a6c6f5d595c26efb76efa062305 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5912: fix crash in trunc(..., "WW") in release build
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-5912: fix crash in trunc(..., "WW") in release build .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic5017518188f5025daa5040ca1943581a0675726 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters
Thomas Tauber-Marshall has uploaded a new change for review. http://gerrit.cloudera.org:8080/8029 Change subject: IMPALA-3360: Codegen inserting into runtime filters .. IMPALA-3360: Codegen inserting into runtime filters This patch codegens PhjBuilder::InsertRuntimeFilters() and FilterContext::Insert(). This allows us to unroll the loop over all the filters in PhjBuilder::ProcessBuildBatch(), including skipping consideration of FilterContext's that don't have a local_bloom_filter, and eliminates the branch on type that happens in RawValue::GetHashValue(). Testing: - Ran existing runtime filter tests. - Ran perf tests locally (all avg. over three runs): - Four way self join on tpch_parquet.lineitem. Should be a good case for this as there's several large hash join build sides that will benefit from the codegen. Total query running time improved ~7% (from 16.07s to 14.91s). - Single join of tpch_parquet.lineitem against a selectively filtered tpch_parquet.lineitem. Should be a bad case for this patch, as the build side of the join is very small. Total query running time regressed by about ~2% (from 0.73s to 0.75s) due to an increase in codegen time (from 295ms to 309ms for the fragment containing the hash join). Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/util/CMakeLists.txt A be/src/util/bloom-filter-ir.cc M be/src/util/bloom-filter.h 10 files changed, 232 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/1 -- To view, visit http://gerrit.cloudera.org:8080/8029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall
[native-toolchain-CR] Bump Kudu version to 3f49724
Thomas Tauber-Marshall has uploaded a new change for review. http://gerrit.cloudera.org:8080/8028 Change subject: Bump Kudu version to 3f49724 .. Bump Kudu version to 3f49724 Change-Id: I38334bdbaa1e7a6c6f5d595c26efb76efa062305 --- M buildall.sh 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/28/8028/1 -- To view, visit http://gerrit.cloudera.org:8080/8028 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I38334bdbaa1e7a6c6f5d595c26efb76efa062305 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] Bump Kudu version to a71ecfd
Thomas Tauber-Marshall has posted comments on this change. Change subject: Bump Kudu version to a71ecfd .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie23d852f0d630f9484d8ae4f772af6bba13ea24f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No