[impala] 03/04: Turn off shell debug tracing for create-load-data.sh
This is an automated email from the ASF dual-hosted git repository.

tmarshall pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 6938831ae98f3e9cdb63ecbf63a608c17bdc0b6b
Author: Joe McDonnell
AuthorDate: Thu Feb 7 14:48:07 2019 -0800

    Turn off shell debug tracing for create-load-data.sh

    This removes a "set -x" from testdata/bin/create-load-data.sh.

    Change-Id: I524ec48d0264f6180a13d6d068832809bcc86596
    Reviewed-on: http://gerrit.cloudera.org:8080/12398
    Reviewed-by: Joe McDonnell
    Tested-by: Impala Public Jenkins
---
 testdata/bin/create-load-data.sh | 1 -
 1 file changed, 1 deletion(-)

diff --git a/testdata/bin/create-load-data.sh b/testdata/bin/create-load-data.sh
index a32d44c..44452ab 100755
--- a/testdata/bin/create-load-data.sh
+++ b/testdata/bin/create-load-data.sh
@@ -31,7 +31,6 @@ set -euo pipefail
 . $IMPALA_HOME/bin/report_build_error.sh
 setup_report_build_error
-set -x
 . ${IMPALA_HOME}/bin/impala-config.sh > /dev/null 2>&1
 . ${IMPALA_HOME}/testdata/bin/run-step.sh
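For context on what the removed line did: `set -x` turns on bash's xtrace mode, which echoes every command (prefixed with `+ ` by default) to stderr before running it. A minimal sketch, driving bash from Python purely for illustration:

```python
# Demonstrates bash xtrace, the behavior disabled by this commit: with
# "set -x", each command is echoed to stderr before it executes.
import subprocess

res = subprocess.run(
    ["bash", "-c", "set -x\necho traced"],
    capture_output=True, text=True)
print(res.stdout.strip())   # normal output: traced
print(res.stderr.strip())   # trace output: + echo traced
```

Removing `set -x` from create-load-data.sh simply silences that per-command trace; `set -euo pipefail` (still present in the script) is unrelated and stays in effect.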
[impala] 01/04: IMPALA-7214: [DOCS] More on decoupling impala and DataNodes
This is an automated email from the ASF dual-hosted git repository.

tmarshall pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 5b32a0d60110be7c21184819c2dffbb7cbff750f
Author: Alex Rodoni
AuthorDate: Tue Feb 12 12:40:42 2019 -0800

    IMPALA-7214: [DOCS] More on decoupling impala and DataNodes

    Change-Id: I4b6f1c704c1e328af9f0beec73f8b6b61fba992e
    Reviewed-on: http://gerrit.cloudera.org:8080/12457
    Tested-by: Impala Public Jenkins
    Reviewed-by: Tim Armstrong
---
 docs/topics/impala_processes.xml       | 10 +++--
 docs/topics/impala_troubleshooting.xml | 39 +-
 2 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/docs/topics/impala_processes.xml b/docs/topics/impala_processes.xml
index 71986d3..70366dd 100644
--- a/docs/topics/impala_processes.xml
+++ b/docs/topics/impala_processes.xml
@@ -55,10 +55,7 @@ under the License.
       Start one instance of the Impala catalog service.
-      Start the main Impala service on one or more DataNodes, ideally on all DataNodes
-      to maximize local processing and avoid network traffic due to remote reads.
+      Start the main Impala daemon services.
@@ -101,9 +98,8 @@ under the License.
 $ sudo service impala-catalog start
-      Start the Impala service on each DataNode using a command similar to the following:
+      Start the Impala daemon services using a command similar to the
+following:
 $ sudo service impala-server start

diff --git a/docs/topics/impala_troubleshooting.xml b/docs/topics/impala_troubleshooting.xml
index 250c899..80b7363 100644
--- a/docs/topics/impala_troubleshooting.xml
+++ b/docs/topics/impala_troubleshooting.xml
@@ -123,17 +123,17 @@ terminate called after throwing an instance of 'boost::exception_detail::clone_i
       Troubleshooting I/O Capacity Problems
-      Impala queries are typically I/O-intensive. If there is an I/O problem with storage devices,
-      or with HDFS itself, Impala queries could show slow response times with no obvious cause
-      on the Impala side. Slow I/O on even a single DataNode could result in an overall slowdown,
-      because queries involving clauses such as ORDER BY, GROUP BY, or JOIN
-      do not start returning results until all DataNodes have finished their work.
-
-      To test whether the Linux I/O system itself is performing as expected, run Linux commands
-      like the following on each DataNode:
+      Impala queries are typically I/O-intensive. If there is an I/O problem
+with storage devices, or with HDFS itself, Impala queries could show
+slow response times with no obvious cause on the Impala side. Slow I/O
+on even a single Impala daemon could result in an overall slowdown,
+because queries involving clauses such as ORDER BY,
+        GROUP BY, or JOIN do not start
+returning results until all executor Impala daemons have finished their
+work.
+      To test whether the Linux I/O system itself is performing as expected,
+run Linux commands like the following on each host Impala daemon is
+running:
 $ sudo sysctl -w vm.drop_caches=3
 vm.drop_caches=0
 vm.drop_caches = 3
@@ -265,14 +265,15 @@ $ sudo dd if=/dev/sdd bs=1M of=/dev/null count=1k
-      Replace hostname and port with the hostname and port of
-      your Impala state store host machine and web server port. The default port is 25010.
-
-      The number of impalad instances listed should match the expected number of
-      impalad instances installed in the cluster. There should also be one
-      impalad instance installed on each DataNode
+      Replace hostname and
+        port with the hostname and port of your
+Impala state store host machine and web server port. The
+default port is 25010. The number of
+impalad instances listed should match the
+        expected number of impalad instances
+        installed in the cluster. There should also be one
+impalad instance installed on each
+        DataNode.
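The doc's `dd` checks target raw devices and need root. A rough, safe stand-in for the same kind of sequential-read measurement, timing a 1 MiB-at-a-time read of a scratch file (the file path and 64 MiB size are illustrative, not from the docs):

```python
# Time a sequential read of a scratch file in 1 MiB chunks, mimicking
# "dd bs=1M of=/dev/null" without touching a raw device.
import os
import tempfile
import time

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\0" * (64 * 1024 * 1024))  # 64 MiB scratch file

start = time.monotonic()
read = 0
with open(path, "rb") as f:
    while True:
        chunk = f.read(1024 * 1024)  # bs=1M
        if not chunk:
            break
        read += len(chunk)
elapsed = time.monotonic() - start
os.unlink(path)
print("read %d MiB in %.3fs" % (read >> 20, elapsed))
```

Note this mostly measures the page cache unless the file is larger than RAM or caches are dropped first, which is exactly why the docs pair `dd` with `sysctl -w vm.drop_caches=3`.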
[impala] 04/04: IMPALA-8183: fix test_reportexecstatus_retry flakiness
This is an automated email from the ASF dual-hosted git repository.

tmarshall pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 9492d451d5d5a82bfc6f4c93c3a0c6e6d0cc4981
Author: Thomas Tauber-Marshall
AuthorDate: Tue Feb 12 22:47:52 2019 +

    IMPALA-8183: fix test_reportexecstatus_retry flakiness

    The test is designed to cause ReportExecStatus() rpcs to fail by
    backing up the control service queue. Previously, after a failed
    ReportExecStatus() we would wait 'report_status_retry_interval_ms'
    between retries, which was 100ms by default and wasn't touched by the
    test. That 100ms was right on the edge of being enough time for the
    coordinator to keep up with processing the reports, so that some would
    fail but most would succeed. It was always possible that we could hit
    IMPALA-2990 in this setup, but it was unlikely.

    Now, with IMPALA-4555 'report_status_retry_interval_ms' was removed
    and we instead wait 'status_report_interval_ms' between retries. By
    default, this is 5000ms, so it should give the coordinator even more
    time and make these issues less likely. However, the test sets
    'status_report_interval_ms' to 10ms, which isn't nearly enough time
    for the coordinator to do its processing, causing lots of the
    ReportExecStatus() rpcs to fail and making us hit IMPALA-2990 pretty
    often.

    The solution is to set 'status_report_interval_ms' to 100ms in the
    test, which roughly achieves the same retry frequency as before. The
    same change is made to a similar test, test_reportexecstatus_timeout.

    Testing:
    - Ran test_reportexecstatus_retry in a loop 400 times without seeing
      a failure. It previously repro-ed for me about once per 50 runs.
    - Manually verified that both tests are still hitting the error paths
      that they are supposed to be testing.

    Change-Id: I7027a6e099c543705e5845ee0e5268f1f9a3fb05
    Reviewed-on: http://gerrit.cloudera.org:8080/12461
    Reviewed-by: Impala Public Jenkins
    Tested-by: Impala Public Jenkins
---
 tests/custom_cluster/test_rpc_timeout.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/custom_cluster/test_rpc_timeout.py b/tests/custom_cluster/test_rpc_timeout.py
index d007ef4..e1a959c 100644
--- a/tests/custom_cluster/test_rpc_timeout.py
+++ b/tests/custom_cluster/test_rpc_timeout.py
@@ -128,7 +128,7 @@ class TestRPCTimeout(CustomClusterTestSuite):

   # Inject jitter into the RPC handler of ReportExecStatus() to trigger RPC timeout.
   @pytest.mark.execute_serially
-  @CustomClusterTestSuite.with_args("--status_report_interval_ms=10"
+  @CustomClusterTestSuite.with_args("--status_report_interval_ms=100"
       " --backend_client_rpc_timeout_ms=1000")
   def test_reportexecstatus_timeout(self, vector):
     query_options = {'debug_action': 'REPORT_EXEC_STATUS_DELAY:JITTER@1500@0.5'}
@@ -137,7 +137,7 @@ class TestRPCTimeout(CustomClusterTestSuite):

   # Use a small service queue memory limit and a single service thread to exercise
   # the retry paths in the ReportExecStatus() RPC
   @pytest.mark.execute_serially
-  @CustomClusterTestSuite.with_args("--status_report_interval_ms=10"
+  @CustomClusterTestSuite.with_args("--status_report_interval_ms=100"
       " --control_service_queue_mem_limit=1 --control_service_num_svc_threads=1")
   def test_reportexecstatus_retry(self, vector):
     self.execute_query_verify_metrics(self.TEST_QUERY, None, 10)
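The retry behavior the commit message describes, waiting a fixed interval between ReportExecStatus() attempts, can be modeled with a toy loop (this is a sketch of the pattern, not Impala code; the names are made up):

```python
# Toy model of a status-report retry loop: on failure, sleep for the
# report interval and try again, up to max_tries attempts.
import time

def send_with_retry(send, interval_s, max_tries):
    """Call send() until it succeeds; return the number of attempts used."""
    for attempt in range(1, max_tries + 1):
        if send():
            return attempt
        time.sleep(interval_s)  # the knob the test tunes (status_report_interval_ms)
    raise RuntimeError("report failed after %d retries" % max_tries)

# A sender that fails twice, then succeeds:
calls = iter([False, False, True])
attempts = send_with_retry(lambda: next(calls), interval_s=0.0, max_tries=5)
print(attempts)  # 3
```

The commit's point is that the interval controls how much breathing room the receiver gets between attempts: shrinking it from 100ms to 10ms made nearly every attempt fail, while restoring 100ms roughly recovers the old retry frequency.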
[impala] 02/04: Add support for compiling using OpenSSL 1.1
This is an automated email from the ASF dual-hosted git repository.

tmarshall pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit d2b8b7b9b0f3a02e2418d9182007b736bb739a1b
Author: Hector Acosta
AuthorDate: Fri Feb 8 14:50:17 2019 -0800

    Add support for compiling using OpenSSL 1.1

    Change-Id: Iaccf1b2dedf0d957a2665df8f9afca4139754264
    Reviewed-on: http://gerrit.cloudera.org:8080/12420
    Reviewed-by: Impala Public Jenkins
    Tested-by: Impala Public Jenkins
---
 be/src/util/openssl-util.cc | 45 +++++++++++++++++++++++++++-----------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/be/src/util/openssl-util.cc b/be/src/util/openssl-util.cc
index 2b66b86..da583cf 100644
--- a/be/src/util/openssl-util.cc
+++ b/be/src/util/openssl-util.cc
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include

 #include "common/atomic.h"
 #include "gutil/port.h" // ATTRIBUTE_WEAK
@@ -70,7 +71,13 @@ static const int RNG_RESEED_INTERVAL = 128;
 static const int RNG_RESEED_BYTES = 512;

 int MaxSupportedTlsVersion() {
+#if OPENSSL_VERSION_NUMBER < 0x10100000L
   return SSLv23_method()->version;
+#else
+  // OpenSSL 1.1+ doesn't let us detect the supported TLS version at runtime. Assume
+  // that the OpenSSL library we're linked against supports only up to TLS1.2
+  return TLS1_2_VERSION;
+#endif
 }

 bool IsInternalTlsConfigured() {
@@ -97,13 +104,25 @@ struct ScopedEVPCipherCtx {
   DISALLOW_COPY_AND_ASSIGN(ScopedEVPCipherCtx);

   explicit ScopedEVPCipherCtx(int padding) {
-    EVP_CIPHER_CTX_init(&ctx);
-    EVP_CIPHER_CTX_set_padding(&ctx, padding);
+#if OPENSSL_VERSION_NUMBER < 0x10100000L
+    ctx = static_cast<EVP_CIPHER_CTX*>(malloc(sizeof(*ctx)));
+    EVP_CIPHER_CTX_init(ctx);
+#else
+    ctx = EVP_CIPHER_CTX_new();
+#endif
+    EVP_CIPHER_CTX_set_padding(ctx, padding);
   }

-  ~ScopedEVPCipherCtx() { EVP_CIPHER_CTX_cleanup(&ctx); }
+  ~ScopedEVPCipherCtx() {
+#if OPENSSL_VERSION_NUMBER < 0x10100000L
+    EVP_CIPHER_CTX_cleanup(ctx);
+    free(ctx);
+#else
+    EVP_CIPHER_CTX_free(ctx);
+#endif
+  }

-  EVP_CIPHER_CTX ctx;
+  EVP_CIPHER_CTX* ctx;
 };

 // Callback used by OpenSSLErr() - write the error given to us through buf to the
@@ -170,13 +189,13 @@ Status EncryptionKey::EncryptInternal(
   // mode is well-optimized(instruction level parallelism) with hardware acceleration
   // on x86 and PowerPC
   const EVP_CIPHER* evpCipher = GetCipher();
-  int success = encrypt ? EVP_EncryptInit_ex(&ctx.ctx, evpCipher, NULL, key_, iv_) :
-                          EVP_DecryptInit_ex(&ctx.ctx, evpCipher, NULL, key_, iv_);
+  int success = encrypt ? EVP_EncryptInit_ex(ctx.ctx, evpCipher, NULL, key_, iv_) :
+                          EVP_DecryptInit_ex(ctx.ctx, evpCipher, NULL, key_, iv_);
   if (success != 1) {
     return OpenSSLErr(encrypt ? "EVP_EncryptInit_ex" : "EVP_DecryptInit_ex", err_context);
   }

   if (IsGcmMode()) {
-    if (EVP_CIPHER_CTX_ctrl(&ctx.ctx, EVP_CTRL_GCM_SET_IVLEN, AES_BLOCK_SIZE, NULL)
+    if (EVP_CIPHER_CTX_ctrl(ctx.ctx, EVP_CTRL_GCM_SET_IVLEN, AES_BLOCK_SIZE, NULL)
         != 1) {
       return OpenSSLErr("EVP_CIPHER_CTX_ctrl", err_context);
     }
@@ -189,8 +208,8 @@ Status EncryptionKey::EncryptInternal(
     int in_len = static_cast<int>(min(len - offset, numeric_limits<int>::max()));
     int out_len;
     success = encrypt ?
-        EVP_EncryptUpdate(&ctx.ctx, out + offset, &out_len, data + offset, in_len) :
-        EVP_DecryptUpdate(&ctx.ctx, out + offset, &out_len, data + offset, in_len);
+        EVP_EncryptUpdate(ctx.ctx, out + offset, &out_len, data + offset, in_len) :
+        EVP_DecryptUpdate(ctx.ctx, out + offset, &out_len, data + offset, in_len);
     if (success != 1) {
       return OpenSSLErr(encrypt ? "EVP_EncryptUpdate" : "EVP_DecryptUpdate", err_context);
     }
@@ -201,7 +220,7 @@ Status EncryptionKey::EncryptInternal(

   if (IsGcmMode() && !encrypt) {
     // Set expected tag value
-    if (EVP_CIPHER_CTX_ctrl(&ctx.ctx, EVP_CTRL_GCM_SET_TAG, AES_BLOCK_SIZE, gcm_tag_)
+    if (EVP_CIPHER_CTX_ctrl(ctx.ctx, EVP_CTRL_GCM_SET_TAG, AES_BLOCK_SIZE, gcm_tag_)
         != 1) {
       return OpenSSLErr("EVP_CIPHER_CTX_ctrl", err_context);
     }
@@ -209,14 +228,14 @@ Status EncryptionKey::EncryptInternal(

   // Finalize encryption or decryption.
   int final_out_len;
-  success = encrypt ? EVP_EncryptFinal_ex(&ctx.ctx, out + offset, &final_out_len) :
-                      EVP_DecryptFinal_ex(&ctx.ctx, out + offset, &final_out_len);
+  success = encrypt ? EVP_EncryptFinal_ex(ctx.ctx, out + offset, &final_out_len) :
+                      EVP_DecryptFinal_ex(ctx.ctx, out + offset, &final_out_len);
   if (success != 1) {
     return OpenSSLErr(encrypt ? "EVP_EncryptFinal" : "EVP_DecryptFinal", err_context);
   }

   if (IsGcmMode() && encrypt) {
-    if (EVP_CIPHER_CTX_ctrl(&ctx.ctx, EVP_CTRL_GCM_GET_TAG, AES_BLOCK_SIZE, gcm_tag_)
+    if (EVP_CIPHER_CTX_ctrl(ctx.ctx, EVP_CTRL_GCM_GET_TAG, AES_BLOCK_SIZE, gcm_tag_)
         != 1) {
       return OpenSSLErr("EVP_CIPHER_CTX_ctrl", err_context);
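For readers unfamiliar with the `OPENSSL_VERSION_NUMBER` constant this patch gates on: pre-3.0 OpenSSL packs the version as `0xMNNFFPPS` (major nibble, minor byte, fix byte, patch byte, status nibble), so the 1.1.0 cutoff is `0x10100000`. A small sketch in plain Python, with no OpenSSL dependency:

```python
# Decode an OPENSSL_VERSION_NUMBER-style value (0xMNNFFPPS layout used by
# OpenSSL before 3.0) into a (major, minor, fix) tuple.
def decode(v):
    major = (v >> 28) & 0xF
    minor = (v >> 20) & 0xFF
    fix = (v >> 12) & 0xFF
    return (major, minor, fix)

print(decode(0x10100000))  # (1, 1, 0) -> the OpenSSL 1.1.0 cutoff
print(decode(0x1000105F))  # (1, 0, 1) -> an OpenSSL 1.0.1 build
```

Because the layout is monotone, `OPENSSL_VERSION_NUMBER < 0x10100000L` is exactly "older than 1.1.0", which is why the patch uses that comparison to choose between the 1.0-era stack-allocated `EVP_CIPHER_CTX` API and the 1.1+ opaque `EVP_CIPHER_CTX_new`/`EVP_CIPHER_CTX_free` API.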
[impala] branch master updated: IMPALA-8186: script to configure docker network
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

The following commit(s) were added to refs/heads/master by this push:
     new dbe9fef  IMPALA-8186: script to configure docker network
dbe9fef is described below

commit dbe9fefa05ce9738865349e9130004457ff31c62
Author: Tim Armstrong
AuthorDate: Mon Feb 11 16:33:05 2019 -0800

    IMPALA-8186: script to configure docker network

    This automates the network setup that I did manually in
    http://gerrit.cloudera.org:8080/12189

    After running the script it should be possible to run
    "./buildall.sh -format -testdata" to load test data with the right
    hostnames, then "start-impala-cluster.py --docker_network=network-name"
    to run a dockerised minicluster.

    Change-Id: Icb4854aa951bcad7087a9653845b22ffd862057d
    Reviewed-on: http://gerrit.cloudera.org:8080/12452
    Reviewed-by: Philip Zeyliger
    Tested-by: Tim Armstrong
---
 docker/configure_test_network.sh | 53 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/docker/configure_test_network.sh b/docker/configure_test_network.sh
new file mode 100755
index 000..df5567b
--- /dev/null
+++ b/docker/configure_test_network.sh
@@ -0,0 +1,53 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+# Sets up a Docker bridge network with the name provided by the first argument and
+# appends the configuration required to use it in a dockerised minicluster to
+# bin/impala-config-local.sh. Note that impala-config.sh needs to be re-sourced,
+# cluster configurations need to be regenerated, all minicluster processes restarted,
+# and data reloaded for the change to be effective and your cluster to be functional.
+
+set -euo pipefail
+
+usage() {
+  echo "configure_test_network.sh "
+}
+
+if [[ $# != 1 ]]; then
+  usage
+  exit 1
+fi
+
+NETWORK_NAME=$1
+
+# Remove existing network if present.
+echo "Removing existing network '$NETWORK_NAME'"
+docker network rm "$NETWORK_NAME" || true
+
+echo "Create network '$NETWORK_NAME'"
+docker network create -d bridge $NETWORK_NAME
+GATEWAY=$(docker network inspect "$NETWORK_NAME" -f '{{(index .IPAM.Config 0).Gateway}}')
+echo "Gateway is '${GATEWAY}'"
+
+echo "Updating impala-config-local.sh"
+echo "# Configuration to use docker network ${NETWORK_NAME}" \
+  >> "$IMPALA_HOME"/bin/impala-config-local.sh
+echo "export INTERNAL_LISTEN_HOST=${GATEWAY}" >> "$IMPALA_HOME"/bin/impala-config-local.sh
+echo "export DEFAULT_FS=hdfs://\${INTERNAL_LISTEN_HOST}:20500" \
+  >> "$IMPALA_HOME"/bin/impala-config-local.sh
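The two exports the script appends compose: `DEFAULT_FS` is built from `INTERNAL_LISTEN_HOST`, which is set to the bridge network's gateway address. A sketch of how the appended lines evaluate, using an illustrative gateway value (172.18.0.1 is a typical Docker bridge gateway, not something the script guarantees):

```python
# Model the expansion of the two lines appended to impala-config-local.sh,
# assuming the detected gateway were 172.18.0.1 (illustrative value only).
gateway = "172.18.0.1"

internal_listen_host = gateway                         # export INTERNAL_LISTEN_HOST=...
default_fs = "hdfs://%s:20500" % internal_listen_host  # export DEFAULT_FS=hdfs://${INTERNAL_LISTEN_HOST}:20500

print(default_fs)  # hdfs://172.18.0.1:20500
```

Note the script escapes `\${INTERNAL_LISTEN_HOST}` when writing `DEFAULT_FS`, so the substitution happens each time impala-config-local.sh is sourced, not at script-run time.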
[impala] branch master updated: IMPALA-5861: fix RowsRead for zero-slot table scan
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

The following commit(s) were added to refs/heads/master by this push:
     new a154b2d  IMPALA-5861: fix RowsRead for zero-slot table scan
a154b2d is described below

commit a154b2d6e775a508df4fd2c8d51a18d5c1d1f933
Author: Tim Armstrong
AuthorDate: Fri Feb 1 07:13:56 2019 -0800

    IMPALA-5861: fix RowsRead for zero-slot table scan

    Testing:
    Added regression test based on JIRA and a targeted test for all HDFS
    file formats.

    Change-Id: I7a927c6a4f0b8055608cb7a5e2b550a1610cef89
    Reviewed-on: http://gerrit.cloudera.org:8080/12332
    Reviewed-by: Impala Public Jenkins
    Tested-by: Impala Public Jenkins
---
 be/src/exec/parquet/hdfs-parquet-scanner.cc        |   2 +-
 .../queries/QueryTest/mixed-format.test            |  14 +++
 .../queries/QueryTest/scanners.test                | 111 +
 3 files changed, 126 insertions(+), 1 deletion(-)

diff --git a/be/src/exec/parquet/hdfs-parquet-scanner.cc b/be/src/exec/parquet/hdfs-parquet-scanner.cc
index 4fe9914..3836d0b 100644
--- a/be/src/exec/parquet/hdfs-parquet-scanner.cc
+++ b/be/src/exec/parquet/hdfs-parquet-scanner.cc
@@ -400,7 +400,7 @@ Status HdfsParquetScanner::GetNextInternal(RowBatch* row_batch) {
     assemble_rows_timer_.Stop();
     RETURN_IF_ERROR(status);
     row_group_rows_read_ += max_tuples;
-    COUNTER_ADD(scan_node_->rows_read_counter(), row_group_rows_read_);
+    COUNTER_ADD(scan_node_->rows_read_counter(), max_tuples);
     return Status::OK();
   }

diff --git a/testdata/workloads/functional-query/queries/QueryTest/mixed-format.test b/testdata/workloads/functional-query/queries/QueryTest/mixed-format.test
index 0b693e1..2d5bf9e 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/mixed-format.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/mixed-format.test
@@ -24,3 +24,17 @@ bigint, bigint
 ---- RESULTS
 280,1260
 ====
+---- QUERY
+# IMPALA-5861: RowsRead counter should be accurate for table scan that returns
+# zero slots. This test is run with various batch_size values, which helps
+# reproduce the bug. Scanning multiple file formats triggers the bug because
+# the Parquet count(*) rewrite is disabled when non-Parquet file formats are
+# present.
+select count(*) from functional.alltypesmixedformat
+---- TYPES
+bigint
+---- RESULTS
+1200
+---- RUNTIME_PROFILE
+aggregation(SUM, RowsRead): 1200
+====
diff --git a/testdata/workloads/functional-query/queries/QueryTest/scanners.test b/testdata/workloads/functional-query/queries/QueryTest/scanners.test
index b05786e..72d6505 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/scanners.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/scanners.test
@@ -128,3 +128,114 @@ select count(*) from alltypessmall
 ---- TYPES
 BIGINT
 ====
+---- QUERY
+# IMPALA-5861: RowsRead counter should be accurate for table scan that materializes
+# zero slots from this files. This test is run with various batch_size values,
+# which helps reproduce the Parquet bug.
+select 1 from alltypessmall
+---- TYPES
+tinyint
+---- RESULTS
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+---- RUNTIME_PROFILE
+aggregation(SUM, RowsRead): 100
+====
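The one-line fix swaps a running total for a per-batch delta. Since `COUNTER_ADD` adds its argument to the counter, passing the cumulative `row_group_rows_read_` on every batch overcounts; a toy model (not Impala code) of the two behaviors:

```python
# Model the IMPALA-5861 bug: a counter that is *added to* each batch must
# receive the per-batch delta (max_tuples), not the running total
# (row_group_rows_read_), or rows get counted multiple times.
def scan(batches):
    counter_buggy = 0
    counter_fixed = 0
    running_total = 0
    for n in batches:
        running_total += n      # row_group_rows_read_ += max_tuples
        counter_buggy += running_total  # bug: COUNTER_ADD(..., row_group_rows_read_)
        counter_fixed += n              # fix: COUNTER_ADD(..., max_tuples)
    return counter_buggy, counter_fixed

print(scan([10, 10, 10]))  # (60, 30): buggy counter reports double the 30 rows read
```

This also explains why the regression tests vary `batch_size`: with a single batch the two versions agree, and the overcount only appears once a scan spans multiple batches.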
[impala] branch master updated: IMPALA-8064: Improve observability of wait times for runtime filters
This is an automated email from the ASF dual-hosted git repository.

boroknagyz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

The following commit(s) were added to refs/heads/master by this push:
     new e0aabdd  IMPALA-8064: Improve observability of wait times for runtime filters
e0aabdd is described below

commit e0aabddd573c204a780d3f5ff0af442cdb26b7c6
Author: poojanilangekar
AuthorDate: Thu Feb 7 17:00:34 2019 -0800

    IMPALA-8064: Improve observability of wait times for runtime filters

    This change is a diagnostic fix to improve the wait times logged for
    runtime filters. The filter wait time counts against the elapsed time
    since the filter's registration in ScanNode::Init() while the duration
    logged in ScanNode::WaitForRuntimeFilters() is the time spent in the
    function waiting for all the filters to arrive. This could be
    misleading as it doesn't account for the elapsed time spent between
    ScanNode::Init() and ScanNode::WaitForRuntimeFilters(). This change
    logs the maximum arrival delay for any runtime filter to arrive.

    From my analysis of the logs of the failed tests, I believe the
    filters are actually waiting for the specified time but logging the
    duration incorrectly. The solution would be to increase the wait time
    further. This change would help validate this hypothesis.

    Change-Id: I28fd45e75c773bc01d424f5a179ae186ee9b7469
    Reviewed-on: http://gerrit.cloudera.org:8080/12401
    Reviewed-by: Impala Public Jenkins
    Tested-by: Impala Public Jenkins
---
 be/src/exec/scan-node.cc              | 13 +++++++------
 be/src/runtime/runtime-filter-bank.cc |  7 +++----
 be/src/runtime/runtime-filter.h       |  9 +++++----
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/be/src/exec/scan-node.cc b/be/src/exec/scan-node.cc
index 906af66..039836c 100644
--- a/be/src/exec/scan-node.cc
+++ b/be/src/exec/scan-node.cc
@@ -168,6 +168,7 @@ bool ScanNode::WaitForRuntimeFilters() {
   }
   vector arrived_filter_ids;
   vector missing_filter_ids;
+  int32_t max_arrival_delay = 0;
   int64_t start = MonotonicMillis();
   for (auto& ctx: filter_ctxs_) {
     string filter_id = Substitute("$0", ctx.filter->id());
@@ -176,20 +177,24 @@ bool ScanNode::WaitForRuntimeFilters() {
     } else {
       missing_filter_ids.push_back(filter_id);
     }
+    max_arrival_delay = max(max_arrival_delay, ctx.filter->arrival_delay_ms());
   }
   int64_t end = MonotonicMillis();
   const string& wait_time = PrettyPrinter::Print(end - start, TUnit::TIME_MS);
+  const string& arrival_delay = PrettyPrinter::Print(max_arrival_delay, TUnit::TIME_MS);

   if (arrived_filter_ids.size() == filter_ctxs_.size()) {
     runtime_profile()->AddInfoString("Runtime filters",
-        Substitute("All filters arrived. Waited $0", wait_time));
+        Substitute("All filters arrived. Waited $0. Maximum arrival delay: $1.",
+            wait_time, arrival_delay));
     VLOG(2) << "Filters arrived. Waited " << wait_time;
     return true;
   }
-  const string& filter_str = Substitute(
-      "Not all filters arrived (arrived: [$0], missing [$1]), waited for $2",
-      join(arrived_filter_ids, ", "), join(missing_filter_ids, ", "), wait_time);
+  const string& filter_str = Substitute("Not all filters arrived (arrived: [$0], missing "
+      "[$1]), waited for $2. Arrival delay: $3.",
+      join(arrived_filter_ids, ", "), join(missing_filter_ids, ", "), wait_time,
+      arrival_delay);
   runtime_profile()->AddInfoString("Runtime filters", filter_str);
   VLOG(2) << filter_str;
   return false;

diff --git a/be/src/runtime/runtime-filter-bank.cc b/be/src/runtime/runtime-filter-bank.cc
index 85c9625..f8667bc 100644
--- a/be/src/runtime/runtime-filter-bank.cc
+++ b/be/src/runtime/runtime-filter-bank.cc
@@ -146,9 +146,8 @@ void RuntimeFilterBank::UpdateFilterFromLocal(
     filter = it->second;
   }
   filter->SetFilter(bloom_filter, min_max_filter);
-  state_->runtime_profile()->AddInfoString(
-      Substitute("Filter $0 arrival", filter_id),
-      PrettyPrinter::Print(filter->arrival_delay(), TUnit::TIME_MS));
+  state_->runtime_profile()->AddInfoString(Substitute("Filter $0 arrival", filter_id),
+      PrettyPrinter::Print(filter->arrival_delay_ms(), TUnit::TIME_MS));
 }

 if (has_remote_target
@@ -211,7 +210,7 @@ void RuntimeFilterBank::PublishGlobalFilter(const TPublishFilterParams& params)
   it->second->SetFilter(bloom_filter, min_max_filter);
   state_->runtime_profile()->AddInfoString(
       Substitute("Filter $0 arrival", params.filter_id),
-      PrettyPrinter::Print(it->second->arrival_delay(), TUnit::TIME_MS));
+      PrettyPrinter::Print(it->second->arrival_delay_ms(), TUnit::TIME_MS));
 }

 BloomFilter* RuntimeFilterBank::AllocateScratchBloomFilter(int32_t filter_id) {
 diff --git
[impala] branch 2.x updated (2106127 -> 22fb381)
This is an automated email from the ASF dual-hosted git repository.

boroknagyz pushed a change to branch 2.x
in repository https://gitbox.apache.org/repos/asf/impala.git.

      from 2106127  IMPALA-7144: Re-enable TestDescribeTableResults
       new 55da35e  IMPALA-7128 (part 1) Refactor interfaces for Db, View, Table, Partition
       new 238194b  IMPALA-8155: Modify bootstrap_system.sh to bind Impala-lzo/2.x
       new 22fb381  IMPALA-6035: Add query options to limit thread reservation

The 3 revisions listed above as "new" are entirely new to this repository and
will be described in separate emails. The revisions listed as "add" were
already present in the repository and have only been added to this reference.

Summary of changes:
 be/src/scheduling/admission-controller.cc          |  68 ++---
 be/src/scheduling/query-schedule.h                 |  14 +-
 be/src/scheduling/scheduler.cc                     |   1 +
 be/src/service/query-options-test.cc               |   2 +
 be/src/service/query-options.cc                    |  20 +++
 be/src/service/query-options.h                     |   6 +-
 bin/bootstrap_system.sh                            |   2 +-
 common/thrift/ImpalaInternalService.thrift         |   6 +
 common/thrift/ImpalaService.thrift                 |   9 ++
 .../AlterTableAddDropRangePartitionStmt.java       |   4 +-
 .../analysis/AlterTableAddPartitionStmt.java       |   4 +-
 .../analysis/AlterTableAddReplaceColsStmt.java     |   4 +-
 .../impala/analysis/AlterTableAlterColStmt.java    |   5 +-
 .../impala/analysis/AlterTableDropColStmt.java     |   5 +-
 .../analysis/AlterTableDropPartitionStmt.java      |   4 +-
 .../analysis/AlterTableOrViewRenameStmt.java       |   6 +-
 .../impala/analysis/AlterTableSetCachedStmt.java   |   9 +-
 .../analysis/AlterTableSetFileFormatStmt.java      |   4 +-
 .../impala/analysis/AlterTableSetLocationStmt.java |  15 +-
 .../analysis/AlterTableSetRowFormatStmt.java       |  11 +-
 .../apache/impala/analysis/AlterTableSetStmt.java  |   4 +-
 .../analysis/AlterTableSetTblProperties.java       |   8 +-
 .../impala/analysis/AlterTableSortByStmt.java      |   8 +-
 .../org/apache/impala/analysis/AlterTableStmt.java |   6 +-
 .../org/apache/impala/analysis/AlterViewStmt.java  |   8 +-
 .../apache/impala/analysis/AnalysisContext.java    |  10 +-
 .../java/org/apache/impala/analysis/Analyzer.java  |  62
 .../org/apache/impala/analysis/BaseTableRef.java   |   4 +-
 .../apache/impala/analysis/ColumnLineageGraph.java |   5 +-
 .../apache/impala/analysis/ComputeStatsStmt.java   |  56
 .../org/apache/impala/analysis/CreateDbStmt.java   |   4 +-
 .../impala/analysis/CreateFunctionStmtBase.java    |   4 +-
 .../impala/analysis/CreateTableAsSelectStmt.java   |  19 +--
 .../impala/analysis/CreateTableLikeStmt.java       |   4 +-
 .../apache/impala/analysis/DescribeTableStmt.java  |   7 +-
 .../apache/impala/analysis/DescriptorTable.java    |  26 ++--
 .../org/apache/impala/analysis/DropDbStmt.java     |   4 +-
 .../apache/impala/analysis/DropFunctionStmt.java   |   4 +-
 .../impala/analysis/DropTableOrViewStmt.java       |  10 +-
 .../apache/impala/analysis/FunctionCallExpr.java   |   3 +-
 .../org/apache/impala/analysis/InlineViewRef.java  |   6 +-
 .../org/apache/impala/analysis/InsertStmt.java     |  13 +-
 .../apache/impala/analysis/IsNullPredicate.java    |   4 +-
 .../org/apache/impala/analysis/LoadDataStmt.java   |   8 +-
 .../org/apache/impala/analysis/ModifyStmt.java     |   4 +-
 .../org/apache/impala/analysis/PartitionDef.java   |  10 +-
 .../org/apache/impala/analysis/PartitionSet.java   |  12 +-
 .../apache/impala/analysis/PartitionSpecBase.java  |   4 +-
 .../main/java/org/apache/impala/analysis/Path.java |  10 +-
 .../org/apache/impala/analysis/PrivilegeSpec.java  |  12 +-
 .../org/apache/impala/analysis/SelectStmt.java     |   4 +-
 .../impala/analysis/ShowCreateFunctionStmt.java    |   4 +-
 .../impala/analysis/ShowCreateTableStmt.java       |  10 +-
 .../org/apache/impala/analysis/ShowFilesStmt.java  |   8 +-
 .../org/apache/impala/analysis/ShowStatsStmt.java  |  12 +-
 .../java/org/apache/impala/analysis/SlotRef.java   |   4 +-
 .../apache/impala/analysis/StmtMetadataLoader.java |  34 ++---
 .../java/org/apache/impala/analysis/TableDef.java  |   8 +-
 .../java/org/apache/impala/analysis/TableRef.java  |   4 +-
 .../org/apache/impala/analysis/ToSqlUtils.java     |   9 +-
 .../org/apache/impala/analysis/TruncateStmt.java   |   8 +-
 .../apache/impala/analysis/TupleDescriptor.java    |   8 +-
 .../org/apache/impala/analysis/WithClause.java     |   5 +-
 .../java/org/apache/impala/catalog/Catalog.java    |   1 -
 fe/src/main/java/org/apache/impala/catalog/Db.java |  27 ++--
 .../java/org/apache/impala/catalog/FeCatalog.java  | 119 +++
 .../main/java/org/apache/impala/catalog/FeDb.java  | 100 +
 .../org/apache/impala/catalog/FeFsPartition.java   | 155
[impala] 03/03: IMPALA-6035: Add query options to limit thread reservation
This is an automated email from the ASF dual-hosted git repository.

boroknagyz pushed a commit to branch 2.x
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 22fb381503c713cbbe431fa059968b5c1dab9ec5
Author: Tim Armstrong
AuthorDate: Thu May 31 16:25:26 2018 -0700

    IMPALA-6035: Add query options to limit thread reservation

    Adds two options: THREAD_RESERVATION_LIMIT and
    THREAD_RESERVATION_AGGREGATE_LIMIT, which are both enforced by
    admission control based on planner resource requirements and the
    schedule. The mechanism used is the same as the minimum reservation
    checks.

    THREAD_RESERVATION_LIMIT limits the total number of reserved threads
    in fragments scheduled on a single backend.
    THREAD_RESERVATION_AGGREGATE_LIMIT limits the sum of reserved threads
    across all fragments.

    This also slightly improves the minimum reservation error message to
    include the host name.

    Testing:
    Added end-to-end tests that exercise the code paths.

    Ran core tests.

    Change-Id: I5b5bbbdad5cd6b24442eb6c99a4d38c2ad710007
    Reviewed-on: http://gerrit.cloudera.org:8080/10365
    Reviewed-by: Impala Public Jenkins
    Tested-by: Impala Public Jenkins
    Reviewed-on: http://gerrit.cloudera.org:8080/12429
    Reviewed-by: Quanlong Huang
---
 be/src/scheduling/admission-controller.cc          |  68 ++
 be/src/scheduling/query-schedule.h                 |  14 ++-
 be/src/scheduling/scheduler.cc                     |   1 +
 be/src/service/query-options-test.cc               |   2 +
 be/src/service/query-options.cc                    |  20 ++++
 be/src/service/query-options.h                     |   6 +-
 common/thrift/ImpalaInternalService.thrift         |   6 ++
 common/thrift/ImpalaService.thrift                 |   9 ++
 .../admission-reject-min-reservation.test          |   5 +-
 .../queries/QueryTest/runtime_row_filters.test     |   8 +-
 .../queries/QueryTest/thread-limits.test           | 104 +
 tests/query_test/test_resource_limits.py           |  40
 12 files changed, 254 insertions(+), 29 deletions(-)

diff --git a/be/src/scheduling/admission-controller.cc b/be/src/scheduling/admission-controller.cc
index ce6a82c..f94d454 100644
--- a/be/src/scheduling/admission-controller.cc
+++ b/be/src/scheduling/admission-controller.cc
@@ -120,8 +120,8 @@ const string REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION =
     "plan. See the query profile for more information about the per-node memory "
     "requirements.";
 const string REASON_BUFFER_LIMIT_TOO_LOW_FOR_RESERVATION =
-    "minimum memory reservation is greater than memory available to the query "
-    "for buffer reservations. Increase the buffer_pool_limit to $0. See the query "
+    "minimum memory reservation on backend '$0' is greater than memory available to the "
+    "query for buffer reservations. Increase the buffer_pool_limit to $1. See the query "
     "profile for more information about the per-node memory requirements.";
 const string REASON_MIN_RESERVATION_OVER_POOL_MEM =
     "minimum memory reservation needed is greater than pool max mem resources. Pool "
@@ -140,6 +140,12 @@ const string REASON_REQ_OVER_POOL_MEM =
 const string REASON_REQ_OVER_NODE_MEM =
     "request memory needed $0 per node is greater than process mem limit $1 of $2.\n\n"
     "Use the MEM_LIMIT query option to indicate how much memory is required per node.";
+const string REASON_THREAD_RESERVATION_LIMIT_EXCEEDED =
+    "thread reservation on backend '$0' is greater than the THREAD_RESERVATION_LIMIT "
+    "query option value: $1 > $2.";
+const string REASON_THREAD_RESERVATION_AGG_LIMIT_EXCEEDED =
+    "sum of thread reservations across all $0 backends is greater than the "
+    "THREAD_RESERVATION_AGGREGATE_LIMIT query option value: $1 > $2.";

 // Queue decision details
 // $0 = num running queries, $1 = num queries limit
@@ -406,17 +412,24 @@ bool AdmissionController::RejectImmediately(const QuerySchedule& schedule,
   // the checks isn't particularly important, though some thought was given to ordering
   // them in a way that might make the sense for a user.

-  // Compute the max (over all backends) min_mem_reservation_bytes, the cluster total
-  // (across all backends) min_mem_reservation_bytes and the min (over all backends)
+  // Compute the max (over all backends) and cluster total (across all backends) for
+  // min_mem_reservation_bytes and thread_reservation and the min (over all backends)
   // min_proc_mem_limit.
-  int64_t max_min_mem_reservation_bytes = -1;
+  pair largest_min_mem_reservation(nullptr, -1);
   int64_t cluster_min_mem_reservation_bytes = 0;
+  pair max_thread_reservation(nullptr, 0);
   pair min_proc_mem_limit(
       nullptr, std::numeric_limits::max());
+  int64_t cluster_thread_reservation = 0;
   for (const auto& e : schedule.per_backend_exec_params()) {
     cluster_min_mem_reservation_bytes +=
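The two new rejection checks reduce to a max and a sum over per-backend thread reservations, compared against the respective query option (where 0 means unlimited, as query options default). A simplified model with hypothetical data, reusing the commit's error-message wording:

```python
# Simplified model of the THREAD_RESERVATION_LIMIT /
# THREAD_RESERVATION_AGGREGATE_LIMIT admission checks. per_backend maps
# hostname -> reserved threads scheduled on that backend; 0 = no limit.
def check(per_backend, limit=0, agg_limit=0):
    host, worst = max(per_backend.items(), key=lambda kv: kv[1])
    total = sum(per_backend.values())
    if limit and worst > limit:
        return ("thread reservation on backend '%s' is greater than the "
                "THREAD_RESERVATION_LIMIT query option value: %d > %d."
                % (host, worst, limit))
    if agg_limit and total > agg_limit:
        return ("sum of thread reservations across all %d backends is greater "
                "than the THREAD_RESERVATION_AGGREGATE_LIMIT query option "
                "value: %d > %d." % (len(per_backend), total, agg_limit))
    return None  # admitted

print(check({"host1": 5, "host2": 7}, limit=6))      # rejected: host2 over per-backend limit
print(check({"host1": 5, "host2": 5}, agg_limit=8))  # rejected: aggregate 10 over limit
```

This mirrors why the real code tracks both a per-backend maximum (with the offending host, for the error message) and a cluster-wide running total in the same loop over `schedule.per_backend_exec_params()`.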
[impala] 02/03: IMPALA-8155: Modify bootstrap_system.sh to bind Impala-lzo/2.x
This is an automated email from the ASF dual-hosted git repository.

boroknagyz pushed a commit to branch 2.x
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 238194b2343619fe0cc8d0d94b475a5f2b0e2f82
Author: stiga-huang
AuthorDate: Fri Feb 1 17:45:19 2019 -0800

    IMPALA-8155: Modify bootstrap_system.sh to bind Impala-lzo/2.x

    The new commit in Impala-lzo/master breaks the builds of Impala-2.x.
    We should depend on a dedicated branch for 2.x.

    Change-Id: I67591c7cfc4bede5c096a49beb57da34bb697338
    Reviewed-on: http://gerrit.cloudera.org:8080/12339
    Tested-by: Impala Public Jenkins
    Reviewed-by: Fredy Wijaya
---
 bin/bootstrap_system.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bin/bootstrap_system.sh b/bin/bootstrap_system.sh
index bce161b..be79392 100755
--- a/bin/bootstrap_system.sh
+++ b/bin/bootstrap_system.sh
@@ -206,7 +206,7 @@ echo ">>> Checking out Impala-lzo"
 : ${IMPALA_LZO_HOME:="${IMPALA_HOME}/../Impala-lzo"}
 if ! [[ -d "$IMPALA_LZO_HOME" ]]
 then
-  git clone https://github.com/cloudera/impala-lzo.git "$IMPALA_LZO_HOME"
+  git clone --branch 2.x https://github.com/cloudera/impala-lzo.git "$IMPALA_LZO_HOME"
 fi

 echo ">>> Checking out and building hadoop-lzo"