[impala] 03/04: Turn off shell debug tracing for create-load-data.sh

2019-02-12 Thread tmarshall
This is an automated email from the ASF dual-hosted git repository.

tmarshall pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 6938831ae98f3e9cdb63ecbf63a608c17bdc0b6b
Author: Joe McDonnell 
AuthorDate: Thu Feb 7 14:48:07 2019 -0800

Turn off shell debug tracing for create-load-data.sh

This removes a "set -x" from testdata/bin/create-load-data.sh.

Change-Id: I524ec48d0264f6180a13d6d068832809bcc86596
Reviewed-on: http://gerrit.cloudera.org:8080/12398
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins 
---
 testdata/bin/create-load-data.sh | 1 -
 1 file changed, 1 deletion(-)

diff --git a/testdata/bin/create-load-data.sh b/testdata/bin/create-load-data.sh
index a32d44c..44452ab 100755
--- a/testdata/bin/create-load-data.sh
+++ b/testdata/bin/create-load-data.sh
@@ -31,7 +31,6 @@
 set -euo pipefail
 . $IMPALA_HOME/bin/report_build_error.sh
 setup_report_build_error
-set -x
 
 . ${IMPALA_HOME}/bin/impala-config.sh > /dev/null 2>&1
 . ${IMPALA_HOME}/testdata/bin/run-step.sh
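
For anyone who still wants the old trace output for a one-off debugging run, a minimal
sketch (assuming a standard bash; this invocation is illustrative and not part of the
commit) is to request tracing at invocation time instead of hard-coding "set -x" in the
script:

  # enable xtrace only for this run and keep the noise out of the script itself
  bash -x "$IMPALA_HOME"/testdata/bin/create-load-data.sh 2> data-load-trace.log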



[impala] 01/04: IMPALA-7214: [DOCS] More on decoupling impala and DataNodes

2019-02-12 Thread tmarshall
This is an automated email from the ASF dual-hosted git repository.

tmarshall pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 5b32a0d60110be7c21184819c2dffbb7cbff750f
Author: Alex Rodoni 
AuthorDate: Tue Feb 12 12:40:42 2019 -0800

IMPALA-7214: [DOCS] More on decoupling impala and DataNodes

Change-Id: I4b6f1c704c1e328af9f0beec73f8b6b61fba992e
Reviewed-on: http://gerrit.cloudera.org:8080/12457
Tested-by: Impala Public Jenkins 
Reviewed-by: Tim Armstrong 
---
 docs/topics/impala_processes.xml   | 10 +++--
 docs/topics/impala_troubleshooting.xml | 39 +-
 2 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/docs/topics/impala_processes.xml b/docs/topics/impala_processes.xml
index 71986d3..70366dd 100644
--- a/docs/topics/impala_processes.xml
+++ b/docs/topics/impala_processes.xml
@@ -55,10 +55,7 @@ under the License.
 Start one instance of the Impala catalog service.
   
 
-  
-Start the main Impala service on one or more DataNodes, ideally on all DataNodes to maximize local
-processing and avoid network traffic due to remote reads.
-  
+   Start the main Impala daemon services. 
 
 
 
@@ -101,9 +98,8 @@ under the License.
 
 $ sudo service impala-catalog start
 
-  
-Start the Impala service on each DataNode using a command similar to the following:
-  
+   Start the Impala daemon services using a command similar to the
+following: 
 
   
 $ sudo service impala-server start
diff --git a/docs/topics/impala_troubleshooting.xml b/docs/topics/impala_troubleshooting.xml
index 250c899..80b7363 100644
--- a/docs/topics/impala_troubleshooting.xml
+++ b/docs/topics/impala_troubleshooting.xml
@@ -123,17 +123,17 @@ terminate called after throwing an instance of 'boost::exception_detail::clone_i
   
 Troubleshooting I/O Capacity Problems
 
-  
-Impala queries are typically I/O-intensive. If there is an I/O problem with storage devices,
-or with HDFS itself, Impala queries could show slow response times with no obvious cause
-on the Impala side. Slow I/O on even a single DataNode could result in an overall slowdown, because
-queries involving clauses such as ORDER BY, GROUP BY, or JOIN
-do not start returning results until all DataNodes have finished their work.
-  
-  
-To test whether the Linux I/O system itself is performing as expected, run Linux commands like
-the following on each DataNode:
-  
+   Impala queries are typically I/O-intensive. If there is an I/O problem
+with storage devices, or with HDFS itself, Impala queries could show
+slow response times with no obvious cause on the Impala side. Slow I/O
+on even a single Impala daemon could result in an overall slowdown,
+because queries involving clauses such as ORDER BY,
+  GROUP BY, or JOIN do not start
+returning results until all executor Impala daemons have finished their
+work. 
+   To test whether the Linux I/O system itself is performing as expected,
+run Linux commands like the following on each host Impala daemon is
+running: 
 
 $ sudo sysctl -w vm.drop_caches=3 vm.drop_caches=0
 vm.drop_caches = 3
@@ -265,14 +265,15 @@ $ sudo dd if=/dev/sdd bs=1M of=/dev/null count=1k
 
 
 
-  
-Replace hostname and port with the hostname and port of
-your Impala state store host machine and web server port. The default port is 25010.
-  
-  The number of impalad instances listed should match the expected number of
-  impalad instances installed in the cluster. There should also be one
-  impalad instance installed on each DataNode
-
+   Replace hostname and
+  port with the hostname and port of your
+Impala state store host machine and web server port. The
+default port is 25010.  The number of
+impalad instances listed should match the
+  expected number of impalad instances
+  installed in the cluster. There should also be one
+impalad instance installed on each
+  DataNode.
   
   
 



[impala] 04/04: IMPALA-8183: fix test_reportexecstatus_retry flakiness

2019-02-12 Thread tmarshall
This is an automated email from the ASF dual-hosted git repository.

tmarshall pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 9492d451d5d5a82bfc6f4c93c3a0c6e6d0cc4981
Author: Thomas Tauber-Marshall 
AuthorDate: Tue Feb 12 22:47:52 2019 +

IMPALA-8183: fix test_reportexecstatus_retry flakiness

The test is designed to cause ReportExecStatus() rpcs to fail by
backing up the control service queue. Previously, after a failed
ReportExecStatus() we would wait 'report_status_retry_interval_ms'
between retries, which was 100ms by default and wasn't touched by the
test. That 100ms was right on the edge of being enough time for the
coordinator to keep up with processing the reports, so that some would
fail but most would succeed. It was always possible that we could hit
IMPALA-2990 in this setup, but it was unlikely.

Now, with IMPALA-4555 'report_status_retry_interval_ms' was removed
and we instead wait 'status_report_interval_ms' between retries. By
default, this is 5000ms, so it should give the coordinator even more
time and make these issues less likely. However, the test sets
'status_report_interval_ms' to 10ms, which isn't nearly enough time
for the coordinator to do its processing, causing lots of the
ReportExecStatus() rpcs to fail and making us hit IMPALA-2990 pretty
often.

The solution is to set 'status_report_interval_ms' to 100ms in the
test, which roughly achieves the same retry frequency as before. The
same change is made to a similar test test_reportexecstatus_timeout.

Testing:
- Ran test_reportexecstatus_retry in a loop 400 times without seeing a
  failure. It previously repro-ed for me about once per 50 runs.
- Manually verified that both tests are still hitting the error paths
  that they are supposed to be testing.

Change-Id: I7027a6e099c543705e5845ee0e5268f1f9a3fb05
Reviewed-on: http://gerrit.cloudera.org:8080/12461
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
 tests/custom_cluster/test_rpc_timeout.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/custom_cluster/test_rpc_timeout.py b/tests/custom_cluster/test_rpc_timeout.py
index d007ef4..e1a959c 100644
--- a/tests/custom_cluster/test_rpc_timeout.py
+++ b/tests/custom_cluster/test_rpc_timeout.py
@@ -128,7 +128,7 @@ class TestRPCTimeout(CustomClusterTestSuite):
 
  # Inject jitter into the RPC handler of ReportExecStatus() to trigger RPC timeout.
   @pytest.mark.execute_serially
-  @CustomClusterTestSuite.with_args("--status_report_interval_ms=10"
+  @CustomClusterTestSuite.with_args("--status_report_interval_ms=100"
   " --backend_client_rpc_timeout_ms=1000")
   def test_reportexecstatus_timeout(self, vector):
query_options = {'debug_action': 'REPORT_EXEC_STATUS_DELAY:JITTER@1500@0.5'}
@@ -137,7 +137,7 @@ class TestRPCTimeout(CustomClusterTestSuite):
  # Use a small service queue memory limit and a single service thread to exercise
   # the retry paths in the ReportExecStatus() RPC
   @pytest.mark.execute_serially
-  @CustomClusterTestSuite.with_args("--status_report_interval_ms=10"
+  @CustomClusterTestSuite.with_args("--status_report_interval_ms=100"
   " --control_service_queue_mem_limit=1 
--control_service_num_svc_threads=1")
   def test_reportexecstatus_retry(self, vector):
 self.execute_query_verify_metrics(self.TEST_QUERY, None, 10)
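
A rough way to reproduce the author's stress run (assuming a built development
environment; the bin/impala-py.test wrapper and the loop count are illustrative, not
prescribed by the commit):

  # run the retry test repeatedly and stop at the first failure
  for i in $(seq 1 400); do
    bin/impala-py.test tests/custom_cluster/test_rpc_timeout.py \
      -k test_reportexecstatus_retry || break
  done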



[impala] 02/04: Add support for compiling using OpenSSL 1.1

2019-02-12 Thread tmarshall
This is an automated email from the ASF dual-hosted git repository.

tmarshall pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit d2b8b7b9b0f3a02e2418d9182007b736bb739a1b
Author: Hector Acosta 
AuthorDate: Fri Feb 8 14:50:17 2019 -0800

Add support for compiling using OpenSSL 1.1

Change-Id: Iaccf1b2dedf0d957a2665df8f9afca4139754264
Reviewed-on: http://gerrit.cloudera.org:8080/12420
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
 be/src/util/openssl-util.cc | 45 -
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/be/src/util/openssl-util.cc b/be/src/util/openssl-util.cc
index 2b66b86..da583cf 100644
--- a/be/src/util/openssl-util.cc
+++ b/be/src/util/openssl-util.cc
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "common/atomic.h"
 #include "gutil/port.h" // ATTRIBUTE_WEAK
@@ -70,7 +71,13 @@ static const int RNG_RESEED_INTERVAL = 128;
 static const int RNG_RESEED_BYTES = 512;
 
 int MaxSupportedTlsVersion() {
+#if OPENSSL_VERSION_NUMBER < 0x10100000L
   return SSLv23_method()->version;
+#else
+  // OpenSSL 1.1+ doesn't let us detect the supported TLS version at runtime. Assume
+  // that the OpenSSL library we're linked against supports only up to TLS1.2
+  return TLS1_2_VERSION;
+#endif
 }
 
 bool IsInternalTlsConfigured() {
@@ -97,13 +104,25 @@ struct ScopedEVPCipherCtx {
   DISALLOW_COPY_AND_ASSIGN(ScopedEVPCipherCtx);
 
   explicit ScopedEVPCipherCtx(int padding) {
-EVP_CIPHER_CTX_init(&ctx);
-EVP_CIPHER_CTX_set_padding(&ctx, padding);
+#if OPENSSL_VERSION_NUMBER < 0x10100000L
+ctx = static_cast<EVP_CIPHER_CTX*>(malloc(sizeof(*ctx)));
+EVP_CIPHER_CTX_init(ctx);
+#else
+ctx = EVP_CIPHER_CTX_new();
+#endif
+EVP_CIPHER_CTX_set_padding(ctx, padding);
   }
 
-  ~ScopedEVPCipherCtx() { EVP_CIPHER_CTX_cleanup(&ctx); }
+  ~ScopedEVPCipherCtx() {
+#if OPENSSL_VERSION_NUMBER < 0x10100000L
+EVP_CIPHER_CTX_cleanup(ctx);
+free(ctx);
+#else
+EVP_CIPHER_CTX_free(ctx);
+#endif
+  }
 
-  EVP_CIPHER_CTX ctx;
+  EVP_CIPHER_CTX* ctx;
 };
 
// Callback used by OpenSSLErr() - write the error given to us through buf to the
@@ -170,13 +189,13 @@ Status EncryptionKey::EncryptInternal(
  // mode is well-optimized(instruction level parallelism) with hardware acceleration
   // on x86 and PowerPC
   const EVP_CIPHER* evpCipher = GetCipher();
-  int success = encrypt ? EVP_EncryptInit_ex(&ctx, evpCipher, NULL, key_, iv_) :
-  EVP_DecryptInit_ex(&ctx, evpCipher, NULL, key_, iv_);
+  int success = encrypt ? EVP_EncryptInit_ex(ctx.ctx, evpCipher, NULL, key_, iv_) :
+  EVP_DecryptInit_ex(ctx.ctx, evpCipher, NULL, key_, iv_);
   if (success != 1) {
 return OpenSSLErr(encrypt ? "EVP_EncryptInit_ex" : "EVP_DecryptInit_ex", 
err_context);
   }
   if (IsGcmMode()) {
-if (EVP_CIPHER_CTX_ctrl(&ctx, EVP_CTRL_GCM_SET_IVLEN, AES_BLOCK_SIZE, NULL)
+if (EVP_CIPHER_CTX_ctrl(ctx.ctx, EVP_CTRL_GCM_SET_IVLEN, AES_BLOCK_SIZE, NULL)
 != 1) {
   return OpenSSLErr("EVP_CIPHER_CTX_ctrl", err_context);
 }
@@ -189,8 +208,8 @@ Status EncryptionKey::EncryptInternal(
 int in_len = static_cast(min(len - offset, numeric_limits::max()));
 int out_len;
 success = encrypt ?
-EVP_EncryptUpdate(&ctx, out + offset, &out_len, data + offset, in_len) :
-EVP_DecryptUpdate(&ctx, out + offset, &out_len, data + offset, in_len);
+EVP_EncryptUpdate(ctx.ctx, out + offset, &out_len, data + offset, in_len) :
+EVP_DecryptUpdate(ctx.ctx, out + offset, &out_len, data + offset, in_len);
 if (success != 1) {
   return OpenSSLErr(encrypt ? "EVP_EncryptUpdate" : "EVP_DecryptUpdate", 
err_context);
 }
@@ -201,7 +220,7 @@ Status EncryptionKey::EncryptInternal(
 
   if (IsGcmMode() && !encrypt) {
 // Set expected tag value
-if (EVP_CIPHER_CTX_ctrl(&ctx, EVP_CTRL_GCM_SET_TAG, AES_BLOCK_SIZE, gcm_tag_)
+if (EVP_CIPHER_CTX_ctrl(ctx.ctx, EVP_CTRL_GCM_SET_TAG, AES_BLOCK_SIZE, gcm_tag_)
 != 1) {
   return OpenSSLErr("EVP_CIPHER_CTX_ctrl", err_context);
 }
@@ -209,14 +228,14 @@ Status EncryptionKey::EncryptInternal(
 
   // Finalize encryption or decryption.
   int final_out_len;
-  success = encrypt ? EVP_EncryptFinal_ex(&ctx, out + offset, &final_out_len) :
-  EVP_DecryptFinal_ex(&ctx, out + offset, &final_out_len);
+  success = encrypt ? EVP_EncryptFinal_ex(ctx.ctx, out + offset, &final_out_len) :
+  EVP_DecryptFinal_ex(ctx.ctx, out + offset, &final_out_len);
   if (success != 1) {
 return OpenSSLErr(encrypt ? "EVP_EncryptFinal" : "EVP_DecryptFinal", 
err_context);
   }
 
   if (IsGcmMode() && encrypt) {
-if (EVP_CIPHER_CTX_ctrl(&ctx, EVP_CTRL_GCM_GET_TAG, AES_BLOCK_SIZE, gcm_tag_)
+if (EVP_CIPHER_CTX_ctrl(ctx.ctx, EVP_CTRL_GCM_GET_TAG, AES_BLOCK_SIZE, gcm_tag_)
 != 1) {
   return OpenSSLErr("EVP_CIPHER_CTX_ctrl", err_context);
 

[impala] branch master updated: IMPALA-8186: script to configure docker network

2019-02-12 Thread tarmstrong
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


The following commit(s) were added to refs/heads/master by this push:
 new dbe9fef  IMPALA-8186: script to configure docker network
dbe9fef is described below

commit dbe9fefa05ce9738865349e9130004457ff31c62
Author: Tim Armstrong 
AuthorDate: Mon Feb 11 16:33:05 2019 -0800

IMPALA-8186: script to configure docker network

This automates the network setup that I did manually
in http://gerrit.cloudera.org:8080/12189

After running the script it should be possible to
run "./buildall.sh -format -testdata" to load
test data with the right hostnames, then
"start-impala-cluster.py --docker_network=network-name"
to run a dockerised minicluster.

Change-Id: Icb4854aa951bcad7087a9653845b22ffd862057d
Reviewed-on: http://gerrit.cloudera.org:8080/12452
Reviewed-by: Philip Zeyliger 
Tested-by: Tim Armstrong 
---
 docker/configure_test_network.sh | 53 
 1 file changed, 53 insertions(+)

diff --git a/docker/configure_test_network.sh b/docker/configure_test_network.sh
new file mode 100755
index 000..df5567b
--- /dev/null
+++ b/docker/configure_test_network.sh
@@ -0,0 +1,53 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+# Sets up a Docker bridge network with the name provided by the first argument and
+# appends the configuration required to use it in a dockerised minicluster to
+# bin/impala-config-local.sh. Note that impala-config.sh needs to be re-sourced,
+# cluster configurations need to be regenerated, all minicluster processes restarted,
+# and data reloaded for the change to be effective and your cluster to be functional.
+
+set -euo pipefail
+
+usage() {
+  echo "configure_test_network.sh "
+}
+
+if [[ $# != 1 ]]; then
+  usage
+  exit 1
+fi
+
+NETWORK_NAME=$1
+
+# Remove existing network if present.
+echo "Removing existing network '$NETWORK_NAME'"
+docker network rm "$NETWORK_NAME" || true
+
+echo "Create network '$NETWORK_NAME'"
+docker network create -d bridge $NETWORK_NAME
+GATEWAY=$(docker network inspect "$NETWORK_NAME" -f '{{(index .IPAM.Config 0).Gateway}}')
+echo "Gateway is '${GATEWAY}'"
+
+echo "Updating impala-config-local.sh"
+echo "# Configuration to use docker network ${NETWORK_NAME}" \
+  >> "$IMPALA_HOME"/bin/impala-config-local.sh
+echo "export INTERNAL_LISTEN_HOST=${GATEWAY}" >> 
"$IMPALA_HOME"/bin/impala-config-local.sh
+echo "export DEFAULT_FS=hdfs://\${INTERNAL_LISTEN_HOST}:20500" \
+  >> "$IMPALA_HOME"/bin/impala-config-local.sh



[impala] branch master updated: IMPALA-5861: fix RowsRead for zero-slot table scan

2019-02-12 Thread tarmstrong
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


The following commit(s) were added to refs/heads/master by this push:
 new a154b2d  IMPALA-5861: fix RowsRead for zero-slot table scan
a154b2d is described below

commit a154b2d6e775a508df4fd2c8d51a18d5c1d1f933
Author: Tim Armstrong 
AuthorDate: Fri Feb 1 07:13:56 2019 -0800

IMPALA-5861: fix RowsRead for zero-slot table scan

Testing:
Added regression test based on JIRA and a targeted
test for all HDFS file formats.

Change-Id: I7a927c6a4f0b8055608cb7a5e2b550a1610cef89
Reviewed-on: http://gerrit.cloudera.org:8080/12332
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
 be/src/exec/parquet/hdfs-parquet-scanner.cc|   2 +-
 .../queries/QueryTest/mixed-format.test|  14 +++
 .../queries/QueryTest/scanners.test| 111 +
 3 files changed, 126 insertions(+), 1 deletion(-)

diff --git a/be/src/exec/parquet/hdfs-parquet-scanner.cc b/be/src/exec/parquet/hdfs-parquet-scanner.cc
index 4fe9914..3836d0b 100644
--- a/be/src/exec/parquet/hdfs-parquet-scanner.cc
+++ b/be/src/exec/parquet/hdfs-parquet-scanner.cc
@@ -400,7 +400,7 @@ Status HdfsParquetScanner::GetNextInternal(RowBatch* row_batch) {
 assemble_rows_timer_.Stop();
 RETURN_IF_ERROR(status);
 row_group_rows_read_ += max_tuples;
-COUNTER_ADD(scan_node_->rows_read_counter(), row_group_rows_read_);
+COUNTER_ADD(scan_node_->rows_read_counter(), max_tuples);
 return Status::OK();
   }
 
diff --git a/testdata/workloads/functional-query/queries/QueryTest/mixed-format.test b/testdata/workloads/functional-query/queries/QueryTest/mixed-format.test
index 0b693e1..2d5bf9e 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/mixed-format.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/mixed-format.test
@@ -24,3 +24,17 @@ bigint, bigint
  RESULTS
 280,1260
 
+ QUERY
+# IMPALA-5861: RowsRead counter should be accurate for table scan that returns
+# zero slots. This test is run with various batch_size values, which helps
+# reproduce the bug. Scanning multiple file formats triggers the bug because
+# the Parquet count(*) rewrite is disabled when non-Parquet file formats are
+# present.
+select count(*) from functional.alltypesmixedformat
+ TYPES
+bigint
+ RESULTS
+1200
+ RUNTIME_PROFILE
+aggregation(SUM, RowsRead): 1200
+
diff --git a/testdata/workloads/functional-query/queries/QueryTest/scanners.test b/testdata/workloads/functional-query/queries/QueryTest/scanners.test
index b05786e..72d6505 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/scanners.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/scanners.test
@@ -128,3 +128,114 @@ select count(*) from alltypessmall
  TYPES
 BIGINT
 
+ QUERY
+# IMPALA-5861: RowsRead counter should be accurate for table scan that materializes
+# zero slots from this files. This test is run with various batch_size values,
+# which helps reproduce the Parquet bug.
+select 1 from alltypessmall
+ TYPES
+tinyint
+ RESULTS
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+ RUNTIME_PROFILE
+aggregation(SUM, RowsRead): 100
+
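
To spot-check the fix by hand, one hedged approach (query and option values are
illustrative; impala-shell is assumed to point at a cluster with the functional test
data loaded) is to run a zero-slot scan with a small batch size and grep the profile
for the counter:

  impala-shell -B --show_profiles \
    -q "set batch_size=2; select count(*) from functional.alltypesmixedformat" \
    | grep RowsRead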



[impala] branch master updated: IMPALA-8064: Improve observability of wait times for runtime filters

2019-02-12 Thread boroknagyz
This is an automated email from the ASF dual-hosted git repository.

boroknagyz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


The following commit(s) were added to refs/heads/master by this push:
 new e0aabdd  IMPALA-8064: Improve observability of wait times for runtime filters
e0aabdd is described below

commit e0aabddd573c204a780d3f5ff0af442cdb26b7c6
Author: poojanilangekar 
AuthorDate: Thu Feb 7 17:00:34 2019 -0800

IMPALA-8064: Improve observability of wait times for runtime filters

This change is a diagnostic fix to improve the wait times logged
for runtime filters. The filter wait time counts against the
elapsed time since the filter's registration in ScanNode::Init()
while the duration logged in ScanNode::WaitForRuntimeFilters() is
the time spent in the function waiting for all the filters to
arrive. This could be misleading as it doesn't account for the
elapsed time spent between ScanNode::Init() and
ScanNode::WaitForRuntimeFilters(). This change logs the maximum
arrival delay for any runtime filter to arrive.

From my analysis of the logs of the failed tests, I believe the
filters are actually waiting for the specified time but logging
the duration incorrectly. The solution would be to increase the
wait time further. This change would help validate this
hypothesis.

Change-Id: I28fd45e75c773bc01d424f5a179ae186ee9b7469
Reviewed-on: http://gerrit.cloudera.org:8080/12401
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
 be/src/exec/scan-node.cc  | 13 +
 be/src/runtime/runtime-filter-bank.cc |  7 +++
 be/src/runtime/runtime-filter.h   |  9 +
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/be/src/exec/scan-node.cc b/be/src/exec/scan-node.cc
index 906af66..039836c 100644
--- a/be/src/exec/scan-node.cc
+++ b/be/src/exec/scan-node.cc
@@ -168,6 +168,7 @@ bool ScanNode::WaitForRuntimeFilters() {
   }
  vector<string> arrived_filter_ids;
  vector<string> missing_filter_ids;
+  int32_t max_arrival_delay = 0;
   int64_t start = MonotonicMillis();
   for (auto& ctx: filter_ctxs_) {
 string filter_id = Substitute("$0", ctx.filter->id());
@@ -176,20 +177,24 @@ bool ScanNode::WaitForRuntimeFilters() {
 } else {
   missing_filter_ids.push_back(filter_id);
 }
+max_arrival_delay = max(max_arrival_delay, ctx.filter->arrival_delay_ms());
   }
   int64_t end = MonotonicMillis();
   const string& wait_time = PrettyPrinter::Print(end - start, TUnit::TIME_MS);
+  const string& arrival_delay = PrettyPrinter::Print(max_arrival_delay, TUnit::TIME_MS);
 
   if (arrived_filter_ids.size() == filter_ctxs_.size()) {
 runtime_profile()->AddInfoString("Runtime filters",
-Substitute("All filters arrived. Waited $0", wait_time));
+Substitute("All filters arrived. Waited $0. Maximum arrival delay: 
$1.",
+ wait_time, arrival_delay));
 VLOG(2) << "Filters arrived. Waited " << wait_time;
 return true;
   }
 
-  const string& filter_str = Substitute(
-  "Not all filters arrived (arrived: [$0], missing [$1]), waited for $2",
-  join(arrived_filter_ids, ", "), join(missing_filter_ids, ", "), wait_time);
+  const string& filter_str = Substitute("Not all filters arrived (arrived: [$0], missing "
+"[$1]), waited for $2. Arrival delay: $3.",
+  join(arrived_filter_ids, ", "), join(missing_filter_ids, ", "), wait_time,
+  arrival_delay);
   runtime_profile()->AddInfoString("Runtime filters", filter_str);
   VLOG(2) << filter_str;
   return false;
diff --git a/be/src/runtime/runtime-filter-bank.cc b/be/src/runtime/runtime-filter-bank.cc
index 85c9625..f8667bc 100644
--- a/be/src/runtime/runtime-filter-bank.cc
+++ b/be/src/runtime/runtime-filter-bank.cc
@@ -146,9 +146,8 @@ void RuntimeFilterBank::UpdateFilterFromLocal(
   filter = it->second;
 }
 filter->SetFilter(bloom_filter, min_max_filter);
-state_->runtime_profile()->AddInfoString(
-Substitute("Filter $0 arrival", filter_id),
-PrettyPrinter::Print(filter->arrival_delay(), TUnit::TIME_MS));
+state_->runtime_profile()->AddInfoString(Substitute("Filter $0 arrival", filter_id),
+PrettyPrinter::Print(filter->arrival_delay_ms(), TUnit::TIME_MS));
   }
 
   if (has_remote_target
@@ -211,7 +210,7 @@ void RuntimeFilterBank::PublishGlobalFilter(const TPublishFilterParams& params)
   it->second->SetFilter(bloom_filter, min_max_filter);
   state_->runtime_profile()->AddInfoString(
   Substitute("Filter $0 arrival", params.filter_id),
-  PrettyPrinter::Print(it->second->arrival_delay(), TUnit::TIME_MS));
+  PrettyPrinter::Print(it->second->arrival_delay_ms(), TUnit::TIME_MS));
 }
 
 BloomFilter* RuntimeFilterBank::AllocateScratchBloomFilter(int32_t filter_id) {
diff --git 

[impala] branch 2.x updated (2106127 -> 22fb381)

2019-02-12 Thread boroknagyz
This is an automated email from the ASF dual-hosted git repository.

boroknagyz pushed a change to branch 2.x
in repository https://gitbox.apache.org/repos/asf/impala.git.


from 2106127  IMPALA-7144: Re-enable TestDescribeTableResults
 new 55da35e  IMPALA-7128 (part 1) Refactor interfaces for Db, View, Table, Partition
 new 238194b  IMPALA-8155: Modify bootstrap_system.sh to bind Impala-lzo/2.x
 new 22fb381  IMPALA-6035: Add query options to limit thread reservation

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 be/src/scheduling/admission-controller.cc  |  68 ++---
 be/src/scheduling/query-schedule.h |  14 +-
 be/src/scheduling/scheduler.cc |   1 +
 be/src/service/query-options-test.cc   |   2 +
 be/src/service/query-options.cc|  20 +++
 be/src/service/query-options.h |   6 +-
 bin/bootstrap_system.sh|   2 +-
 common/thrift/ImpalaInternalService.thrift |   6 +
 common/thrift/ImpalaService.thrift |   9 ++
 .../AlterTableAddDropRangePartitionStmt.java   |   4 +-
 .../analysis/AlterTableAddPartitionStmt.java   |   4 +-
 .../analysis/AlterTableAddReplaceColsStmt.java |   4 +-
 .../impala/analysis/AlterTableAlterColStmt.java|   5 +-
 .../impala/analysis/AlterTableDropColStmt.java |   5 +-
 .../analysis/AlterTableDropPartitionStmt.java  |   4 +-
 .../analysis/AlterTableOrViewRenameStmt.java   |   6 +-
 .../impala/analysis/AlterTableSetCachedStmt.java   |   9 +-
 .../analysis/AlterTableSetFileFormatStmt.java  |   4 +-
 .../impala/analysis/AlterTableSetLocationStmt.java |  15 +-
 .../analysis/AlterTableSetRowFormatStmt.java   |  11 +-
 .../apache/impala/analysis/AlterTableSetStmt.java  |   4 +-
 .../analysis/AlterTableSetTblProperties.java   |   8 +-
 .../impala/analysis/AlterTableSortByStmt.java  |   8 +-
 .../org/apache/impala/analysis/AlterTableStmt.java |   6 +-
 .../org/apache/impala/analysis/AlterViewStmt.java  |   8 +-
 .../apache/impala/analysis/AnalysisContext.java|  10 +-
 .../java/org/apache/impala/analysis/Analyzer.java  |  62 
 .../org/apache/impala/analysis/BaseTableRef.java   |   4 +-
 .../apache/impala/analysis/ColumnLineageGraph.java |   5 +-
 .../apache/impala/analysis/ComputeStatsStmt.java   |  56 
 .../org/apache/impala/analysis/CreateDbStmt.java   |   4 +-
 .../impala/analysis/CreateFunctionStmtBase.java|   4 +-
 .../impala/analysis/CreateTableAsSelectStmt.java   |  19 +--
 .../impala/analysis/CreateTableLikeStmt.java   |   4 +-
 .../apache/impala/analysis/DescribeTableStmt.java  |   7 +-
 .../apache/impala/analysis/DescriptorTable.java|  26 ++--
 .../org/apache/impala/analysis/DropDbStmt.java |   4 +-
 .../apache/impala/analysis/DropFunctionStmt.java   |   4 +-
 .../impala/analysis/DropTableOrViewStmt.java   |  10 +-
 .../apache/impala/analysis/FunctionCallExpr.java   |   3 +-
 .../org/apache/impala/analysis/InlineViewRef.java  |   6 +-
 .../org/apache/impala/analysis/InsertStmt.java |  13 +-
 .../apache/impala/analysis/IsNullPredicate.java|   4 +-
 .../org/apache/impala/analysis/LoadDataStmt.java   |   8 +-
 .../org/apache/impala/analysis/ModifyStmt.java |   4 +-
 .../org/apache/impala/analysis/PartitionDef.java   |  10 +-
 .../org/apache/impala/analysis/PartitionSet.java   |  12 +-
 .../apache/impala/analysis/PartitionSpecBase.java  |   4 +-
 .../main/java/org/apache/impala/analysis/Path.java |  10 +-
 .../org/apache/impala/analysis/PrivilegeSpec.java  |  12 +-
 .../org/apache/impala/analysis/SelectStmt.java |   4 +-
 .../impala/analysis/ShowCreateFunctionStmt.java|   4 +-
 .../impala/analysis/ShowCreateTableStmt.java   |  10 +-
 .../org/apache/impala/analysis/ShowFilesStmt.java  |   8 +-
 .../org/apache/impala/analysis/ShowStatsStmt.java  |  12 +-
 .../java/org/apache/impala/analysis/SlotRef.java   |   4 +-
 .../apache/impala/analysis/StmtMetadataLoader.java |  34 ++---
 .../java/org/apache/impala/analysis/TableDef.java  |   8 +-
 .../java/org/apache/impala/analysis/TableRef.java  |   4 +-
 .../org/apache/impala/analysis/ToSqlUtils.java |   9 +-
 .../org/apache/impala/analysis/TruncateStmt.java   |   8 +-
 .../apache/impala/analysis/TupleDescriptor.java|   8 +-
 .../org/apache/impala/analysis/WithClause.java |   5 +-
 .../java/org/apache/impala/catalog/Catalog.java|   1 -
 fe/src/main/java/org/apache/impala/catalog/Db.java |  27 ++--
 .../java/org/apache/impala/catalog/FeCatalog.java  | 119 +++
 .../main/java/org/apache/impala/catalog/FeDb.java  | 100 +
 .../org/apache/impala/catalog/FeFsPartition.java   | 155 
 

[impala] 03/03: IMPALA-6035: Add query options to limit thread reservation

2019-02-12 Thread boroknagyz
This is an automated email from the ASF dual-hosted git repository.

boroknagyz pushed a commit to branch 2.x
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 22fb381503c713cbbe431fa059968b5c1dab9ec5
Author: Tim Armstrong 
AuthorDate: Thu May 31 16:25:26 2018 -0700

IMPALA-6035: Add query options to limit thread reservation

Adds two options: THREAD_RESERVATION_LIMIT and
THREAD_RESERVATION_AGGREGATE_LIMIT, which are both enforced by admission
control based on planner resource requirements and the schedule. The
mechanism used is the same as the minimum reservation checks.

THREAD_RESERVATION_LIMIT limits the total number of reserved threads in
fragments scheduled on a single backend.
THREAD_RESERVATION_AGGREGATE_LIMIT limits the sum of reserved threads
across all fragments.

This also slightly improves the minimum reservation error message to
include the host name.

Testing:
Added end-to-end tests that exercise the code paths.

Ran core tests.

Change-Id: I5b5bbbdad5cd6b24442eb6c99a4d38c2ad710007
Reviewed-on: http://gerrit.cloudera.org:8080/10365
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
Reviewed-on: http://gerrit.cloudera.org:8080/12429
Reviewed-by: Quanlong Huang 
---
 be/src/scheduling/admission-controller.cc  |  68 ++
 be/src/scheduling/query-schedule.h |  14 ++-
 be/src/scheduling/scheduler.cc |   1 +
 be/src/service/query-options-test.cc   |   2 +
 be/src/service/query-options.cc|  20 
 be/src/service/query-options.h |   6 +-
 common/thrift/ImpalaInternalService.thrift |   6 ++
 common/thrift/ImpalaService.thrift |   9 ++
 .../admission-reject-min-reservation.test  |   5 +-
 .../queries/QueryTest/runtime_row_filters.test |   8 +-
 .../queries/QueryTest/thread-limits.test   | 104 +
 tests/query_test/test_resource_limits.py   |  40 
 12 files changed, 254 insertions(+), 29 deletions(-)

diff --git a/be/src/scheduling/admission-controller.cc b/be/src/scheduling/admission-controller.cc
index ce6a82c..f94d454 100644
--- a/be/src/scheduling/admission-controller.cc
+++ b/be/src/scheduling/admission-controller.cc
@@ -120,8 +120,8 @@ const string REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION =
 "plan. See the query profile for more information about the per-node 
memory "
 "requirements.";
 const string REASON_BUFFER_LIMIT_TOO_LOW_FOR_RESERVATION =
-"minimum memory reservation is greater than memory available to the query "
-"for buffer reservations. Increase the buffer_pool_limit to $0. See the 
query "
+"minimum memory reservation on backend '$0' is greater than memory 
available to the "
+"query for buffer reservations. Increase the buffer_pool_limit to $1. See 
the query "
 "profile for more information about the per-node memory requirements.";
 const string REASON_MIN_RESERVATION_OVER_POOL_MEM =
 "minimum memory reservation needed is greater than pool max mem resources. 
Pool "
@@ -140,6 +140,12 @@ const string REASON_REQ_OVER_POOL_MEM =
 const string REASON_REQ_OVER_NODE_MEM =
 "request memory needed $0 per node is greater than process mem limit $1 of 
$2.\n\n"
 "Use the MEM_LIMIT query option to indicate how much memory is required 
per node.";
+const string REASON_THREAD_RESERVATION_LIMIT_EXCEEDED =
+"thread reservation on backend '$0' is greater than the 
THREAD_RESERVATION_LIMIT "
+"query option value: $1 > $2.";
+const string REASON_THREAD_RESERVATION_AGG_LIMIT_EXCEEDED =
+"sum of thread reservations across all $0 backends is greater than the "
+"THREAD_RESERVATION_AGGREGATE_LIMIT query option value: $1 > $2.";
 
 // Queue decision details
 // $0 = num running queries, $1 = num queries limit
@@ -406,17 +412,24 @@ bool AdmissionController::RejectImmediately(const QuerySchedule& schedule,
  // the checks isn't particularly important, though some thought was given to ordering
   // them in a way that might make the sense for a user.
 
-  // Compute the max (over all backends) min_mem_reservation_bytes, the cluster total
-  // (across all backends) min_mem_reservation_bytes and the min (over all backends)
+  // Compute the max (over all backends) and cluster total (across all backends) for
+  // min_mem_reservation_bytes and thread_reservation and the min (over all backends)
   // min_proc_mem_limit.
-  int64_t max_min_mem_reservation_bytes = -1;
+  pair largest_min_mem_reservation(nullptr, -1);
   int64_t cluster_min_mem_reservation_bytes = 0;
+  pair max_thread_reservation(nullptr, 0);
   pair min_proc_mem_limit(
   nullptr, std::numeric_limits::max());
+  int64_t cluster_thread_reservation = 0;
   for (const auto& e : schedule.per_backend_exec_params()) {
 cluster_min_mem_reservation_bytes += 
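
A hedged sketch of exercising the new options from impala-shell (option values and the
query are illustrative; per the commit message, enforcement happens in admission
control based on the planner's thread reservations and the schedule):

  impala-shell -q "set thread_reservation_limit=8;
                   set thread_reservation_aggregate_limit=40;
                   select count(*) from tpch.lineitem l join tpch.orders o
                     on l.l_orderkey = o.o_orderkey"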

[impala] 02/03: IMPALA-8155: Modify bootstrap_system.sh to bind Impala-lzo/2.x

2019-02-12 Thread boroknagyz
This is an automated email from the ASF dual-hosted git repository.

boroknagyz pushed a commit to branch 2.x
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 238194b2343619fe0cc8d0d94b475a5f2b0e2f82
Author: stiga-huang 
AuthorDate: Fri Feb 1 17:45:19 2019 -0800

IMPALA-8155: Modify bootstrap_system.sh to bind Impala-lzo/2.x

The new commit in Impala-lzo/master breaks the builds of Impala-2.x.
We should depend on a dedicated branch for 2.x.

Change-Id: I67591c7cfc4bede5c096a49beb57da34bb697338
Reviewed-on: http://gerrit.cloudera.org:8080/12339
Tested-by: Impala Public Jenkins 
Reviewed-by: Fredy Wijaya 
---
 bin/bootstrap_system.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bin/bootstrap_system.sh b/bin/bootstrap_system.sh
index bce161b..be79392 100755
--- a/bin/bootstrap_system.sh
+++ b/bin/bootstrap_system.sh
@@ -206,7 +206,7 @@ echo ">>> Checking out Impala-lzo"
 : ${IMPALA_LZO_HOME:="${IMPALA_HOME}/../Impala-lzo"}
 if ! [[ -d "$IMPALA_LZO_HOME" ]]
 then
-  git clone https://github.com/cloudera/impala-lzo.git "$IMPALA_LZO_HOME"
+  git clone --branch 2.x https://github.com/cloudera/impala-lzo.git "$IMPALA_LZO_HOME"
 fi
 
 echo ">>> Checking out and building hadoop-lzo"