[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Hello Thomas Marshall, Lars Volker, David Rorke, Sahil Takiar, Todd Lipcon, Tim Armstrong, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12987 to look at the new patch set (#4). Change subject: IMPALA-8341: Data cache for remote reads .. IMPALA-8341: Data cache for remote reads This is a patch based on PhilZ's prototype: https://gerrit.cloudera.org/#/c/12683/ This change implements an IO data cache which is backed by local storage. It implicitly relies on the OS page cache management to shuffle data between memory and the storage device. This is useful for caching data read from remote filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS). A data cache is divided into one or more partitions based on the configuration string which is a list of directories, separated by comma, followed by the storage capacity per directory. An example configuration string is like the following: --data_cache_config=/data/0,/data/1:150GB In the configuration above, the cache may use up to 300GB of storage space, with 150GB max for /data/0 and /data/1 respectively. Each partition has a meta-data cache which tracks the mappings of cache keys to the locations of the cached data. A cache key is a tuple of (file's name, file's modification time, file offset) and a cache entry is a tuple of (backing file, offset in the backing file, length of the cached data, optional checksum). Note that the cache currently doesn't support overlapping ranges. In other words, if the cache contains an entry of a file for range [m, m+4MB), a lookup for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen this as a problem but this may require further evaluation in the future. Each partition stores its set of cached data in backing files created on local storage. When inserting new data into the cache, the data is appended to the current backing file in use. The storage consumption of each cache entry counts towards the quota of that partition. When a partition reaches its capacity, the least recently used (LRU) data in that partition is evicted. Evicted data is removed from the underlying storage by punching holes in the backing file it's stored in. As a backing file reaches a certain size (by default 4TB), new data will stop being appended to it and a new file will be created instead. Note that due to hole punching, the backing file is actually sparse. Optionally, checksumming can be enabled to verify read from the cache is consistent with what was inserted and to verify that multiple attempted insertions with the same cache key have the same cache content. Checksumming is enabled by default for debug builds. To probe for cached data in the cache, the interface Lookup() is used; To insert data into the cache, the interface Store() is used. Please note that eviction happens inline currently during Store(). This patch also added two startup flags for start-impala-cluster.py: '--data_cache_dir' specifies the base directory in which each Impalad creates the caching directory '--data_cache_size' specifies the capacity string for each cache directory. Testing done: a new BE test was added; core test with cache enabled. Perf: - 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement over runs without the cache. Each node has a cache size of 150GB per node. The performance is at parity with a configuration of a HDFS cluster using EBS as the storage. Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/io/CMakeLists.txt A be/src/runtime/io/data-cache-test.cc A be/src/runtime/io/data-cache.cc A be/src/runtime/io/data-cache.h M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/runtime/io/hdfs-file-reader.h M be/src/runtime/io/request-context.h M be/src/util/filesystem-util-test.cc M be/src/util/filesystem-util.cc M be/src/util/filesystem-util.h M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M bin/start-impala-cluster.py M common/thrift/metrics.json 18 files changed, 1,706 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/4 -- To view, visit http://gerrit.cloudera.org:8080/12987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc Gerrit-Change-Number: 12987 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Ho Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Hello Thomas Marshall, Lars Volker, David Rorke, Sahil Takiar, Todd Lipcon, Tim Armstrong, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12987 to look at the new patch set (#3). Change subject: IMPALA-8341: Data cache for remote reads .. IMPALA-8341: Data cache for remote reads This is a patch based on PhilZ's prototype: https://gerrit.cloudera.org/#/c/12683/ This change implements an IO data cache which is backed by local storage. It implicitly relies on the OS page cache management to shuffle data between memory and the storage device. This is useful for caching data read from remote filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS). A data cache is divided into one or more partitions based on the configuration string which is a list of directories, separated by comma, followed by the storage capacity per directory. An example configuration string is like the following: --data_cache_config=/data/0,/data/1:150GB In the configuration above, the cache may use up to 300GB of storage space, with 150GB max for /data/0 and /data/1 respectively. Each partition has a meta-data cache which tracks the mappings of cache keys to the locations of the cached data. A cache key is a tuple of (file's name, file's modification time, file offset) and a cache entry is a tuple of (backing file, offset in the backing file, length of the cached data, optional checksum). Note that the cache currently doesn't support overlapping ranges. In other words, if the cache contains an entry of a file for range [m, m+4MB), a lookup for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen this as a problem but this may require further evaluation in the future. Each partition stores its set of cached data in backing files created on local storage. When inserting new data into the cache, the data is appended to the current backing file in use. The storage consumption of each cache entry counts towards the quota of that partition. When a partition reaches its capacity, the least recently used (LRU) data in that partition is evicted. Evicted data is removed from the underlying storage by punching holes in the backing file it's stored in. As a backing file reaches a certain size (by default 4TB), new data will stop being appended to it and a new file will be created instead. Note that due to hole punching, the backing file is actually sparse. Optionally, checksumming can be enabled to verify read from the cache is consistent with what was inserted and to verify that multiple attempted insertions with the same cache key have the same cache content. Checksumming is enabled by default for debug builds. To probe for cached data in the cache, the interface Lookup() is used; To insert data into the cache, the interface Store() is used. Please note that eviction happens inline currently during Store(). This patch also added two startup flags for start-impala-cluster.py: '--data_cache_dir' specifies the base directory in which each Impalad creates the caching directory '--data_cache_size' specifies the capacity string for each cache directory. Testing done: a new BE test was added; core test with cache enabled. Perf: - 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement over runs without the cache. Each node has a cache size of 150GB per node. The performance is at parity with a configuration of a HDFS cluster using EBS as the storage. Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/io/CMakeLists.txt A be/src/runtime/io/data-cache-test.cc A be/src/runtime/io/data-cache.cc A be/src/runtime/io/data-cache.h M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/runtime/io/hdfs-file-reader.h M be/src/runtime/io/request-context.h M be/src/util/filesystem-util-test.cc M be/src/util/filesystem-util.cc M be/src/util/filesystem-util.h M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M bin/start-impala-cluster.py M common/thrift/metrics.json 18 files changed, 1,706 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/3 -- To view, visit http://gerrit.cloudera.org:8080/12987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc Gerrit-Change-Number: 12987 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12987 ) Change subject: IMPALA-8341: Data cache for remote reads .. Patch Set 3: (8 comments) http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.h File be/src/runtime/io/data-cache.h: http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.h@88 PS2, Line 88: consolidatin > consolidating Done http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.h@89 PS2, Line 89: sparse > sparse Done http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.h@166 PS2, Line 166: : /// Utility function to verify that all partitions' consumption don't exceed their : /// quotas. Retur > clang-tidy failure: struct instead of class. Done http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.h@184 PS2, Line 184: > empty Done http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.cc@296 PS2, Line 296: meta_cache_->Erase(key); : return true; : } : } > TODO: Add a metric for this. Done http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.cc@343 PS2, Line 343: > insertion_offset + bytes_written Done http://gerrit.cloudera.org:8080/#/c/12987/2/bin/start-impala-cluster.py File bin/start-impala-cluster.py: http://gerrit.cloudera.org:8080/#/c/12987/2/bin/start-impala-cluster.py@117 PS2, Line 117: > flake8: E703 statement ends with a semicolon Done http://gerrit.cloudera.org:8080/#/c/12987/2/bin/start-impala-cluster.py@120 PS2, Line 120: > flake8: E703 statement ends with a semicolon Done -- To view, visit http://gerrit.cloudera.org:8080/12987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc Gerrit-Change-Number: 12987 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 19 Apr 2019 02:20:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8138: Remove FAULT INJECTION RPC DELAY
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13060 ) Change subject: IMPALA-8138: Remove FAULT_INJECTION_RPC_DELAY .. Patch Set 2: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/13060/1/be/src/common/global-flags.cc File be/src/common/global-flags.cc: http://gerrit.cloudera.org:8080/#/c/13060/1/be/src/common/global-flags.cc@154 PS1, Line 154: DEFINE_string(debug_actions, "", "For testing only. Uses the same format as the debug " : "action query options, but allows for injection of debug actions in code paths where " : "query options are not available."); > I could see the argument either way. I think this is fine as-is. No point in dwelling on it for too much as long as it achieves the purpose. -- To view, visit http://gerrit.cloudera.org:8080/13060 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I712b188e0cdf91f431c9b94052501e5411af407b Gerrit-Change-Number: 13060 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Thu, 18 Apr 2019 22:26:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8414: Skip header when parsing /proc/net/dev
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13016 ) Change subject: IMPALA-8414: Skip header when parsing /proc/net/dev .. Patch Set 1: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/13016/1/be/src/util/system-state-info.cc File be/src/util/system-state-info.cc: http://gerrit.cloudera.org:8080/#/c/13016/1/be/src/util/system-state-info.cc@174 PS1, Line 174: const StringPiece const StringPiece& -- To view, visit http://gerrit.cloudera.org:8080/13016 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7ff931671050d44926d0baa77ec374afed1f8225 Gerrit-Change-Number: 13016 Gerrit-PatchSet: 1 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Wed, 17 Apr 2019 23:09:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12987 ) Change subject: IMPALA-8341: Data cache for remote reads .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.cc@343 PS2, Line 343: insertion_offset)); insertion_offset + bytes_written -- To view, visit http://gerrit.cloudera.org:8080/12987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc Gerrit-Change-Number: 12987 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 17 Apr 2019 18:35:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12987 ) Change subject: IMPALA-8341: Data cache for remote reads .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.cc@147 PS1, Line 147: if (charge_len > capacity_) return false; > Given that we won't actually insert into the cache until after trying to wr This actually seems to be a problematic behavior. We may temporarily exceed the capacity limit as a result of this. -- To view, visit http://gerrit.cloudera.org:8080/12987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc Gerrit-Change-Number: 12987 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 17 Apr 2019 05:57:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12987 ) Change subject: IMPALA-8341: Data cache for remote reads .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.cc@296 PS2, Line 296: if (pending_insert_set_.size() >= FLAGS_data_cache_write_concurrency || : pending_insert_set_.find(key.ToString()) != pending_insert_set_.end()) { : return false; : } TODO: Add a metric for this. -- To view, visit http://gerrit.cloudera.org:8080/12987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc Gerrit-Change-Number: 12987 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 16 Apr 2019 19:47:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12987 ) Change subject: IMPALA-8341: Data cache for remote reads .. Patch Set 2: TODO: - add a test to rotate the files in data-cache-test.cc - add a test for the changes in filesystem-util.cc -- To view, visit http://gerrit.cloudera.org:8080/12987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc Gerrit-Change-Number: 12987 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 16 Apr 2019 19:46:40 + Gerrit-HasComments: No
[Impala-ASF-CR] XXX
Michael Ho has abandoned this change. ( http://gerrit.cloudera.org:8080/13026 ) Change subject: XXX .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/13026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: Ia3ede43146006f7e76707f1b7c8c99aae6671db8 Gerrit-Change-Number: 13026 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Hello Thomas Marshall, Lars Volker, David Rorke, Sahil Takiar, Todd Lipcon, Tim Armstrong, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12987 to look at the new patch set (#2). Change subject: IMPALA-8341: Data cache for remote reads .. IMPALA-8341: Data cache for remote reads This is a patch based on PhilZ's prototype: https://gerrit.cloudera.org/#/c/12683/ This change implements an IO data cache which is backed by local storage. It implicitly relies on the OS page cache management to shuffle data between memory and the storage device. This is useful for caching data read from remote filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS). A data cache is divided into one or more partitions based on the configuration string which is a list of directories, separated by comma, followed by the storage capacity per directory. An example configuration string is like the following: --data_cache_config=/data/0,/data/1:150GB In the configuration above, the cache may use up to 300GB of storage space, with 150GB max for /data/0 and /data/1 respectively. Each partition has a meta-data cache which tracks the mappings of cache keys to the locations of the cached data. A cache key is a tuple of (file's name, file's modification time, file offset) and a cache entry is a tuple of (backing file, offset in the backing file, length of the cached data, optional checksum). Note that the cache currently doesn't support overlapping ranges. In other words, if the cache contains an entry of a file for range [m, m+4MB), a lookup for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen this as a problem but this may require further evaluation in the future. Each partition stores its set of cached data in backing files created on local storage. When inserting new data into the cache, the data is appended to the current backing file in use. The storage consumption of each cache entry counts towards the quota of that partition. When a partition reaches its capacity, the least recently used (LRU) data in that partition is evicted. Evicted data is removed from the underlying storage by punching holes in the backing file it's stored in. As a backing file reaches a certain size (by default 4TB), new data will stop being appended to it and a new file will be created instead. Note that due to hole punching, the backing file is actually sparse. Optionally, checksumming can be enabled to verify read from the cache is consistent with what was inserted and to verify that multiple attempted insertions with the same cache key have the same cache content. Checksumming is enabled by default for debug builds. To probe for cached data in the cache, the interface Lookup() is used; To insert data into the cache, the interface Store() is used. Please note that eviction happens inline currently during Store(). This patch also added two startup flags for start-impala-cluster.py: '--data_cache_dir' specifies the base directory in which each Impalad creates the caching directory '--data_cache_size' specifies the capacity string for each cache directory. Testing done: a new BE test was added; core test with cache enabled. Perf: - 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement over runs without the cache. Each node has a cache size of 150GB per node. The performance is at parity with a configuration of a HDFS cluster using EBS as the storage. Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/io/CMakeLists.txt A be/src/runtime/io/data-cache-test.cc A be/src/runtime/io/data-cache.cc A be/src/runtime/io/data-cache.h M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/runtime/io/hdfs-file-reader.h M be/src/runtime/io/request-context.h M be/src/util/filesystem-util.cc M be/src/util/filesystem-util.h M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M bin/start-impala-cluster.py M common/thrift/metrics.json 17 files changed, 1,467 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/2 -- To view, visit http://gerrit.cloudera.org:8080/12987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc Gerrit-Change-Number: 12987 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12987 ) Change subject: IMPALA-8341: Data cache for remote reads .. Patch Set 1: (71 comments) This new patch changes the configuration string to use a uniform capacity quota for all directories. http://gerrit.cloudera.org:8080/#/c/12987/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12987/1//COMMIT_MSG@24 PS1, Line 24: is a tuple of (file's name, file's modification time, file offset) > So this means that you can get a partial cache hit if the offsets match but Done http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/exec/hdfs-scan-node-base.cc@948 PS1, Line 948: data_cache_miss_bytes_->Set(reader_context_->data_cache_miss_bytes()); > I think for the above counters we figured that this pattern of copying the Done http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc File be/src/runtime/io/data-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc@52 PS1, Line 52: // The callback is invoked by each thread in the multi-threaded tests below. > callback reads like it's called when something is done, how about ThreadFn? Done http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc@157 PS1, Line 157: // Read the same same range inserted previously and they should still all in the cache. > nit: be Done http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc@196 PS1, Line 196: // Create a buffer way larger than the cache. > Why does it need to be larger? Typo. Fixed. http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc@201 PS1, Line 201: ASSERT_TRUE(cache.Store("foobar", 12345, 0, large_buffer.get(), cache_size)); > Use variables instead of inline constants? Done http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc@220 PS1, Line 220: const int64_t cache_size = 1 << 22; > I find these easier to read in the form of 4 * 1024 * 1024 Done http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc@248 PS1, Line 248: differen > typo Done http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc@301 PS1, Line 301: ootprint) { > Can you add a comment what this test does? Done http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc@330 PS1, Line 330: int main(int argc, char **argv) { > This is not needed anymore if you add your test to the unified backend test Prefer to keep this test as stand-alone to make development easier. http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h File be/src/runtime/io/data-cache.h: http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h@18 PS1, Line 18: #ifndef IMPALA_RUNTIME_IO_DATA_CACHE_H > Could use #pragma once instead of include guards Done http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h@53 PS1, Line 53: /// punching holes in the backing file it's stored in. As a backing file reaches a certain > It's actually a TODO item to retire older files and copy what's left in the Added a check for filesystem support for hole punching in the new patch. http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h@55 PS1, Line 55: /// created instead. Note that due to hole punching, the backing file is actually sparse. > This might be a case where a simple ASCII diagram would illustrate the conc Done http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h@56 PS1, Line 56: /// > It's by design that we don't support overlapping range for the first implem Added some explanation in the header comment. http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h@63 PS1, Line 63: /// happens inline currently during Store(). > It's worth documenting the PAGE_SIZE behaviour since it implies that there' Done http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h@76 PS1, Line 76: class DataCache{ > style nit: missing space before { Done http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h@81 PS1, Line 81: DataCache(const std::string& config) : config_(config) { } > totally doesn't matter here, but general best practice in C++11 is to have Thanks for the reminder. Almost every time, I have to look up in the internet for the advantage of pass-by-value-then-move over pass-by-reference. It's subtle but it makes sense. We should probably start sticking to this idiom more widely in Impala code base. May be a clang-tidy rule will help ?! Also totally irrelevant but that's an area where I find passing by pointer in C is sometimes easier to use or understand than C++.
[Impala-ASF-CR] IMPALA-8138: Reintroduce rpc debugging options
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12297 ) Change subject: IMPALA-8138: Reintroduce rpc debugging options .. Patch Set 1: (6 comments) Thanks for working on this patch. My general comment is to consider moving away from the existing "fake" fault injection framework in Thrift and use debug actions to simulate scenarios in which we may actually fail to exercise the entire RPC stack better. http://gerrit.cloudera.org:8080/#/c/12297/5/be/src/rpc/impala-control-service-proxy.h File be/src/rpc/impala-control-service-proxy.h: http://gerrit.cloudera.org:8080/#/c/12297/5/be/src/rpc/impala-control-service-proxy.h@44 PS5, Line 44: As mentioned elsewhere, this kind of artificial fault injection doesn't seem to be too useful. http://gerrit.cloudera.org:8080/#/c/12297/3/be/src/rpc/impala-control-service-proxy.h File be/src/rpc/impala-control-service-proxy.h: http://gerrit.cloudera.org:8080/#/c/12297/3/be/src/rpc/impala-control-service-proxy.h@43 PS3, Line 43: Not needed. Same below. http://gerrit.cloudera.org:8080/#/c/12297/5/be/src/rpc/rpc-mgr.inline.h File be/src/rpc/rpc-mgr.inline.h: http://gerrit.cloudera.org:8080/#/c/12297/5/be/src/rpc/rpc-mgr.inline.h@65 PS5, Line 65: : : : : : : : : : : : : It appears that this send vs recv debug actions were carried over from Thrift implementation. Retrospectively, the "fault injection" we did with Thrift was quite hacky (I am the culprit here) and it stemmed from the total lack of fault injection testing back then for exercising the error paths in Thrift RPC. As part of the KRPC development, we invested in proper fault injection testing by truly pausing the Impala and artificially creates various failure scenario. This allows a more extensive exercise across the entire RPC stack instead of just exercising the RPC handlers at the client and server sides. With the fault injection framework, it seems to be not too useful to continue with this path of artificial fault injection via debug action we used to do with Thrift. Instead, we may want to rethink the fault injection testing with KRPC. In particular, it may exercise the code better by doing some of the followings: - use debug actions to inject random delays in the RPC handlers. This is particularly useful for RPCs with timeout - use debug actions to randomly reject some of the incoming RPCs in ImpalaServicePool - use debug actions to respond with error status in the RPC handlers. The errors will be specific to each RPC handler (e.g. deserialization error of Thrift profiles, deserialization errors of RowBatch) - debug action to force some incoming RPCs to use deferred queue in KrpcDataStreamRecvr - (experimental / dangerous) "randomly" corrupt the incoming RPC payloads in ImpalaServicePool. - inject delay in RPC callback in the client side to simulate an overloaded client The above are some examples I can think of right now. For other failures, we may need to rely on the fault injection framework: - use iptables to drop all incoming packets to the RPC port from a particular host. This simulates a host which was powered off or network partitions - Restart remote Impalad will trigger the behavior of broken connections (by sending a RST packet) - Send SIGSTOP to remote Impalad (which we already do in the fault injection framework) to simulate non-responsive Impalad - other ideas... In general, my suggestion is to use debug actions to simulate failure which can actually happen instead of using this artificial fault injection which seems a bit meaningless at this point. http://gerrit.cloudera.org:8080/#/c/12297/1/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/12297/1/be/src/runtime/query-state.h@314 PS1, Line 314: proxy_ > I assume that if I change the class name back to something ending in "Proxy Yup. http://gerrit.cloudera.org:8080/#/c/12297/5/be/src/service/control-service.cc File be/src/service/control-service.cc: http://gerrit.cloudera.org:8080/#/c/12297/5/be/src/service/control-service.cc@171 PS5, Line 171: if (qs.get() == nullptr) { : Status status(ErrorMsg(TErrorCode::INTERNAL_ERROR, Should this be converted to debug action ? http://gerrit.cloudera.org:8080/#/c/12297/5/be/src/service/control-service.cc@186 PS5, Line 186: : Should this be converted to debug action ? -- To view, visit http://gerrit.cloudera.org:8080/12297 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2046cb9dadf846ea90c04e95781b2bbde3325941 Gerrit-Change-Number: 12297 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Andrew
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12987 ) Change subject: IMPALA-8341: Data cache for remote reads .. Patch Set 1: (3 comments) > Another option would be to do a global hashtable and have the > entries directly identify their partitions (in fact I think they > already do via the CacheFile pointer, right?). Eviction still needs > to be per-partition based on that partition's capacity, but > insertion can try to more smartly allocate across drives. > Yes, I had a similar question when thinking over the patch over the weekend. I tried the approach of weighted sampling + a global LRU cache but stopped short of coming up with a way to maintain the quota per partition with a global LRU list so I abandoned it and kind of punted on this problem for now. May be it's okay to do LRU per partition. Should have documented my thought there in the code. I think the _IO_ apportioned relative to capacity may be fine as that's a reasonable expectation if a user specifies more capacity on a slower device. One thing to consider may be to use multiple backing files for larger partition to avoid per-file lock problem. > Last thought: since there are some tricky design decisions above, > maybe we can simplify this and just support a single partition for > now, and come back to supporting multi-partition? It may be reasonable to punt on it for now. Let me see how much work it is to switch over to the global hash-table + per-partition LRU approach. http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h File be/src/runtime/io/data-cache.h: http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h@53 PS1, Line 53: /// punching holes in the backing file it's stored in. As a backing file reaches a certain > This does mean that the number of backing files could grown unbounded right It's actually a TODO item to retire older files and copy what's left in them to the current file. I didn't get around to doing that in this first version. http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h@56 PS1, Line 56: /// > We should mention in the comment how lookup works, in particular what happe It's by design that we don't support overlapping range for the first implementation. It seems to fare pretty well with the TPC-DS workload and parquet file format we are using. One of the TODO in the future is to measure the overlapping case and consider handling them if they are actually common. http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.cc@79 PS1, Line 79: // Unlink the file so the disk space is recycled once the process exits. > I'm not sure how I feel about this - it could cause a lot of confusion havi Thanks for the feedback. I also considered the approach of cleaning up the directory on startup but this seems to make certain assumption about whether the directory specified by the user for caching is exclusively used by the Impalad process. While this is not a common scenario right now, I can see there may be configuration in the future where one may run multiple Impala containers on the same host and these containers may belong to the same cluster. So, wiping out a particular directory on restart isn't exactly safe. Another alternative would be to include the hostname / IP address + port number as a unique name for the file so we will automatically truncate the file on restart. However, there is no guarantee that an Impalad will restart with the same IP address / hostname after a crash, esp. in a public cloud setting. That said, one can also argue that this is a configuration issue and those Impala containers shouldn't be sharing directories etc. The feedback so far suggests that the "hidden" disk usage seems even more undesirable so may be wiping out the directory at startup is not as bad and we can just forbid sharing of caching directories by multiple Impalads on the same host. -- To view, visit http://gerrit.cloudera.org:8080/12987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc Gerrit-Change-Number: 12987 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 11 Apr 2019 17:13:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 ) Change subject: IMPALA-8143: Enhance DoRpcWithRetry(). .. Patch Set 9: (11 comments) http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.h File be/src/rpc/rpc-mgr-test.h: http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.h@134 PS9, Line 134: get_current_queue_size Coding style suggests this should be "GetCurrentQueueSize()". With that said, it may not be needed after all if you take the suggsetion in rpc-mgr-test.cc. http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.h@295 PS9, Line 295: kudu::Status::IOError( : Substitute("ping failing, number of calls=$0.", number_of_calls_)); Instead of IOError, can we simulate the server busy error here ? Please see comments in rpc-mgr-test.cc for reason. http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.cc File be/src/rpc/rpc-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.cc@269 PS9, Line 269: until typo http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.cc@285 PS9, Line 285: // Find the counter which tracks the number of times the service queue is too full. : const string& overflow_count = Substitute( : ImpalaServicePool::RPC_QUEUE_OVERFLOW_METRIC_KEY, ping_service->service_name()); : IntCounter* overflow_metric = : ExecEnv::GetInstance()->rpc_metrics()->FindMetricForTesting( : overflow_count); : ASSERT_TRUE(overflow_metric != nullptr); : // There has been no activity yet so this should be 0. : EXPECT_EQ(overflow_metric->GetValue(), 0L); : : // Make a proxy for Ping rpc calls. : unique_ptr proxy; : ASSERT_OK(ping_service->GetProxy(krpc_address_, FLAGS_hostname, )); : TQueryCtx query_ctx; : : // Do an async call that will run in the service's single thread. : RpcController async_controller; : CountingBarrier async1_done(1); : PingRequestPB request1; : PingResponsePB response1; : ResponseCallback async1_callback = [&]() { async1_done.Notify(); }; : proxy->PingAsync(request1, , _controller, async1_callback); : : // wait for the async call to be running. : bool timed_out = false; : call1_started.Wait(10 * MILLIS_PER_SEC, _out); : ASSERT_FALSE(timed_out); : ASSERT_EQ(0, get_current_queue_size(rpc_mgr_)); : // Now the async call is running in the service. : : // Do a second async call, this will set in the queue, which will be full. : RpcController async_controller2; : CountingBarrier async2_done(1); : // The second call only has a short sleep. : PingRequestPB request2; : PingResponsePB response2; : // Set the deadline to something reasonable otherwise the pings from DoRpcWithRetry : // will replace this async call in the queue. : MonoTime deadline = MonoTime::Now() + MonoDelta::FromSeconds(10); : async_controller2.set_deadline(deadline); : ResponseCallback async2_callback = [&]() { async2_done.Notify(); }; : proxy->PingAsync(request2, , _controller2, async2_callback); : : // Wait for the second async call to get queued. : int num_sleeps = 20; : while (get_current_queue_size(rpc_mgr_) != 1) { : SleepForMs(300); : if (--num_sleeps < 0) { : FAIL() << "Waited too long for async message to be queued"; : } : } : // Now the second async call is queued, and the queue is full. : : // Assert that there have been no overflows yet. : EXPECT_EQ(overflow_metric->GetValue(), 0L); : : // Call DoRpcWithRetry with retry on our busy service. : // This won't succeed, but we can observe that retries occurred. : const int retries = 3; : PingRequestPB request3; : PingResponsePB response3; : impala::Status rpc_status = RpcMgr::DoRpcWithRetry(proxy, ::Ping, : request3, , query_ctx, "ping failed", retries, 100 * MILLIS_PER_SEC); : EXPECT_ERROR(rpc_status, TErrorCode::GENERAL); : EXPECT_STR_CONTAINS(rpc_status.msg().msg(), : "dropped due to backpressure. The service queue contains 1 items out
[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 ) Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb .. Patch Set 2: Code-Review+1 (4 comments) I'll leave it to Lars or other reviewers to do the +2. http://gerrit.cloudera.org:8080/#/c/12968/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12968/2//COMMIT_MSG@9 PS2, Line 9: and nit: an ? http://gerrit.cloudera.org:8080/#/c/12968/2//COMMIT_MSG@11 PS2, Line 11: cascade : of blocked threads Makes me wonder if there are actually other similar problematic sequence in the HdfsScanNode code ?! http://gerrit.cloudera.org:8080/#/c/12968/2/be/src/exec/hdfs-scan-node.h File be/src/exec/hdfs-scan-node.h: http://gerrit.cloudera.org:8080/#/c/12968/2/be/src/exec/hdfs-scan-node.h@117 PS2, Line 117: /// together, this lock must be taken first. Probably worth documenting briefly why this is a "timed_mutex" instead of a plain "mutex". http://gerrit.cloudera.org:8080/#/c/12968/2/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/12968/2/be/src/exec/hdfs-scan-node.cc@292 PS2, Line 292: The lock can be held for significant periods of time if the scan : // node is being cancelled. Would it be clearer to say another thread could be holding this lock for an extended period of time. When I first read it, I kind of interpret it as this lock is held by _this_ function for a long time ? Also, may also help to call out that this function may be called under some other locks so that's why the remedies are being done here to break out of the loop earlier if the scan node is being cancelled. -- To view, visit http://gerrit.cloudera.org:8080/12968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130 Gerrit-Change-Number: 12968 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 10 Apr 2019 22:25:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7981: Add host disk statistics to profile
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12822 ) Change subject: IMPALA-7981: Add host disk statistics to profile .. Patch Set 6: Code-Review+2 (3 comments) http://gerrit.cloudera.org:8080/#/c/12822/6/be/src/util/system-state-info.cc File be/src/util/system-state-info.cc: http://gerrit.cloudera.org:8080/#/c/12822/6/be/src/util/system-state-info.cc@57 PS6, Line 57: num_values nit: expected_num_values http://gerrit.cloudera.org:8080/#/c/12822/6/be/src/util/system-state-info.cc@110 PS6, Line 110: : nit: unnecessary blank lines. http://gerrit.cloudera.org:8080/#/c/12822/6/be/src/util/system-state-info.cc@129 PS6, Line 129: stat_string Does it make sense to DCHECK(StringPiece(stat_string).starts_with('cpu ')); ? -- To view, visit http://gerrit.cloudera.org:8080/12822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I373e7da47a0d722938e6ca1572c49a502951ed57 Gerrit-Change-Number: 12822 Gerrit-PatchSet: 6 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Wed, 10 Apr 2019 19:10:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12987 Change subject: IMPALA-8341: Data cache for remote reads .. IMPALA-8341: Data cache for remote reads This is a patch based on PhilZ's prototype: https://gerrit.cloudera.org/#/c/12683/ This change implements an IO data cache which is backed by local storage. It implicitly relies on the OS page cache management to shuffle data between memory and the storage device. This is useful for caching data read from remote filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS). A data cache is divided into one or more partitions based on the configuration string which is a list of : pairs, separated by comma. An example configuration string: --data_cache_config=/data/0:150GB,/data/1:150GB Each partition has a meta-data cache which tracks the mappings of cache keys to the locations of the cached data. A cache key is a tuple of (file's name, file's modification time, file offset) and a cache entry is a tuple of (backing file, offset in the backing file, length of the cached data, optional checksum). Each partition stores its set of cached data in backing files created on local storage. When inserting new data into the cache, the data is appended to the current backing file in use. The storage consumption of each cache entry counts towards the quota of that partition. When a partition reaches its capacity, the least recently used (LRU) data in that partition is evicted. Evicted data is removed from the underlying storage by punching holes in the backing file it's stored in. As a backing file reaches a certain size (by default 4TB), new data will stop being appended to it and a new file will be created instead. Note that due to hole punching, the backing file is actually sparse. Optionally, checksumming can be enabled to verify read from the cache is consistent with what was inserted and to verify that multiple attempted insertions with the same cache key have the same cache content. To probe for cached data in the cache, the interface Lookup() is used; To insert data into the cache, the interface Store() is used. Please note that eviction happens inline currently during Store(). Testing done: a new BE test was added; core test with cache enabled. Perf: - 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement over runs without the cache. Each node has a cache size of 150GB per node. Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/io/CMakeLists.txt A be/src/runtime/io/data-cache-test.cc A be/src/runtime/io/data-cache.cc A be/src/runtime/io/data-cache.h M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/hdfs-file-reader.cc M be/src/runtime/io/hdfs-file-reader.h M be/src/runtime/io/request-context.h M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M common/thrift/metrics.json 14 files changed, 1,115 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/1 -- To view, visit http://gerrit.cloudera.org:8080/12987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc Gerrit-Change-Number: 12987 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-8395: Parse older formats of /proc/net/dev correctly
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12954 ) Change subject: IMPALA-8395: Parse older formats of /proc/net/dev correctly .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic804955d8e4269e787037a6dc68bef2d70382426 Gerrit-Change-Number: 12954 Gerrit-PatchSet: 4 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Tue, 09 Apr 2019 02:05:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7981: Add host disk statistics to profile
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12822 ) Change subject: IMPALA-7981: Add host disk statistics to profile .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/12822/4/be/src/util/system-state-info.h File be/src/util/system-state-info.h: http://gerrit.cloudera.org:8080/#/c/12822/4/be/src/util/system-state-info.h@255 PS4, Line 255: for (int i = 0; i < num_values; ++i) { : int64_t v = -1; : (*ss) >> v; Do you plan to choose https://gerrit.cloudera.org/#/c/12954/2/be/src/util/system-state-info.cc@169 over the current implementation ? http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.h File be/src/util/system-state-info.h: http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.h@183 PS2, Line 183: /// The enum names correspond to the fields of /proc/diskstats : /// https://www.kernel.org/doc/Documentation/iostats.txt > These are the fields present in 2.6, while 4.8 added more fields that we're Yes, I think the link to the doc should be sufficient. -- To view, visit http://gerrit.cloudera.org:8080/12822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I373e7da47a0d722938e6ca1572c49a502951ed57 Gerrit-Change-Number: 12822 Gerrit-PatchSet: 4 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Mon, 08 Apr 2019 20:56:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8395: Parse older formats of /proc/net/dev correctly
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12954 ) Change subject: IMPALA-8395: Parse older formats of /proc/net/dev correctly .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/12954/2/be/src/util/system-state-info.cc File be/src/util/system-state-info.cc: http://gerrit.cloudera.org:8080/#/c/12954/2/be/src/util/system-state-info.cc@168 PS2, Line 168: colon_idx = sp.find_first_of(':'); Should this also check for the case of colon_idx == npos and skip ? http://gerrit.cloudera.org:8080/#/c/12954/2/be/src/util/system-state-info.cc@170 PS2, Line 170: I wonder if we should just skip if counters.values() < NUM_NET_VALUES ? Will doing so cause us to always skip for certain versions of old kernels ? -- To view, visit http://gerrit.cloudera.org:8080/12954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic804955d8e4269e787037a6dc68bef2d70382426 Gerrit-Change-Number: 12954 Gerrit-PatchSet: 2 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Mon, 08 Apr 2019 20:38:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7981: Add host disk usage to profile
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12822 ) Change subject: IMPALA-7981: Add host disk usage to profile .. Patch Set 2: (10 comments) http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info-test.cc File be/src/util/system-state-info-test.cc: http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info-test.cc@105 PS2, Line 105: R"( 8 0 sda 17124835 222797 716029974 8414920 43758807 38432095 7867287712 630179264 0 32547524 638999340 > line too long (117 > 90) Do we also care about testing for random devices which show up in /proc/diskstats such as ram0 ? My /proc/diskstats has the followings: 1 0 ram0 0 0 0 0 0 0 0 0 0 0 0 1 1 ram1 0 0 0 0 0 0 0 0 0 0 0 1 2 ram2 0 0 0 0 0 0 0 0 0 0 0 1 3 ram3 0 0 0 0 0 0 0 0 0 0 0 1 4 ram4 0 0 0 0 0 0 0 0 0 0 0 1 5 ram5 0 0 0 0 0 0 0 0 0 0 0 1 6 ram6 0 0 0 0 0 0 0 0 0 0 0 1 7 ram7 0 0 0 0 0 0 0 0 0 0 0 1 8 ram8 0 0 0 0 0 0 0 0 0 0 0 1 9 ram9 0 0 0 0 0 0 0 0 0 0 0 1 10 ram10 0 0 0 0 0 0 0 0 0 0 0 1 11 ram11 0 0 0 0 0 0 0 0 0 0 0 1 12 ram12 0 0 0 0 0 0 0 0 0 0 0 1 13 ram13 0 0 0 0 0 0 0 0 0 0 0 1 14 ram14 0 0 0 0 0 0 0 0 0 0 0 1 15 ram15 0 0 0 0 0 0 0 0 0 0 0 7 0 loop0 0 0 0 0 0 0 0 0 0 0 0 7 1 loop1 0 0 0 0 0 0 0 0 0 0 0 7 2 loop2 0 0 0 0 0 0 0 0 0 0 0 7 3 loop3 0 0 0 0 0 0 0 0 0 0 0 7 4 loop4 0 0 0 0 0 0 0 0 0 0 0 7 5 loop5 0 0 0 0 0 0 0 0 0 0 0 7 6 loop6 0 0 0 0 0 0 0 0 0 0 0 7 7 loop7 0 0 0 0 0 0 0 0 0 0 0 http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info-test.cc@177 PS2, Line 177: // Tests the computation logic for disk usage. This test also makes sure that values of a : // partition are not accounted towards the disk it is on. Is this comment meant for the earlier test at line 103 ? http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.h File be/src/util/system-state-info.h: http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.h@183 PS2, Line 183: /// The enum names correspond to the fields of /proc/diskstats : /// https://www.kernel.org/doc/Documentation/iostats.txt The followings are the metrics common to both 2.6 and 4.8+, right ? http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.h@221 PS2, Line 221: Entries are : /// set to 0 for invalid entries. Invalid entries are set to 0. http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.h@228 PS2, Line 228: /// Compute the disk usage. Computes the read and write rate for the most recent epoch. http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.h@233 PS2, Line 233: DiskUsage nit: I kind of interpret it as disk space used on first read. DiskStats may be a slightly better name but then it may not be consistent with other metrics we are measuring. http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.cc File be/src/util/system-state-info.cc: http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.cc@45 PS2, Line 45: memset(_usage_, 0, sizeof(network_usage_)); memset(_usage_, 0, sizeof(disk_usage_)); http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.cc@81 PS2, Line 81: LOG(WARNING) << "Could not open " << path << ": " << GetStrErrMsg() << endl; Will this lead to spam in the log if we periodically try reading the same path and fail or is it the responsibility of the caller to throttle calling this function ? http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.cc@199 PS2, Line 199: DCHECK(disk_val_idx_ == 0 || disk_val_idx_ == 1); http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.cc@232 PS2, Line 232: == <= 0 -- To view, visit http://gerrit.cloudera.org:8080/12822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I373e7da47a0d722938e6ca1572c49a502951ed57 Gerrit-Change-Number: 12822 Gerrit-PatchSet: 2 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Sat, 06 Apr 2019 01:37:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 ) Change subject: IMPALA-8143: Enhance DoRpcWithRetry(). .. Patch Set 7: (5 comments) http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc File be/src/rpc/rpc-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@333 PS5, Line 333: // Call DoRpcWithRetry with no retry on our busy service. : const int retries = 3; : PingRequestPB request3; : PingResponsePB response3; : impala::Status rpc_status = RpcMgr::DoRpcWithRetry(proxy, ::Ping, : request3, , query_ctx, "ping failed", retries, 100 * MILLIS_PER_SEC); : EXPECT_ERROR(rpc_status, TErrorCode::GENERAL); : EXPECT_STR_CONTAINS(rpc_status.msg().msg(), : "dropped due to backpressure. The service queue contains 1 items out of a maximum " : "of 1; memory consumption is "); : ASSERT_EQ(overflow_metric->GetValue(), retries); : : // Call DoRpcWithRetry, but this time we do retey on a busy service, and this waiting : // allows the outstanding async calls to complete, so that then this call will succeed. : PingRequestPB request4; : PingResponsePB response4; : impala::Status rpc_status_backoff = : RpcMgr::DoRpcWithRetry(proxy, ::Ping, request4, , : query_ctx, "ping failed", 10, 100 * MILLIS_PER_SEC, 2 * MILLIS_PER_SEC); : ASSERT_OK(rpc_status_backoff); : ASSERT_GT(overflow_metric->GetValue( > I reviewed the test and tried to make to more bulletproof. I think any test How about the following idea: - the RPC handler can share some sort of condition variable or barrier with the main test. The test can then indirectly control when the RPC handler will complete by signaling those condition variables. In this way, as long as the main thread doesn't signal the CV, the first RPC will never finish and the second RPC will also block forever. This makes the test not timing dependent and less prone to failure due to random timing issue. What do you think ? http://gerrit.cloudera.org:8080/#/c/12672/7/be/src/rpc/rpc-mgr.h File be/src/rpc/rpc-mgr.h: http://gerrit.cloudera.org:8080/#/c/12672/7/be/src/rpc/rpc-mgr.h@178 PS7, Line 178: // A no-op method that is used as part of overloading DoRpcWithRetry(). : static void logging_noop() {} : : // The type of the log method passed to DoRpcWithRetry(). : typedef boost::function RpcLogFn; Not needed. http://gerrit.cloudera.org:8080/#/c/12672/7/be/src/rpc/rpc-mgr.h@186 PS7, Line 186: timeout timeout_ms http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h File be/src/rpc/rpc-mgr.h: http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@178 PS5, Line 178: // A no-op method that is used as part of overloading DoRpcWithRetry(). : static void logging_noop() {} : : // The type of the log method passed to DoRpcWithRetry(). : typedef boost::function RpcLogFn; > Removed They still seem to be in PS7 http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@201 PS5, Line 201: : private: : /// One pool per registered service. scoped_refptr<> is dictated by the Kudu interface. : std::vector> service_pools_; : > OK I'll leave this alone and Thomas will clean up the mess? :-) Sounds good to me. -- To view, visit http://gerrit.cloudera.org:8080/12672 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390 Gerrit-Change-Number: 12672 Gerrit-PatchSet: 7 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Tue, 02 Apr 2019 02:08:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2990: timeout unresponsive queries in coordinator
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12299 ) Change subject: IMPALA-2990: timeout unresponsive queries in coordinator .. Patch Set 7: (6 comments) http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/runtime/coordinator-backend-state.cc@174 PS7, Line 174: const auto trigger = MakeScopeExitTrigger( : [this]() { last_report_time_ms_ = GenerateReportTimestamp(); }); Does it implicitly rely on the LIFO order for local variables' destruction in C++ so "notifier" won't be destroyed before "trigger" ? Otherwise, we could have notified 'exec_complete_barrier' without setting 'last_report_time_ms'. I may be paranoid but can you please add a comment about this so future changes won't accidentally swap their orders ? I suppose we can skip the use of NotifyBarrierOnExit here and just notify 'exec_complete_barrier' in 'trigger' after setting 'last_report_tims_ms_'. http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/runtime/coordinator.cc@549 PS7, Line 549: DCHECK( DCHECK_LE http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/runtime/coordinator.cc@664 PS7, Line 664: DCHECK( DCHECK_LE http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc@398 PS4, Line 398: > k. We could deprecate it in a separate patch I suppose. Fine with this as i As Thomas said, this config option was useful when users faced instability with large Impala cluster due to IMPALA-2990 (which this patch is fixing). We can consider deprecating this flag once we are convinced that this patch makes enough improvement that the instability with large cluster is mostly mitigated. The periodic reporting matters for long running queries. The default reporting interval of 5 seconds could be longer than most of the short running query's duration. For long running queries, it probably doesn't really matter to get a report every 5 seconds vs say 30s or 60s. If we ever deprecate this flag, please consider bumping the the default reporting interval to something larger. http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc@2288 PS4, Line 2288: return Status::OK(); : } : : void ImpalaServer::ExpireQuery(ClientRequestState* crs, const Status& status) { : DCHECK(!status.ok()); : cancellation_thread_pool_->Offer( > k, would be interested what Michael thinks about that idea The executor's reporting path currently can tolerate up to FLAGS_status_report_max_retries number of failures before giving up and declaring the coordinator as inaccessible. It's intentionally not a time-based approach and instead relies on the periodic reporting to handle the retry. Stepping back a bit, my understanding for why we want to approximate the maximum retry time is that the coordinator may not want to prematurely cancel a query due to a missing report from an executor if it knows that the executor *may* still be retrying to send a report. However, as pointed out elsewhere by Todd, if an Impala can experience random pauses or random scheduling delay for the reporting thread, no amount of wait time will guarantee that an executor is not in the middle of retrying to send a report when the query is cancelled. I suppose that's not an end-goal we try to achieve. This prompts me to think whether it's any different if we just do the following: - expose a coordinator's maximum tolerable lag time as a knob set to arbitrary number - expose an executor's maximum tolerable retry time as a knob set to some arbitrary number - deprecate FLAGS_status_report_max_retries An alternative would be to: - expose an executor's maximum tolerable retry time as a knob set to some arbitrary number - deprecate FLAGS_status_report_max_retries - set coordinator's maximum tolerable lag time as (100 + x)% of the maximum tolerable retry time where x *may be* tunable in case the default value causes any trouble. Todd / Thomas, what do you think ? http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/service/impala-server.cc@218 PS7, Line 218: DEFINE_int32(status_report_max_retries, 3, > Is our timeout too aggressive here? If one of the backends goes into some k I agree that the coordinator should be more lenient in handling lost reports to avoid false positive. On the other hand, no amount of timeout can save us from random pauses in Impalad so we should work on
[Impala-ASF-CR] IMPALA-7800: Time out new connections after --fe service threads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Time out new connections after --fe_service_threads .. Patch Set 13: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/12579/13/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/12579/13/be/src/rpc/TAcceptQueueServer.cpp@192 PS13, Line 192: int wait_result = 0; nit: This can be in the while loop. -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 13 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 28 Mar 2019 23:23:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12672 ) Change subject: IMPALA-8143: Enhance DoRpcWithRetry(). .. Patch Set 2: (19 comments) http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.h File be/src/rpc/rpc-mgr-test.h: http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.h@296 PS5, Line 296: ERR typo http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc File be/src/rpc/rpc-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@214 PS5, Line 214: ASSERT_OK(rpc_mgr_.RegisterService(10, 10, scan_mem_impl, : static_cast< Not sure if it's output of clang-tidy or something but I find it a bit hard to ready to split it like this. Can't we keep these 2 lines unchanged ? http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@263 PS5, Line 263: ueue f typo http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@291 PS5, Line 291: nullptr; Also do you want to assert that it's zero ? http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@298 PS5, Line 298: // Configure the service not used ? http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@316 PS5, Line 316: ASSERT_OK(ping_ Instead of sharing "sleep_ms" between RPCs, it may be cleaner to pass the sleep time as RPC argument instead. http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@322 PS5, Line 322: // A lambda t proxy-> http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@333 PS5, Line 333: : // wait for async call to be running : sleep_started.Wait(); : ASSERT_EQ(0, get_current_queue_size(rpc_mgr_)); : : // Do a second async call that will fill up the queue : RpcController async_controller2; : CountingBarrier async2_done(1); : // Reset the sleep time : sleep_ms = 100; : // Set the deadline to something reasonable otherwise the pings from DoRpcWithRetry : // will replace this async call in the queue. : MonoTime deadline = MonoTime::Now() + MonoDelta::FromSeconds(10); : async_controller2.set_deadline(deadline); : ResponseCallback async2_callback = [&]() { async2_done.Notify(); }; : proxy.get()->PingAsync(request, , _controller2, async2_callback); : : // Wait for second async to get queued : SleepForMs(300); : // Check the queue size : ASSERT_EQ(1, get_current_queue_size( These set of tests seem rather timing dependent. For instance, it may occur that the first time DoRpcWithRetry() is called will succeed because the async call somehow finishes by then due to some weird scheduling order. May be better to rethink a better way to test it. http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@378 PS5, Line 378: ASSERT_FALSE(async2_done.pending()); : : // Test injection of failures with DebugAction. : query_ctx.client_request.query_options.__set_debug_action("DoRpcWithRetry:FAIL"); : Status inject_status = RpcMgr::DoRpcWithRetry(ping_rpc, query_ctx, "ping failed", 1, : MonoDelta::FromSeconds(100), 0, "DoRpcWithRetry"); : ASSERT_FALSE(inject_status.ok()); : EXPECT_ERROR(inject_status, TErrorCo Please see comments in rpc-mgr.inline.h that we probably should only retry if the remote server is busy. For all other errors, we probably should report it right away to callers. We can consider transparently handling other types of errors in the future. http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h File be/src/rpc/rpc-mgr.h: http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@178 PS5, Line 178: void ToJson(rapidjson::Document* document); : : /// Retry the Rpc 'rpc_call' up to 'times_to_try' times. : /// Each Rpc has a timeout of 'timeout'. : /// If the service is busy then sleep 'se Not needed as discussed in person. http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@189 PS5, Line 189: the R typename http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@191 PS5, Line 191: n the 'l It's important to highlight that this only works iff rpc_call is idempotent in the comments above. Any non-idempotent rpc_call using this interface may have undesired consequence. http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@193 PS5, Line 193: ebugAction() to potentially inject errors. : template It seems inconsistent to pass the timeout in as MonoDelta while passing the backoff time
[Impala-ASF-CR] IMPALA-7800: Time out new connections after --fe service threads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Time out new connections after --fe_service_threads .. Patch Set 11: (2 comments) Looking good. Please address the last two items and I can +2 it. http://gerrit.cloudera.org:8080/#/c/12579/11/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/12579/11/be/src/rpc/TAcceptQueueServer.cpp@130 PS11, Line 130: timeout nit: timeout_ms http://gerrit.cloudera.org:8080/#/c/12579/10/tests/custom_cluster/test_frontend_connection_limit.py File tests/custom_cluster/test_frontend_connection_limit.py: http://gerrit.cloudera.org:8080/#/c/12579/10/tests/custom_cluster/test_frontend_connection_limit.py@90 PS10, Line 90: except Exception as e: > The test will be marked as failed in that case (XFAIL). Added an else block My understanding is that xfail will just mark it as "unexpected pass" without flagging any other problem if the test happens to pass. Would it be more robust to set a flag in the exception handling block and assert that the flag is set at the end of the test ? In this way, there is no need to mark the test as xfail. -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 11 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 27 Mar 2019 22:35:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 10: (9 comments) LGTM. Mostly nits. http://gerrit.cloudera.org:8080/#/c/12579/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12579/10//COMMIT_MSG@7 PS10, Line 7: Reject new connections after --fe_service_threads Does this need to be updated too ? http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/TAcceptQueueServer.h File be/src/rpc/TAcceptQueueServer.h: http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/TAcceptQueueServer.h@65 PS10, Line 65: timeout nit: timeout_ms http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/TAcceptQueueServer.h@116 PS10, Line 116: /// Amount of time after which a connection request will be timed out. in milliseconds http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/TAcceptQueueServer.cpp@263 PS10, Line 263: if (queue_timeout_ms_ != 0) : entry->expiration_time_ = MonotonicMillis() + queue_timeout_ms_; May need to guard against negative value although it's user's fault in that case. if (queue_timeout_ms > 0) { } http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/thrift-server.h File be/src/rpc/thrift-server.h: http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/thrift-server.h@149 PS10, Line 149: int64_t queue_timeout_ms = 0); May make sense to document the meaning of this argument too. nit: this line seems to fit into the previous line too. http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/thrift-server.h@197 PS10, Line 197: /// Amount of time an accepted client connection will be kept in the accept : /// queue before it is timed out. Used in TAcceptQueueServer. May want to document that if it's 0, it means there is no timeout (similar to max_concurrent_connections_ above). http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/thrift-server.h@277 PS10, Line 277: } nit: missing blank line after this function. http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/service/impala-server.cc@239 PS10, Line 239: accepted_cnxn_timeout Should this have a more specific name such as "accepted_client_cnxn_timeout" as we use Thrift server for various internal services too ? http://gerrit.cloudera.org:8080/#/c/12579/10/tests/custom_cluster/test_frontend_connection_limit.py File tests/custom_cluster/test_frontend_connection_limit.py: http://gerrit.cloudera.org:8080/#/c/12579/10/tests/custom_cluster/test_frontend_connection_limit.py@90 PS10, Line 90: except Exception as e: How do we catch cases in which the test passes without raising the exception ? -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 10 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Tue, 26 Mar 2019 22:05:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Add extra TRACE calls for cancellation code path
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12832 ) Change subject: Add extra TRACE calls for cancellation code path .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/12832/1/be/src/runtime/krpc-data-stream-mgr.cc File be/src/runtime/krpc-data-stream-mgr.cc: http://gerrit.cloudera.org:8080/#/c/12832/1/be/src/runtime/krpc-data-stream-mgr.cc@215 PS1, Line 215: TRACE_COUNTER_SCOPE_LATENCY_US("find_receiver_us"); Doing it here instead of FindRecvr() means you can also measure the time spent waiting for the lock. So, does it make sense to also sample similar code path in CloseSender() ? http://gerrit.cloudera.org:8080/#/c/12832/1/be/src/runtime/krpc-data-stream-mgr.cc@378 PS1, Line 378: { Do we care about sampling this scope too ? http://gerrit.cloudera.org:8080/#/c/12832/1/be/src/runtime/krpc-data-stream-mgr.cc@406 PS1, Line 406: { Same question here. http://gerrit.cloudera.org:8080/#/c/12832/1/be/src/service/control-service.cc File be/src/service/control-service.cc: http://gerrit.cloudera.org:8080/#/c/12832/1/be/src/service/control-service.cc@24 PS1, Line 24: #include "kudu/util/trace.h" nit: is this new #include needed ? -- To view, visit http://gerrit.cloudera.org:8080/12832 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifedd95098a011d7a3bb3ebb54bfb3cbb7e94fab4 Gerrit-Change-Number: 12832 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 22 Mar 2019 01:47:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8277: support ranges in CPU lists
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12755 ) Change subject: IMPALA-8277: support ranges in CPU lists .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12755 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I97311bfbcf70bea069e941b6e7f4f015fb781b3f Gerrit-Change-Number: 12755 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Fri, 15 Mar 2019 00:06:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8299: Fix crash in GroupingAggregator on uninited hash table
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12746 ) Change subject: IMPALA-8299: Fix crash in GroupingAggregator on uninited hash table .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/12746 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I381657c43f93eb4871b93d54532cb886198fd0b2 Gerrit-Change-Number: 12746 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 14 Mar 2019 17:47:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 6: (6 comments) Looking good. Please see some more comments. http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.h File be/src/rpc/TAcceptQueueServer.h: http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.h@88 PS6, Line 88: void CleanupAndClose(const std::string& error, Please add a brief comment about what this function does and what those parameters are. http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp@29 PS6, Line 29: DEFINE_int64(accepted_cnxn_timeout, 300, : "(Advanced) The amount of time in seconds an accepted connection will wait in the " : "queue before we time it out and reject the connection request. A value of 0 " : "means there is no timeout."); Sorry but now that I realize this may affect all existing Thrift based services, I would think it's safe to only enable this for external client facing Thrift services (e.g. HS2, Beeswax). For Catalog and statestore and Impala internal connections, it may be better to leave it as no timeout. In other words, it may make sense to have this timeout as part of the constructor of TAcceptQueueServer and create flags only for HS2 and Beeswax related Thrift services. Sorry for not noticing this earlier. http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp@195 PS6, Line 195: wait_time nit: coding convention in Impala usually keeps the unit as the suffix of the variable name so this can be wait_time_ms; http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp@205 PS6, Line 205: msecs nit: we either use ms or milliseconds. http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp@207 PS6, Line 207: (wait_result == THRIFT_ETIMEDOUT) Do we care about any potential non-zero result value ? Should we bail out in those cases too ? http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp@265 PS6, Line 265: if (FLAGS_accepted_cnxn_timeout != 0) : entry->expiration_time_ = : MonotonicMillis() + FLAGS_accepted_cnxn_timeout * MILLIS_PER_SEC; nit styling: if (FLAGS_... != 0) { } -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 6 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Tue, 12 Mar 2019 01:50:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8097: mt dop for all queries via hidden flag
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12257 ) Change subject: IMPALA-8097: mt_dop for all queries via hidden flag .. Patch Set 8: Code-Review+1 (2 comments) Sorry again for the delay. LGTM. I added Bharath and Paul to the reviewer list so they can double check the logic in FE. http://gerrit.cloudera.org:8080/#/c/12257/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12257/8//COMMIT_MSG@14 PS8, Line 14: until IMPALA-4224 is implemented. Inserts work without Also, as the way it's implemented now, the memory consumption for broadcast join will scale with MT_DOP, right ? http://gerrit.cloudera.org:8080/#/c/12257/8/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/12257/8/be/src/exec/partitioned-hash-join-builder.cc@108 PS8, Line 108: filter_ctxs_.back().filter = state->filter_bank()->RegisterFilter(filter_desc, true); May want to add a reference to IMPALA-4400 which is abput aggregating the filter locally. -- To view, visit http://gerrit.cloudera.org:8080/12257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I72f0b02a005e8bf22fd17b8fb5aabf8c0d9b6b15 Gerrit-Change-Number: 12257 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 11 Mar 2019 22:44:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Add HADOOP-15720 to the list of Known Issues
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12631 ) Change subject: [DOCS] Add HADOOP-15720 to the list of Known Issues .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12631 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9497d337cb79c38add85bb0de9d07660617c76b5 Gerrit-Change-Number: 12631 Gerrit-PatchSet: 5 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Pooja Nilangekar Gerrit-Comment-Date: Wed, 06 Mar 2019 00:11:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3816: (prep) Move TupleSorter to sorter-ir.cc
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10679 ) Change subject: IMPALA-3816: (prep) Move TupleSorter to sorter-ir.cc .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10679 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaaf2b75c2f789002c42939865c018f728d29a113 Gerrit-Change-Number: 10679 Gerrit-PatchSet: 4 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 05 Mar 2019 22:55:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8097: mt dop for all queries via hidden flag
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12257 ) Change subject: IMPALA-8097: mt_dop for all queries via hidden flag .. Patch Set 8: Sorry, will get back to it this week. -- To view, visit http://gerrit.cloudera.org:8080/12257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I72f0b02a005e8bf22fd17b8fb5aabf8c0d9b6b15 Gerrit-Change-Number: 12257 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 04 Mar 2019 22:46:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12624 ) Change subject: IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy() .. Patch Set 5: Code-Review+2 Rebase. Carry +2. -- To view, visit http://gerrit.cloudera.org:8080/12624 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If0297658acf2b23823dcb7d2bdff5d8e4475bb98 Gerrit-Change-Number: 12624 Gerrit-PatchSet: 5 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Mon, 04 Mar 2019 22:42:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8274: Fix iteration of profiles in ApplyExecStatusReport()
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12651 ) Change subject: IMPALA-8274: Fix iteration of profiles in ApplyExecStatusReport() .. Patch Set 2: Waiting for https://gerrit.cloudera.org/#/c/12649/ to merge. -- To view, visit http://gerrit.cloudera.org:8080/12651 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8bce426c7d08ffbf0f8cd26889262243a52cc752 Gerrit-Change-Number: 12651 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Mon, 04 Mar 2019 19:35:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8274: Fix iteration of profiles in ApplyExecStatusReport()
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12651 ) Change subject: IMPALA-8274: Fix iteration of profiles in ApplyExecStatusReport() .. Patch Set 2: Code-Review+2 Added a line of comment. Carry Thomas' +2. -- To view, visit http://gerrit.cloudera.org:8080/12651 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8bce426c7d08ffbf0f8cd26889262243a52cc752 Gerrit-Change-Number: 12651 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Mon, 04 Mar 2019 19:34:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8274: Fix iteration of profiles in ApplyExecStatusReport()
Hello Thomas Marshall, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12651 to look at the new patch set (#2). Change subject: IMPALA-8274: Fix iteration of profiles in ApplyExecStatusReport() .. IMPALA-8274: Fix iteration of profiles in ApplyExecStatusReport() The coordinator skips over any stale or duplicated status reports of fragment instances. In the previous implementation, the index pointing into the vector of Thrift profiles wasn't updated when skipping over a status report. This breaks the assumption that the status reports and thrift profiles vectors have one-to-one correspondence. Consequently, we may end up passing the wrong profile to InstanceStats::Update(), leading to random crashes. This change fixes the problem above by using iterators to iterate through the status reports and thrift profiles vectors and ensures that both iterators are updated on every iteration of the loop. Change-Id: I8bce426c7d08ffbf0f8cd26889262243a52cc752 --- M be/src/runtime/coordinator-backend-state.cc M tests/custom_cluster/test_rpc_timeout.py 2 files changed, 33 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/12651/2 -- To view, visit http://gerrit.cloudera.org:8080/12651 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8bce426c7d08ffbf0f8cd26889262243a52cc752 Gerrit-Change-Number: 12651 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall
[Impala-ASF-CR] IMPALA-8274: Fix iteration of profiles in ApplyExecStatusReport()
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12651 Change subject: IMPALA-8274: Fix iteration of profiles in ApplyExecStatusReport() .. IMPALA-8274: Fix iteration of profiles in ApplyExecStatusReport() The coordinator skips over any stale or duplicated status reports of fragment instances. In the previous implementation, the index pointing into the vector of Thrift profiles wasn't updated when skipping over a status report. This breaks the assumption that the status reports and thrift profiles vectors have one-to-one correspondence. Consequently, we may end up passing the wrong profile to InstanceStats::Update(), leading to random crashes. This change fixes the problem above by using iterators to iterate through the status reports and thrift profiles vectors and ensures that both iterators are updated on every iteration of the loop. Change-Id: I8bce426c7d08ffbf0f8cd26889262243a52cc752 --- M be/src/runtime/coordinator-backend-state.cc M tests/custom_cluster/test_rpc_timeout.py 2 files changed, 32 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/12651/1 -- To view, visit http://gerrit.cloudera.org:8080/12651 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I8bce426c7d08ffbf0f8cd26889262243a52cc752 Gerrit-Change-Number: 12651 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] [DOCS] Add HADOOP-15720 to the list of Known Issues
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12631 ) Change subject: [DOCS] Add HADOOP-15720 to the list of Known Issues .. Patch Set 4: (2 comments) LGTM. Please fix the other two usages of "host" http://gerrit.cloudera.org:8080/#/c/12631/4/docs/topics/impala_known_issues.xml File docs/topics/impala_known_issues.xml: http://gerrit.cloudera.org:8080/#/c/12631/4/docs/topics/impala_known_issues.xml@555 PS4, Line 555: host Impalad http://gerrit.cloudera.org:8080/#/c/12631/4/docs/topics/impala_known_issues.xml@555 PS4, Line 555: host Impalad -- To view, visit http://gerrit.cloudera.org:8080/12631 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9497d337cb79c38add85bb0de9d07660617c76b5 Gerrit-Change-Number: 12631 Gerrit-PatchSet: 4 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Pooja Nilangekar Gerrit-Comment-Date: Fri, 01 Mar 2019 23:32:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 5: (5 comments) http://gerrit.cloudera.org:8080/#/c/12579/5/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/12579/5/be/src/rpc/TAcceptQueueServer.cpp@198 PS5, Line 198: if (entry->expiration_time_ != 0) : wait_time = entry->expiration_time_ - MonotonicMillis(); styling nit: if (entry->expiration_time_ != 0) { wait_time = entry->expiration_time_ - MonotonicMillis(); } http://gerrit.cloudera.org:8080/#/c/12579/5/be/src/rpc/TAcceptQueueServer.cpp@200 PS5, Line 200: LOG(INFO) << "All " << maxTasks_ << " server threads are in use. " : << "Waiting for " << wait_time << " msecs."; Does it make sense to log only if we are timing out a request below ? http://gerrit.cloudera.org:8080/#/c/12579/5/be/src/rpc/TAcceptQueueServer.cpp@263 PS5, Line 263: MonotonicMillis() + FLAGS_accepted_cnxn_timeout * MILLIS_PER_SEC; nit: indent 4 http://gerrit.cloudera.org:8080/#/c/12579/5/be/src/rpc/TAcceptQueueServer.cpp@267 PS5, Line 267: connection_setup_pool.Offer( In theory, this could block too but it's still counted towards the expiration_time_. So, is it possible for the code to reach line 199 above and compute a negative wait_time value ? http://gerrit.cloudera.org:8080/#/c/12579/5/tests/custom_cluster/test_frontend_connection_limit.py File tests/custom_cluster/test_frontend_connection_limit.py: http://gerrit.cloudera.org:8080/#/c/12579/5/tests/custom_cluster/test_frontend_connection_limit.py@1 PS5, Line 1: nit: blank line -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 5 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Fri, 01 Mar 2019 23:30:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Add HADOOP-15720 to the list of Known Issues
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12631 ) Change subject: [DOCS] Add HADOOP-15720 to the list of Known Issues .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/12631/3/docs/topics/impala_known_issues.xml File docs/topics/impala_known_issues.xml: http://gerrit.cloudera.org:8080/#/c/12631/3/docs/topics/impala_known_issues.xml@565 PS3, Line 565: host Impalad -- To view, visit http://gerrit.cloudera.org:8080/12631 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9497d337cb79c38add85bb0de9d07660617c76b5 Gerrit-Change-Number: 12631 Gerrit-PatchSet: 3 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Pooja Nilangekar Gerrit-Comment-Date: Fri, 01 Mar 2019 18:54:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12624 ) Change subject: IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy() .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/12624/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12624/3//COMMIT_MSG@18 PS3, Line 18: test_rpc_timeout.py > Is that file missing from the change then? There is no change in that test file. A pre-existing test already exercises that path. -- To view, visit http://gerrit.cloudera.org:8080/12624 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If0297658acf2b23823dcb7d2bdff5d8e4475bb98 Gerrit-Change-Number: 12624 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Fri, 01 Mar 2019 18:59:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12624 ) Change subject: IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy() .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/12624/1/be/src/rpc/impala-service-pool.cc File be/src/rpc/impala-service-pool.cc: http://gerrit.cloudera.org:8080/#/c/12624/1/be/src/rpc/impala-service-pool.cc@119 PS1, Line 119: "The service queue is full; it has $3 items; memory " > If we want to keep the single message then maybe a message like Done -- To view, visit http://gerrit.cloudera.org:8080/12624 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If0297658acf2b23823dcb7d2bdff5d8e4475bb98 Gerrit-Change-Number: 12624 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Fri, 01 Mar 2019 00:41:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()
Hello Andrew Sherman, Thomas Marshall, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12624 to look at the new patch set (#3). Change subject: IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy() .. IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy() An incoming request to a RPC service can be rejected due to either exceeding the memory limit or maximum allowed queue length. It's unclear from the current error message which of those factors contributes to the failure as neither the actual queue length nor the memory consumption is printed. This patch fixes the problem by printing the estimated queue length and memory consumption when a RPC request is dropped. Testing done: verified the new error message with test_rpc_timeout.py Change-Id: If0297658acf2b23823dcb7d2bdff5d8e4475bb98 --- M be/src/rpc/impala-service-pool.cc M be/src/rpc/impala-service-pool.h 2 files changed, 8 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/12624/3 -- To view, visit http://gerrit.cloudera.org:8080/12624 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If0297658acf2b23823dcb7d2bdff5d8e4475bb98 Gerrit-Change-Number: 12624 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall
[Impala-ASF-CR] IMPALA-8264: Fix overflow in SystemStateInfo::ComputeCpuRatios
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12638 ) Change subject: IMPALA-8264: Fix overflow in SystemStateInfo::ComputeCpuRatios .. Patch Set 1: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/12638/1/be/src/util/system-state-info-test.cc File be/src/util/system-state-info-test.cc: http://gerrit.cloudera.org:8080/#/c/12638/1/be/src/util/system-state-info-test.cc@63 PS1, Line 63: CPUY typo ? -- To view, visit http://gerrit.cloudera.org:8080/12638 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46dfb7264e85a3da492e383caaa6853d8ef95f35 Gerrit-Change-Number: 12638 Gerrit-PatchSet: 1 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Thu, 28 Feb 2019 23:40:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 3: (10 comments) http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@28 PS3, Line 28: 120, Will it be safer to have a higher default timeout (e.g. 300 seconds) to avoid users' complaint given the current behavior is that we don't have a timeout ? Please also add a remark that setting it to 0 means there is no timeout. http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@151 PS3, Line 151: const string& error) nit: we tend to put the constant parameters followed by non-constant parameters. http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@193 PS3, Line 193: 0LL nit: 'LL' not needed. http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@196 PS3, Line 196: if (entry->expiration_time_) nit: if (entry->expiration_time_ != 0) { http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@197 PS3, Line 197: wait_time = entry->expiration_time_ - MonotonicMillis(); Shouldn't this be inside the while loop below ? http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@199 PS3, Line 199: LOG(INFO) << "All " << maxTasks_ << " server threads are in use. " : << "Waiting for " << wait_time << " msecs."; Will this lead to log spam ? http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@204 PS3, Line 204: Timing out connection request."; Any chance we can print some details (e.g. IP address, port) here ? http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@239 PS3, Line 239: shared_ptr This can be "unique_ptr" now that TTransport lives inside TAcceptQueueEntry. This makes ownership easier to understand. http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@260 PS3, Line 260: if (FLAGS_accepted_cnxn_timeout) nit: if (FLAGS_accepted_cnxn_timeout != 0) { http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@261 PS3, Line 261: MILLIS_PER_SEC; long line. -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 3 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 27 Feb 2019 22:26:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12624 ) Change subject: IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy() .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/12624/1/be/src/rpc/impala-service-pool.cc File be/src/rpc/impala-service-pool.cc: http://gerrit.cloudera.org:8080/#/c/12624/1/be/src/rpc/impala-service-pool.cc@119 PS1, Line 119: "The service queue is full; it has $3 items; memory " > Isn't the message "The service queue is full" misleading? It's full when either the max queue length or the memory consumption limit is reached. Definitely open to better suggestion for the message. I considered "The service queue has reached its capacity limit" but that doesn't seem any different. http://gerrit.cloudera.org:8080/#/c/12624/1/be/src/rpc/impala-service-pool.cc@120 PS1, Line 120: "consumption is $4", > Nit: needs terminating "." Done http://gerrit.cloudera.org:8080/#/c/12624/1/be/src/rpc/impala-service-pool.cc@125 PS1, Line 125: PrettyPrinter::Print(service_mem_tracker_->consumption(), TUnit::BYTES)); > There's a similar message in service-pool.cc As you mentioned, that's the Kudu's implementation. We forked it to allow us to enforce memory limit too. So, in the Kudu's implementation, the queue is only full because it has reached the maximum queue length. -- To view, visit http://gerrit.cloudera.org:8080/12624 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If0297658acf2b23823dcb7d2bdff5d8e4475bb98 Gerrit-Change-Number: 12624 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Wed, 27 Feb 2019 22:08:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()
Hello Andrew Sherman, Thomas Marshall, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12624 to look at the new patch set (#2). Change subject: IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy() .. IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy() An incoming request to a RPC service can be rejected due to either exceeding the memory limit or maximum allowed queue length. It's unclear from the current error message which of those factors contributes to the failure as neither the actual queue length nor the memory consumption is printed. This patch fixes the problem by printing the estimated queue length and memory consumption when a RPC request is dropped. Testing done: verified the new error message with test_rpc_timeout.py Change-Id: If0297658acf2b23823dcb7d2bdff5d8e4475bb98 --- M be/src/rpc/impala-service-pool.cc M be/src/rpc/impala-service-pool.h 2 files changed, 7 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/12624/2 -- To view, visit http://gerrit.cloudera.org:8080/12624 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If0297658acf2b23823dcb7d2bdff5d8e4475bb98 Gerrit-Change-Number: 12624 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall
[Impala-ASF-CR] IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12624 Change subject: IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy() .. IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy() An incoming request to a RPC service can be rejected due to either exceeding the memory limit or maximum allowed queue length. It's unclear from the current error message which of those factors contributes to the failure as neither the actual queue length nor the memory consumption is printed. This patch fixes the problem by printing the estimated queue length and memory consumption when a RPC request is dropped. Testing done: verified the new error message with test_rpc_timeout.py Change-Id: If0297658acf2b23823dcb7d2bdff5d8e4475bb98 --- M be/src/rpc/impala-service-pool.cc M be/src/rpc/impala-service-pool.h 2 files changed, 7 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/12624/1 -- To view, visit http://gerrit.cloudera.org:8080/12624 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: If0297658acf2b23823dcb7d2bdff5d8e4475bb98 Gerrit-Change-Number: 12624 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12623 Change subject: IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy() .. IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy() An incoming request to a RPC service can be rejected either due to exceeding the memory limit or maximum allowed queue length. It's unclear from the current error message which of those factors contributes to the failure as neither the actual queue length nor the memory consumption is printed. This patch fixes the problem by printing the estimated queue length and memory consumption when a RPC request is dropped. Change-Id: I5b40b8e2077b61e2f7e244b87c946f30803438bc --- M be/src/rpc/impala-service-pool.cc M be/src/rpc/impala-service-pool.h 2 files changed, 7 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/12623/1 -- To view, visit http://gerrit.cloudera.org:8080/12623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I5b40b8e2077b61e2f7e244b87c946f30803438bc Gerrit-Change-Number: 12623 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()
Michael Ho has abandoned this change. ( http://gerrit.cloudera.org:8080/12623 ) Change subject: IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy() .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/12623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I5b40b8e2077b61e2f7e244b87c946f30803438bc Gerrit-Change-Number: 12623 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-4568: Cache parquet footer
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12622 ) Change subject: IMPALA-4568: Cache parquet footer .. Patch Set 1: Sorry, pushing the wrong branch. This is WIP. Definitely not meant for review. -- To view, visit http://gerrit.cloudera.org:8080/12622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib30dcc4a13050c11f011491dec3776ba9e402c78 Gerrit-Change-Number: 12622 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 27 Feb 2019 20:28:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4568: Cache parquet footer
Michael Ho has abandoned this change. ( http://gerrit.cloudera.org:8080/12622 ) Change subject: IMPALA-4568: Cache parquet footer .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/12622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: Ib30dcc4a13050c11f011491dec3776ba9e402c78 Gerrit-Change-Number: 12622 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-4568: Cache parquet footer
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12622 Change subject: IMPALA-4568: Cache parquet footer .. IMPALA-4568: Cache parquet footer Change-Id: Ib30dcc4a13050c11f011491dec3776ba9e402c78 --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-text-scanner.cc M be/src/exec/parquet/CMakeLists.txt M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-readers.cc A be/src/exec/parquet/parquet-footer-cache.cc A be/src/exec/parquet/parquet-footer-cache.h M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc 15 files changed, 443 insertions(+), 126 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/12622/1 -- To view, visit http://gerrit.cloudera.org:8080/12622 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ib30dcc4a13050c11f011491dec3776ba9e402c78 Gerrit-Change-Number: 12622 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-8251: Run test exchange deferred batches in dev builds only
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12610 Change subject: IMPALA-8251: Run test_exchange_deferred_batches in dev builds only .. IMPALA-8251: Run test_exchange_deferred_batches in dev builds only One of the stress related flag is not available in the development build. Testing done: Ran the test with release build and verified that it's skipped. Change-Id: I5caaa6fa39d6c97f313b675838c27740af9aa1d5 --- M tests/custom_cluster/test_exchange_deferred_batches.py 1 file changed, 2 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/12610/1 -- To view, visit http://gerrit.cloudera.org:8080/12610 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I5caaa6fa39d6c97f313b675838c27740af9aa1d5 Gerrit-Change-Number: 12610 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-8251: Run test exchange deferred batches in dev builds only
Hello Thomas Marshall, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12610 to look at the new patch set (#2). Change subject: IMPALA-8251: Run test_exchange_deferred_batches in dev builds only .. IMPALA-8251: Run test_exchange_deferred_batches in dev builds only The startup flag stress_datastream_recvr_delay_ms is not available in development builds. Skip the test in non-developement builds. Testing done: Ran the test with release build and verified that it's skipped. Change-Id: I5caaa6fa39d6c97f313b675838c27740af9aa1d5 --- M tests/custom_cluster/test_exchange_deferred_batches.py 1 file changed, 2 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/12610/2 -- To view, visit http://gerrit.cloudera.org:8080/12610 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5caaa6fa39d6c97f313b675838c27740af9aa1d5 Gerrit-Change-Number: 12610 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Thomas Marshall
[Impala-ASF-CR] IMPALA-2990: timeout unresponsive queries in coordinator
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12299 ) Change subject: IMPALA-2990: timeout unresponsive queries in coordinator .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/12299/2/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: http://gerrit.cloudera.org:8080/#/c/12299/2/be/src/runtime/coordinator-backend-state.h@137 PS2, Line 137: boost::lock_guard l(lock_); > Is that something we can rely on in all possible environments in which Impa Yes, I suppose it may not hold true for all architecture. It generally holds for x86-64 architecture. That said, Impala is quite x86 centric although there were works in the past to port it to Power architecture. Anyhow, this is not important unless we see any contention during perf testing. -- To view, visit http://gerrit.cloudera.org:8080/12299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987 Gerrit-Change-Number: 12299 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Tue, 26 Feb 2019 00:57:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Noted IMPALA-8154 as fixed in Known Issues doc
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12551 ) Change subject: [DOCS] Noted IMPALA-8154 as fixed in Known Issues doc .. Patch Set 1: Sorry for the delay. +2'ed -- To view, visit http://gerrit.cloudera.org:8080/12551 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4ecae87938cc2ee9ce5a2bed639d3719f9bd793e Gerrit-Change-Number: 12551 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Tue, 26 Feb 2019 00:54:31 + Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Noted IMPALA-8154 as fixed in Known Issues doc
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12551 ) Change subject: [DOCS] Noted IMPALA-8154 as fixed in Known Issues doc .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12551 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4ecae87938cc2ee9ce5a2bed639d3719f9bd793e Gerrit-Change-Number: 12551 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Tue, 26 Feb 2019 00:54:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8138: Reintroduce rpc debugging options
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12297 ) Change subject: IMPALA-8138: Reintroduce rpc debugging options .. Patch Set 1: Any update ? -- To view, visit http://gerrit.cloudera.org:8080/12297 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2046cb9dadf846ea90c04e95781b2bbde3325941 Gerrit-Change-Number: 12297 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Mon, 25 Feb 2019 23:27:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2990: timeout unresponsive queries in coordinator
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12299 ) Change subject: IMPALA-2990: timeout unresponsive queries in coordinator .. Patch Set 2: Any update ? -- To view, visit http://gerrit.cloudera.org:8080/12299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987 Gerrit-Change-Number: 12299 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Mon, 25 Feb 2019 23:26:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8239: Fix handling of failed deserialization of row batch
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12567 ) Change subject: IMPALA-8239: Fix handling of failed deserialization of row batch .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/12567/2/tests/custom_cluster/test_exchange_deferred_batches.py File tests/custom_cluster/test_exchange_deferred_batches.py: http://gerrit.cloudera.org:8080/#/c/12567/2/tests/custom_cluster/test_exchange_deferred_batches.py@21 PS2, Line 21: > flake8: E302 expected 2 blank lines, found 1 Done http://gerrit.cloudera.org:8080/#/c/12567/3/tests/custom_cluster/test_exchange_deferred_batches.py File tests/custom_cluster/test_exchange_deferred_batches.py: http://gerrit.cloudera.org:8080/#/c/12567/3/tests/custom_cluster/test_exchange_deferred_batches.py@21 PS3, Line 21: > flake8: E302 expected 2 blank lines, found 1 Done http://gerrit.cloudera.org:8080/#/c/12567/3/tests/custom_cluster/test_exchange_deferred_batches.py@24 PS3, Line 24: @ > flake8: E303 too many blank lines (2) Done -- To view, visit http://gerrit.cloudera.org:8080/12567 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa74937b046d95484887533be548249e96078755 Gerrit-Change-Number: 12567 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 25 Feb 2019 23:00:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8239: Fix handling of failed deserialization of row batch
Hello Thomas Marshall, Lars Volker, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12567 to look at the new patch set (#3). Change subject: IMPALA-8239: Fix handling of failed deserialization of row batch .. IMPALA-8239: Fix handling of failed deserialization of row batch Previously, when a row batch failed to be deserialized in the data stream receiver, we will return the error status to the sender of the row batch without inserting the row batch. The receiver will continue to operate without flagging any error. The assumption is that the sender will eventually cancel the query upon receiving the failed status. Normally, when a caller of GetBatch() successfully dequeues a row batch from the batch queue, it will kick off the draining of the row batches from the deferred queue into the normal batch queue, which will further continue the cycle of draining the deferred queue upon the next call to GetBatch() until the deferred queue becomes empty. When an error is hit when deserializing a deferred batch to be inserted into the batch queue, the existing code will simply not insert the row batch or flag any error. This breaks the cycle of the deferred queue draining as the batch queue may become empty forever. The caller of GetBatch() will block indefinitely until the query is cancelled. The existing code works fine as the expectation is that the query will be cancelled once the sender receives the error status from the RPC response. However, this behavior is still not ideal as it lets a query which has hit a fatal error to hold on to resources for extended period of time. This patch fixes the problem by explicitly recording any error during row batch insertion in an error status object. Callers of GetBatch() will now also poll for this status object while waiting for row batch to show up and bail out early if there is any error. A new test case has been added to simulate the problematic case above. Change-Id: Iaa74937b046d95484887533be548249e96078755 --- M be/src/exec/exchange-node.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/krpc-data-stream-mgr.cc M be/src/runtime/krpc-data-stream-mgr.h M be/src/runtime/krpc-data-stream-recvr.cc M be/src/runtime/krpc-data-stream-recvr.h A tests/custom_cluster/test_exchange_deferred_batches.py 7 files changed, 157 insertions(+), 47 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/12567/3 -- To view, visit http://gerrit.cloudera.org:8080/12567 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iaa74937b046d95484887533be548249e96078755 Gerrit-Change-Number: 12567 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8239: Fix handling of failed deserialization of row batch
Hello Thomas Marshall, Lars Volker, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12567 to look at the new patch set (#4). Change subject: IMPALA-8239: Fix handling of failed deserialization of row batch .. IMPALA-8239: Fix handling of failed deserialization of row batch Previously, when a row batch failed to be deserialized in the data stream receiver, we will return the error status to the sender of the row batch without inserting the row batch. The receiver will continue to operate without flagging any error. The assumption is that the sender will eventually cancel the query upon receiving the failed status. Normally, when a caller of GetBatch() successfully dequeues a row batch from the batch queue, it will kick off the draining of the row batches from the deferred queue into the normal batch queue, which will further continue the cycle of draining the deferred queue upon the next call to GetBatch() until the deferred queue becomes empty. When an error is hit when deserializing a deferred batch to be inserted into the batch queue, the existing code will simply not insert the row batch or flag any error. This breaks the cycle of the deferred queue draining as the batch queue may become empty forever. The caller of GetBatch() will block indefinitely until the query is cancelled. The existing code works fine as the expectation is that the query will be cancelled once the sender receives the error status from the RPC response. However, this behavior is still not ideal as it lets a query which has hit a fatal error to hold on to resources for extended period of time. This patch fixes the problem by explicitly recording any error during row batch insertion in an error status object. Callers of GetBatch() will now also poll for this status object while waiting for row batch to show up and bail out early if there is any error. A new test case has been added to simulate the problematic case above. Change-Id: Iaa74937b046d95484887533be548249e96078755 --- M be/src/exec/exchange-node.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/krpc-data-stream-mgr.cc M be/src/runtime/krpc-data-stream-mgr.h M be/src/runtime/krpc-data-stream-recvr.cc M be/src/runtime/krpc-data-stream-recvr.h A tests/custom_cluster/test_exchange_deferred_batches.py 7 files changed, 157 insertions(+), 47 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/12567/4 -- To view, visit http://gerrit.cloudera.org:8080/12567 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iaa74937b046d95484887533be548249e96078755 Gerrit-Change-Number: 12567 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8239: Fix handling of failed deserialization of row batch
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12567 ) Change subject: IMPALA-8239: Fix handling of failed deserialization of row batch .. Patch Set 2: Code-Review+1 Carry Lars' +1 -- To view, visit http://gerrit.cloudera.org:8080/12567 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa74937b046d95484887533be548249e96078755 Gerrit-Change-Number: 12567 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 25 Feb 2019 22:22:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8239: Fix handling of failed deserialization of row batch
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12567 ) Change subject: IMPALA-8239: Fix handling of failed deserialization of row batch .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/12567/1/be/src/runtime/krpc-data-stream-recvr.cc File be/src/runtime/krpc-data-stream-recvr.cc: http://gerrit.cloudera.org:8080/#/c/12567/1/be/src/runtime/krpc-data-stream-recvr.cc@118 PS1, Line 118: // accounting for various counters. > Maybe document that 'lock_' must be held via 'lock' here and below. Ok to i Done http://gerrit.cloudera.org:8080/#/c/12567/1/be/src/runtime/krpc-data-stream-recvr.cc@161 PS1, Line 161: inesrting > inserting. Done http://gerrit.cloudera.org:8080/#/c/12567/1/tests/custom_cluster/test_exchange_deferred_batches.py File tests/custom_cluster/test_exchange_deferred_batches.py: http://gerrit.cloudera.org:8080/#/c/12567/1/tests/custom_cluster/test_exchange_deferred_batches.py@21 PS1, Line 21: class TestExchangeDeferredBatches(CustomClusterTestSuite): > flake8: E302 expected 2 blank lines, found 1 Done -- To view, visit http://gerrit.cloudera.org:8080/12567 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa74937b046d95484887533be548249e96078755 Gerrit-Change-Number: 12567 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 25 Feb 2019 22:22:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8239: Fix handling of failed deserialization of row batch
Hello Thomas Marshall, Lars Volker, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12567 to look at the new patch set (#2). Change subject: IMPALA-8239: Fix handling of failed deserialization of row batch .. IMPALA-8239: Fix handling of failed deserialization of row batch Previously, when a row batch failed to be deserialized in the data stream receiver, we will return the error status to the sender of the row batch without inserting the row batch. The receiver will continue to operate without flagging any error. The assumption is that the sender will eventually cancel the query upon receiving the failed status. Normally, when a caller of GetBatch() successfully dequeues a row batch from the batch queue, it will kick off the draining of the row batches from the deferred queue into the normal batch queue, which will further continue the cycle of draining the deferred queue upon the next call to GetBatch() until the deferred queue becomes empty. When an error is hit when deserializing a deferred batch to be inserted into the batch queue, the existing code will simply not insert the row batch or flag any error. This breaks the cycle of the deferred queue draining as the batch queue may become empty forever. The caller of GetBatch() will block indefinitely until the query is cancelled. The existing code works fine as the expectation is that the query will be cancelled once the sender receives the error status from the RPC response. However, this behavior is still not ideal as it lets a query which has hit a fatal error to hold on to resources for extended period of time. This patch fixes the problem by explicitly recording any error during row batch insertion in an error status object. Callers of GetBatch() will now also poll for this status object while waiting for row batch to show up and bail out early if there is any error. A new test case has been added to simulate the problematic case above. Change-Id: Iaa74937b046d95484887533be548249e96078755 --- M be/src/exec/exchange-node.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/krpc-data-stream-mgr.cc M be/src/runtime/krpc-data-stream-mgr.h M be/src/runtime/krpc-data-stream-recvr.cc M be/src/runtime/krpc-data-stream-recvr.h A tests/custom_cluster/test_exchange_deferred_batches.py 7 files changed, 156 insertions(+), 47 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/12567/2 -- To view, visit http://gerrit.cloudera.org:8080/12567 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iaa74937b046d95484887533be548249e96078755 Gerrit-Change-Number: 12567 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8239: Fix handling of failed deserialization of row batch
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12567 ) Change subject: IMPALA-8239: Fix handling of failed deserialization of row batch .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/12567/1/be/src/runtime/krpc-data-stream-recvr.cc File be/src/runtime/krpc-data-stream-recvr.cc: http://gerrit.cloudera.org:8080/#/c/12567/1/be/src/runtime/krpc-data-stream-recvr.cc@346 PS1, Line 346: status_.MergeStatus(status); > Should we signal data_arrival_cv_ here? Since the loop above depends on thi Good point. I think that's a more general implementation than what's implemented in this patch. -- To view, visit http://gerrit.cloudera.org:8080/12567 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa74937b046d95484887533be548249e96078755 Gerrit-Change-Number: 12567 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 25 Feb 2019 19:45:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8239: Fix handling of failed deserialization of row batch
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12567 Change subject: IMPALA-8239: Fix handling of failed deserialization of row batch .. IMPALA-8239: Fix handling of failed deserialization of row batch Previously, when a row batch failed to be deserialized in the data stream receiver, we will return the error status to the sender of the row batch without inserting the row batch. The receiver will continue to operate without flagging any error. The assumption is that the sender will eventually cancel the query upon receiving the failed status. Normally, when a caller of GetBatch() successfully dequeues a row batch from the batch queue, it will kick off the draining of the row batches from the deferred queue into the normal batch queue, which will further continue the cycle of draining the deferred queue upon the next call to GetBatch() until the deferred queue becomes empty. When an error is hit when deserializing a deferred batch to be inserted into the batch queue, the existing code will simply not insert the row batch or flag any error. This breaks the cycle of the deferred queue draining as the batch queue may become empty forever. The caller of GetBatch() will block indefinitely until the query is cancelled. The existing code works fine as the expectation is that the query will be cancelled once the sender receives the error status from the RPC response. However, this behavior is still not ideal as it lets a query which has hit a fatal error to hold on to resources for extended period of time. This patch fixes the problem by explicitly recording any error during row batch insertion in an error status object. Callers of GetBatch() will now also poll for this status object while waiting for row batch to show up and bail out early if there is any error. A new test case has been added to simulate the problematic case above. Change-Id: Iaa74937b046d95484887533be548249e96078755 --- M be/src/exec/exchange-node.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/krpc-data-stream-mgr.cc M be/src/runtime/krpc-data-stream-mgr.h M be/src/runtime/krpc-data-stream-recvr.cc M be/src/runtime/krpc-data-stream-recvr.h A tests/custom_cluster/test_exchange_deferred_batches.py 7 files changed, 139 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/12567/1 -- To view, visit http://gerrit.cloudera.org:8080/12567 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iaa74937b046d95484887533be548249e96078755 Gerrit-Change-Number: 12567 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-8212 / KUDU-2706: Work around the lack of thread safety in krb5 parse name()
Michael Ho has removed Kudu Jenkins from this change. ( http://gerrit.cloudera.org:8080/12553 ) Change subject: IMPALA-8212 / KUDU-2706: Work around the lack of thread safety in krb5_parse_name() .. Removed reviewer Kudu Jenkins. -- To view, visit http://gerrit.cloudera.org:8080/12553 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: I1bf9224516e2996f51f319088179727f76741ebe Gerrit-Change-Number: 12553 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall
[Impala-ASF-CR] IMPALA-8239 / KUDU-2706: Work around the lack of thread safety in krb5 parse name()
Hello Alexey Serbin, Kudu Jenkins, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/12553 to review the following change. Change subject: IMPALA-8239 / KUDU-2706: Work around the lack of thread safety in krb5_parse_name() .. IMPALA-8239 / KUDU-2706: Work around the lack of thread safety in krb5_parse_name() krb5_init_context() sets the field 'default_realm' in a krb5_context object to 0. Upon first call to krb5_parse_name() with a principal without realm specified (e.g. foo/bar), 'default_realm' in the krb5_context object is lazily initialized. When more than one negotiation threads are configured, it's possible for multiple threads to call CanonicalizeKrb5Principal() in parallel. CanonicalizeKrb5Principal() in turn calls krb5_parse_name(g_krb5_ctx, ...) with no lock held. In addition, krb5_parse_name() is not thread safe as it lazily initializes 'context->default_realm' without holding lock. Consequently, 'g_krb5_ctx' which is shared and not supposed to be modified after initialization may be inadvertently modified concurrently by multiple threads, leading to crashes (e.g. double free) or errors. This change works around the problem by initializing 'g_krb5_ctx->default_realm' once in InitKrb5Ctx() by calling krb5_get_default_realm(). Impala internally calls InitAuth() which calls InitKerberosForServer() which ends up calling InitKrb5Ctx() in a single threaded environment. Change-Id: I1bf9224516e2996f51f319088179727f76741ebe Reviewed-on: http://gerrit.cloudera.org:8080/12545 Reviewed-by: Alexey Serbin Tested-by: Kudu Jenkins --- M be/src/kudu/security/init.cc 1 file changed, 12 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/12553/1 -- To view, visit http://gerrit.cloudera.org:8080/12553 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I1bf9224516e2996f51f319088179727f76741ebe Gerrit-Change-Number: 12553 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120)
[Impala-ASF-CR] IMPALA-8239 / KUDU-2706: Work around the lack of thread safety in krb5 parse name()
Michael Ho has removed Kudu Jenkins from this change. ( http://gerrit.cloudera.org:8080/12553 ) Change subject: IMPALA-8239 / KUDU-2706: Work around the lack of thread safety in krb5_parse_name() .. Removed reviewer Kudu Jenkins. -- To view, visit http://gerrit.cloudera.org:8080/12553 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: I1bf9224516e2996f51f319088179727f76741ebe Gerrit-Change-Number: 12553 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alexey Serbin
[Impala-ASF-CR] IMPALA-8235: avoid TIME MS in runtime profile
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12544 ) Change subject: IMPALA-8235: avoid TIME_MS in runtime profile .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/12544/1/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/12544/1/be/src/scheduling/admission-controller.cc@1108 PS1, Line 1108: static_cast( > Here's a minimal example of the problem: https://godbolt.org/z/cfFzvs I see. Thanks for the explanation. -- To view, visit http://gerrit.cloudera.org:8080/12544 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia6a0846f401ced1947187b8581563de9dbb4fe53 Gerrit-Change-Number: 12544 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 21 Feb 2019 02:33:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8235: avoid TIME MS in runtime profile
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12544 ) Change subject: IMPALA-8235: avoid TIME_MS in runtime profile .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12544 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia6a0846f401ced1947187b8581563de9dbb4fe53 Gerrit-Change-Number: 12544 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 21 Feb 2019 02:33:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8235: avoid TIME MS in runtime profile
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12544 ) Change subject: IMPALA-8235: avoid TIME_MS in runtime profile .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/12544/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12544/1//COMMIT_MSG@10 PS1, Line 10: areare are http://gerrit.cloudera.org:8080/#/c/12544/1//COMMIT_MSG@13 PS1, Line 13: Work around the issue by using the older TIME_NS type for this : counter. What's the plan going forward ? Do we need to identify a set of interfaces which need to be backward compatible ? http://gerrit.cloudera.org:8080/#/c/12544/1/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/12544/1/be/src/scheduling/admission-controller.cc@1108 PS1, Line 1108: static_cast( Why is this cast necessary ? Isn't time_since_update_ms already int64_t ? -- To view, visit http://gerrit.cloudera.org:8080/12544 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia6a0846f401ced1947187b8581563de9dbb4fe53 Gerrit-Change-Number: 12544 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Thu, 21 Feb 2019 01:58:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8187: UDF samples hide symbols by default
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12451 ) Change subject: IMPALA-8187: UDF samples hide symbols by default .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12451 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie17b74b9bce9cc6962393017879b65409ce23b28 Gerrit-Change-Number: 12451 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 13 Feb 2019 18:03:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8187: UDF samples hide symbols by default
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12451 ) Change subject: IMPALA-8187: UDF samples hide symbols by default .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/12451/3/be/src/udf_samples/CMakeLists.txt File be/src/udf_samples/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/12451/3/be/src/udf_samples/CMakeLists.txt@27 PS3, Line 27: set(IR_COMPILE_FLAGS "-emit-llvm" "-O3" "-std=c++14" "-c" "-I../" ${CLANG_BASE_FLAGS}) > Do we care about this case as well ? Actually, since it's emitting IR, it's irrelevant. -- To view, visit http://gerrit.cloudera.org:8080/12451 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie17b74b9bce9cc6962393017879b65409ce23b28 Gerrit-Change-Number: 12451 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 13 Feb 2019 00:23:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8187: UDF samples hide symbols by default
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12451 ) Change subject: IMPALA-8187: UDF samples hide symbols by default .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/12451 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie17b74b9bce9cc6962393017879b65409ce23b28 Gerrit-Change-Number: 12451 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 13 Feb 2019 00:10:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8187: UDF samples hide symbols by default
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12451 ) Change subject: IMPALA-8187: UDF samples hide symbols by default .. Patch Set 3: Zoram, can you please also take a look ? -- To view, visit http://gerrit.cloudera.org:8080/12451 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie17b74b9bce9cc6962393017879b65409ce23b28 Gerrit-Change-Number: 12451 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 13 Feb 2019 00:10:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8187: UDF samples hide symbols by default
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12451 ) Change subject: IMPALA-8187: UDF samples hide symbols by default .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/12451/3/be/src/testutil/test-udas.cc File be/src/testutil/test-udas.cc: http://gerrit.cloudera.org:8080/#/c/12451/3/be/src/testutil/test-udas.cc@343 PS3, Line 343: } nit: Please keep a blank line between functions http://gerrit.cloudera.org:8080/#/c/12451/3/be/src/udf_samples/CMakeLists.txt File be/src/udf_samples/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/12451/3/be/src/udf_samples/CMakeLists.txt@27 PS3, Line 27: set(IR_COMPILE_FLAGS "-emit-llvm" "-O3" "-std=c++14" "-c" "-I../" ${CLANG_BASE_FLAGS}) Do we care about this case as well ? -- To view, visit http://gerrit.cloudera.org:8080/12451 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie17b74b9bce9cc6962393017879b65409ce23b28 Gerrit-Change-Number: 12451 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 13 Feb 2019 00:10:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8183: fix test reportexecstatus retry flakiness
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12461 ) Change subject: IMPALA-8183: fix test_reportexecstatus_retry flakiness .. Patch Set 1: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/12461/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12461/1//COMMIT_MSG@12 PS1, Line 12: 'report_status_retry_interval_ms' Please also mention that this flag is removed after IMPALA-4555. http://gerrit.cloudera.org:8080/#/c/12461/1//COMMIT_MSG@19 PS1, Line 19: Now, we wait 'status_report_interval_ms'. By default, this is 5000ms, between retries -- To view, visit http://gerrit.cloudera.org:8080/12461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7027a6e099c543705e5845ee0e5268f1f9a3fb05 Gerrit-Change-Number: 12461 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Tue, 12 Feb 2019 23:15:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Add support for compiling using OpenSSL 1.1
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12420 ) Change subject: Add support for compiling using OpenSSL 1.1 .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/12420/1/be/src/util/openssl-util.cc File be/src/util/openssl-util.cc: http://gerrit.cloudera.org:8080/#/c/12420/1/be/src/util/openssl-util.cc@76 PS1, Line 76: #else : // OpenSSL 1.1+ doesn't let us detect the supported TLS version at runtime. Assume : // that the OpenSSL library we're linked against supports only up to TLS1.2 : > I guess OpenSSL is expecting that callers will pass in the version they wan Fair enough. Agree that if we compile against OpenSSL 1.1, we would expect the OpenSSL version which can be linked against Impala at runtime should support TLS 1.2. -- To view, visit http://gerrit.cloudera.org:8080/12420 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaccf1b2dedf0d957a2665df8f9afca4139754264 Gerrit-Change-Number: 12420 Gerrit-PatchSet: 3 Gerrit-Owner: hector.aco...@cloudera.com Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: hector.aco...@cloudera.com Gerrit-Comment-Date: Tue, 12 Feb 2019 18:33:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Add support for compiling using OpenSSL 1.1
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12420 ) Change subject: Add support for compiling using OpenSSL 1.1 .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/12420/1/be/src/util/openssl-util.cc File be/src/util/openssl-util.cc: http://gerrit.cloudera.org:8080/#/c/12420/1/be/src/util/openssl-util.cc@76 PS1, Line 76: #else : return TLS1_2_VERSION; : #endif : } > I'm not sure I follow. Sorry, I wasn't clear enough. Why cannot you do the following ? #if OPENSSL_VERSION_NUMBER < 0x1010L return SSLv23_method()->version; #else return TLS_method()->version; #endif -- To view, visit http://gerrit.cloudera.org:8080/12420 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaccf1b2dedf0d957a2665df8f9afca4139754264 Gerrit-Change-Number: 12420 Gerrit-PatchSet: 1 Gerrit-Owner: hector.aco...@cloudera.com Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: hector.aco...@cloudera.com Gerrit-Comment-Date: Tue, 12 Feb 2019 01:30:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Add support for compiling using OpenSSL 1.1
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12420 ) Change subject: Add support for compiling using OpenSSL 1.1 .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/12420/1/be/src/util/openssl-util.cc File be/src/util/openssl-util.cc: http://gerrit.cloudera.org:8080/#/c/12420/1/be/src/util/openssl-util.cc@76 PS1, Line 76: #else : return TLS1_2_VERSION; : #endif : } > SSLv23_method() still works fine, but the returned object is now opaque. May be replacing it with TLS_method() instead According to https://www.openssl.org/docs/man1.1.0/man3/SSLv23_method.html: Use of these functions is deprecated. They have been replaced with the above TLS_method(), TLS_server_method() and TLS_client_method() respectively. New code should use those functions instead. -- To view, visit http://gerrit.cloudera.org:8080/12420 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaccf1b2dedf0d957a2665df8f9afca4139754264 Gerrit-Change-Number: 12420 Gerrit-PatchSet: 1 Gerrit-Owner: hector.aco...@cloudera.com Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: hector.aco...@cloudera.com Gerrit-Comment-Date: Tue, 12 Feb 2019 01:08:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Add support for compiling using OpenSSL 1.1
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12420 ) Change subject: Add support for compiling using OpenSSL 1.1 .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/12420/1/be/src/util/openssl-util.cc File be/src/util/openssl-util.cc: http://gerrit.cloudera.org:8080/#/c/12420/1/be/src/util/openssl-util.cc@76 PS1, Line 76: #else : return TLS1_2_VERSION; : #endif : } Why this change here ? Is there anything wrong with keeping the SSLv23_method()->version? Is it not defined or something ? There seems to be some complication with the logic on RHEL related to detecting the version of the linked OpenSSL library: (https://github.com/apache/impala/commit/a3c0fffa1275c1d1d2770b9fc35885a22dc7edce) http://gerrit.cloudera.org:8080/#/c/12420/1/be/src/util/openssl-util.cc@123 PS1, Line 123: EVP_CIPHER_CTX * nit: EVP_CIPHER_CTX* ctx; -- To view, visit http://gerrit.cloudera.org:8080/12420 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaccf1b2dedf0d957a2665df8f9afca4139754264 Gerrit-Change-Number: 12420 Gerrit-PatchSet: 1 Gerrit-Owner: hector.aco...@cloudera.com Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 11 Feb 2019 22:39:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7985: Port RemoteShutdown() to KRPC.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12260 ) Change subject: IMPALA-7985: Port RemoteShutdown() to KRPC. .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12260 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e Gerrit-Change-Number: 12260 Gerrit-PatchSet: 8 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Fri, 08 Feb 2019 22:02:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8154: Disable Kerberos auth to local setting
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12405 ) Change subject: IMPALA-8154: Disable Kerberos auth_to_local setting .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/12405/1/be/src/rpc/authentication.cc File be/src/rpc/authentication.cc: http://gerrit.cloudera.org:8080/#/c/12405/1/be/src/rpc/authentication.cc@791 PS1, Line 791: FLAGS_use_system_auth_to_local = false; An alternative would be to change the default value at the definition of this flag. That said, we never really supported auth_to_local mapping and honestly speaking, don't see the need for it for internal communication between Impalads. -- To view, visit http://gerrit.cloudera.org:8080/12405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0b0ad79b56cd5cdd3108c6f973e71a9416efbac8 Gerrit-Change-Number: 12405 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Fri, 08 Feb 2019 18:27:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8170: [DOCS] Added a section on load balancing proxy with TLS
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12388 ) Change subject: IMPALA-8170: [DOCS] Added a section on load balancing proxy with TLS .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12388 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I92185b456623e08841bc27b5dbbe09ace99294aa Gerrit-Change-Number: 12388 Gerrit-PatchSet: 2 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Fri, 08 Feb 2019 07:45:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8064: Improve observability of wait times for runtime filters
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12401 ) Change subject: IMPALA-8064: Improve observability of wait times for runtime filters .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/12401/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12401/1//COMMIT_MSG@10 PS1, Line 10: Previously, the wait time was calculated with : respect to the ScanNode::Init() function while duration was logged : with respect to the ScanNode::WaitForRuntimeFilters() function. The filter wait time counts against the elapsed time since the filter's registration in ScanNode::Init() while duration logged in ScanNode::WaitForRuntimeFilters() is the time spent in the function waiting for all filters to arrive. This could be misleading as it doesn't account for the elapsed time spent between ScanNode::Init() and ScanNode::WaitForRuntimeFilters(). http://gerrit.cloudera.org:8080/#/c/12401/1//COMMIT_MSG@18 PS1, Line 18: to http://gerrit.cloudera.org:8080/#/c/12401/1/be/src/exec/scan-node.cc File be/src/exec/scan-node.cc: http://gerrit.cloudera.org:8080/#/c/12401/1/be/src/exec/scan-node.cc@179 PS1, Line 179: max_wait_time = max(max_wait_time, ctx.filter->time_elapsed()); If I understand it correctly, the point of these changes is to reflect the fact that "FLAGS_runtime_filter_wait_time_ms" is essentially the maximum tolerable elapsed time between the filter's registration and arrival time, which potentially is longer than the elapsed time spent in this function. While I agree it's useful to show the maximum elapsed time between registration and arrival for all filters, it seems that the original wait time here is also useful for helping us understand the actual wall clock time spent in this function waiting for filters to arrive. The time spent in this function directly adds to the delay on issuing the initial scan ranges and thus how fast we return rows. So, what do you think about keeping the original wait time while calling the max elapsed time between registration and arrival "max arrival delay". http://gerrit.cloudera.org:8080/#/c/12401/1/be/src/runtime/runtime-filter.h File be/src/runtime/runtime-filter.h: http://gerrit.cloudera.org:8080/#/c/12401/1/be/src/runtime/runtime-filter.h@79 PS1, Line 79: /// Returns the amount of time waited since registration for the filter to , in ms, http://gerrit.cloudera.org:8080/#/c/12401/1/be/src/runtime/runtime-filter.h@81 PS1, Line 81: int32_t Either set this to int64_t or convert time_elapsed() below to return int32_t. I suppose int32_t can tolerate delays up to 24 days. http://gerrit.cloudera.org:8080/#/c/12401/1/be/src/runtime/runtime-filter.h@86 PS1, Line 86: time in millisecond -- To view, visit http://gerrit.cloudera.org:8080/12401 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I28fd45e75c773bc01d424f5a179ae186ee9b7469 Gerrit-Change-Number: 12401 Gerrit-PatchSet: 1 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Pooja Nilangekar Gerrit-Comment-Date: Fri, 08 Feb 2019 07:37:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8154: Disable Kerberos auth to local setting
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12405 Change subject: IMPALA-8154: Disable Kerberos auth_to_local setting .. IMPALA-8154: Disable Kerberos auth_to_local setting Before KRPC, the local name mapping was done from the principal name entirely. With KRPC, Impala started to use the system auth_to_local rules as the Kudu security code has "--use_system_auth_to_local=true" by default. This can cause regression if local auth is configured in the krb5.conf (e.g. with SSSD with AD) as we started enforcing authorization based on Kerberos principal after this commit (https://github.com/apache/impala/commit/5c541b960491ba91533712144599fb3b6d99521d) This change fixes the problem by explicitly setting FLAGS_use_system_auth_to_local to false during initialization. Testing done: Enabled auth_to_local in a Kerberized cluster to map "impala/" to foobar and verified queries still worked as expected. Change-Id: I0b0ad79b56cd5cdd3108c6f973e71a9416efbac8 --- M be/src/rpc/authentication.cc 1 file changed, 5 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/12405/1 -- To view, visit http://gerrit.cloudera.org:8080/12405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I0b0ad79b56cd5cdd3108c6f973e71a9416efbac8 Gerrit-Change-Number: 12405 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho
[Impala-ASF-CR] IMPALA-8170: [DOCS] Added a section on load balancing proxy with TLS
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12388 ) Change subject: IMPALA-8170: [DOCS] Added a section on load balancing proxy with TLS .. Patch Set 2: Will do. -- To view, visit http://gerrit.cloudera.org:8080/12388 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I92185b456623e08841bc27b5dbbe09ace99294aa Gerrit-Change-Number: 12388 Gerrit-PatchSet: 2 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Thu, 07 Feb 2019 21:34:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4555: Make QueryState's status reporting more robust
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12049 ) Change subject: IMPALA-4555: Make QueryState's status reporting more robust .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib6007013fc2c9e8eeba11b752ee58fb3038da971 Gerrit-Change-Number: 12049 Gerrit-PatchSet: 7 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Thu, 07 Feb 2019 02:46:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4555: Make QueryState's status reporting more robust
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12049 ) Change subject: IMPALA-4555: Make QueryState's status reporting more robust .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/12049/5/tests/custom_cluster/test_rpc_timeout.py File tests/custom_cluster/test_rpc_timeout.py: http://gerrit.cloudera.org:8080/#/c/12049/5/tests/custom_cluster/test_rpc_timeout.py@158 PS5, Line 158: # Since the sleep time (1000ms) is much longer than the rpc timeout (100ms), all : # reports will appear to fail. The query is designed to result in many intermediate : # status reports but fewer than the max allowed failures, so the query should succeed. : query_options = {'debug_action': 'REPORT_EXEC_STATUS_DELAY:SLEEP@1000'} > Sure, happy to write such a test if you think there's additional value in i Thanks for pointing out the insert errors below. LGTM. -- To view, visit http://gerrit.cloudera.org:8080/12049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib6007013fc2c9e8eeba11b752ee58fb3038da971 Gerrit-Change-Number: 12049 Gerrit-PatchSet: 5 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Thu, 07 Feb 2019 02:46:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7985: Port RemoteShutdown() to KRPC.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12260 ) Change subject: IMPALA-7985: Port RemoteShutdown() to KRPC. .. Patch Set 7: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/service/client-request-state.cc@695 PS6, Line 695: This may be because the port specified is a thrift port, which" : " :shutdown() can no longer use. > I'm not sure what you mean here. Yes we can't be sure that the reason we ca "Using thrift port" is a subset of "using the wrong port". The case of "using thrift port" is true iff port == FLAGS_be_port which apparently isn't part of the if-stmt here so it's not precise to say so in the error message. Anyhow, it's just a minor point so it's not worth dwelling on it. Please feel free to ignore the suggestion. http://gerrit.cloudera.org:8080/#/c/12260/7/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/12260/7/be/src/service/client-request-state.cc@691 PS7, Line 691: request.__isset.backend && request.backend.port nit: factor this out common expression as it's used on line 639 above -- To view, visit http://gerrit.cloudera.org:8080/12260 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e Gerrit-Change-Number: 12260 Gerrit-PatchSet: 7 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Thu, 07 Feb 2019 02:35:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4555: Make QueryState's status reporting more robust
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12049 ) Change subject: IMPALA-4555: Make QueryState's status reporting more robust .. Patch Set 5: Code-Review+2 (7 comments) LGTM. Please address the remaining comments. http://gerrit.cloudera.org:8080/#/c/12049/5/be/src/runtime/fragment-instance-state.cc File be/src/runtime/fragment-instance-state.cc: http://gerrit.cloudera.org:8080/#/c/12049/5/be/src/runtime/fragment-instance-state.cc@259 PS5, Line 259: instance_status->set_report_seq_no(report_seq_no_); Can you please a comment on why we don't advance the seq number here to help readers understand it ? http://gerrit.cloudera.org:8080/#/c/12049/5/be/src/runtime/fragment-instance-state.cc@300 PS5, Line 300: on in typo ? http://gerrit.cloudera.org:8080/#/c/12049/5/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/12049/5/be/src/runtime/query-state.h@421 PS5, Line 421: /// Returns the amount of time to wait before sending the next status report, calculated in milliseconds http://gerrit.cloudera.org:8080/#/c/12049/5/be/src/runtime/query-state.h@422 PS5, Line 422: exponential This isn't correct, right ? http://gerrit.cloudera.org:8080/#/c/12049/5/be/src/runtime/query-state.h@424 PS5, Line 424: int64_t GetReportWaitTimeMs(); int64_t GetReportWaitTimeMs() const; http://gerrit.cloudera.org:8080/#/c/12049/5/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/12049/5/be/src/runtime/query-state.cc@415 PS5, Line 415: FLAGS_status_report_interval_ms A minor quirk I just noticed. If the user sets FLAGS_status_report_interval_ms to 0, we may want to have a minimum sleep time. Otherwise, if we failed to send the last report, we may retry without sleeping. I guess my earlier suggestion to use the same timeout may have led to this quirk but the current timeout logic still seems simpler overall. http://gerrit.cloudera.org:8080/#/c/12049/5/tests/custom_cluster/test_rpc_timeout.py File tests/custom_cluster/test_rpc_timeout.py: http://gerrit.cloudera.org:8080/#/c/12049/5/tests/custom_cluster/test_rpc_timeout.py@158 PS5, Line 158: # Since the sleep time (1000ms) is much longer than the rpc timeout (100ms), all : # reports will appear to fail. The query is designed to result in many intermediate : # status reports but fewer than the max allowed failures, so the query should succeed. : query_options = {'debug_action': 'REPORT_EXEC_STATUS_DELAY:SLEEP@1000'} This is a good test. Can you please add a test which exercises the stateful reporting part by running some UDFs which may generate errors probabilistically. (e.g. as a scan predicate which calls the UDF). In this case, we will also start accumulating errors in the fragment instance state. -- To view, visit http://gerrit.cloudera.org:8080/12049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib6007013fc2c9e8eeba11b752ee58fb3038da971 Gerrit-Change-Number: 12049 Gerrit-PatchSet: 5 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Tue, 05 Feb 2019 23:11:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7985: Port RemoteShutdown() to KRPC.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12260 ) Change subject: IMPALA-7985: Port RemoteShutdown() to KRPC. .. Patch Set 4: (9 comments) http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/runtime/coordinator-backend-state.cc@445 PS6, Line 445: 10.0L 10 should be sufficient. It will be implicitly casted to double. http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/service/client-request-state.cc@692 PS6, Line 692: msg.find("RemoteShutdown() RPC failed: Timed out: connection negotiation to") : != string::npos Do you think it makes sense to spill out this message iff "request.__isset.backend && request.backend.port != 0" ? http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/service/client-request-state.cc@695 PS6, Line 695: This may be because the port specified is a thrift port, which" : " :shutdown() can no longer use. Will it be more precise to say as we didn't match the port specified (if any) is actually a thrift port number, right ? "This may be because the port specified is wrong." http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/service/control-service.h File be/src/service/control-service.h: http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/service/control-service.h@71 PS6, Line 71: class ? http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/service/control-service.h@72 PS6, Line 72: class ? http://gerrit.cloudera.org:8080/#/c/12260/4/be/src/service/control-service.h File be/src/service/control-service.h: http://gerrit.cloudera.org:8080/#/c/12260/4/be/src/service/control-service.h@84 PS4, Line 84: MonoDelta krpc_timeout > My experience in the Java world with TimeUnit is that using a type like Mon Fair point. The coding convention in Impala code (e.g. see StartShutdown()) tends to be that we specific time with a suffix of the time unit instead of passing MonoDelta object around. It seems more consistent with the existing code to use timeout_ms or timeout_s instead. http://gerrit.cloudera.org:8080/#/c/12260/4/be/src/service/control-service.h@92 PS4, Line 92: if (!rpc_status.ok()) continue; > I added the TODO, thanks. I didn't add the actual sleep as that was what we Sure, please address those questions in IMPALA-8143. http://gerrit.cloudera.org:8080/#/c/12260/3/be/src/service/control-service.cc File be/src/service/control-service.cc: http://gerrit.cloudera.org:8080/#/c/12260/3/be/src/service/control-service.cc@170 PS3, Line 170: FAULT_INJECTION_RPC_DELAY(RPC_CANCELQUERYFINSTANCES); > OK, how about I promise to do this in IMPALA-8143? I can see some other pro Please add a TODO IMPALA-8143 as a reminder. http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/service/control-service.cc File be/src/service/control-service.cc: http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/service/control-service.cc@98 PS6, Line 98: kudu::rpc::RpcContext Not your change but can you please replace "kudu::rpc::RpcContext" to RpcContext given the "use kudu::rpc::RpcContext" above ? -- To view, visit http://gerrit.cloudera.org:8080/12260 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e Gerrit-Change-Number: 12260 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Tue, 05 Feb 2019 19:40:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7565: Set TAcceptQueueServer connection setup pool to be multi-threaded by default
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12249 ) Change subject: IMPALA-7565: Set TAcceptQueueServer connection_setup_pool to be multi-threaded by default .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12249 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I053120d4c3153ddbe5261acd28388be6cd191908 Gerrit-Change-Number: 12249 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Sat, 02 Feb 2019 01:32:22 + Gerrit-HasComments: No