[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-18 Thread Michael Ho (Code Review)
Hello Thomas Marshall, Lars Volker, David Rorke, Sahil Takiar, Todd Lipcon, Tim 
Armstrong, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12987

to look at the new patch set (#4).

Change subject: IMPALA-8341: Data cache for remote reads
..

IMPALA-8341: Data cache for remote reads

This is a patch based on PhilZ's prototype: 
https://gerrit.cloudera.org/#/c/12683/

This change implements an IO data cache which is backed by
local storage. It implicitly relies on the OS page cache
management to shuffle data between memory and the storage
device. This is useful for caching data read from remote
filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS).

A data cache is divided into one or more partitions based on
the configuration string which is a list of directories, separated
by comma, followed by the storage capacity per directory.
An example configuration string is like the following:
  --data_cache_config=/data/0,/data/1:150GB

In the configuration above, the cache may use up to 300GB of
storage space, with 150GB max for /data/0 and /data/1 respectively.

Each partition has a meta-data cache which tracks the mappings
of cache keys to the locations of the cached data. A cache key
is a tuple of (file's name, file's modification time, file offset)
and a cache entry is a tuple of (backing file, offset in the backing
file, length of the cached data, optional checksum). Note that the
cache currently doesn't support overlapping ranges. In other words,
if the cache contains an entry of a file for range [m, m+4MB), a lookup
for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen
this as a problem but this may require further evaluation in the future.

Each partition stores its set of cached data in backing files created
on local storage. When inserting new data into the cache, the data is
appended to the current backing file in use. The storage consumption
of each cache entry counts towards the quota of that partition. When a
partition reaches its capacity, the least recently used (LRU) data in
that partition is evicted. Evicted data is removed from the underlying
storage by punching holes in the backing file it's stored in. As a
backing file reaches a certain size (by default 4TB), new data will
stop being appended to it and a new file will be created instead. Note
that due to hole punching, the backing file is actually sparse.

Optionally, checksumming can be enabled to verify read from the cache
is consistent with what was inserted and to verify that multiple attempted
insertions with the same cache key have the same cache content.
Checksumming is enabled by default for debug builds.

To probe for cached data in the cache, the interface Lookup() is used;
To insert data into the cache, the interface Store() is used. Please note
that eviction happens inline currently during Store().

This patch also added two startup flags for start-impala-cluster.py:
'--data_cache_dir' specifies the base directory in which each Impalad
creates the caching directory
'--data_cache_size' specifies the capacity string for each cache directory.

Testing done: a new BE test was added; core test with cache enabled.

Perf:
- 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement
over runs without the cache. Each node has a cache size of 150GB per node.
The performance is at parity with a configuration of a HDFS cluster using
EBS as the storage.

Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/runtime/io/CMakeLists.txt
A be/src/runtime/io/data-cache-test.cc
A be/src/runtime/io/data-cache.cc
A be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-file-reader.h
M be/src/runtime/io/request-context.h
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M bin/start-impala-cluster.py
M common/thrift/metrics.json
18 files changed, 1,706 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/4
--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd 

[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-18 Thread Michael Ho (Code Review)
Hello Thomas Marshall, Lars Volker, David Rorke, Sahil Takiar, Todd Lipcon, Tim 
Armstrong, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12987

to look at the new patch set (#3).

Change subject: IMPALA-8341: Data cache for remote reads
..

IMPALA-8341: Data cache for remote reads

This is a patch based on PhilZ's prototype: 
https://gerrit.cloudera.org/#/c/12683/

This change implements an IO data cache which is backed by
local storage. It implicitly relies on the OS page cache
management to shuffle data between memory and the storage
device. This is useful for caching data read from remote
filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS).

A data cache is divided into one or more partitions based on
the configuration string which is a list of directories, separated
by comma, followed by the storage capacity per directory.
An example configuration string is like the following:
  --data_cache_config=/data/0,/data/1:150GB

In the configuration above, the cache may use up to 300GB of
storage space, with 150GB max for /data/0 and /data/1 respectively.

Each partition has a meta-data cache which tracks the mappings
of cache keys to the locations of the cached data. A cache key
is a tuple of (file's name, file's modification time, file offset)
and a cache entry is a tuple of (backing file, offset in the backing
file, length of the cached data, optional checksum). Note that the
cache currently doesn't support overlapping ranges. In other words,
if the cache contains an entry of a file for range [m, m+4MB), a lookup
for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen
this as a problem but this may require further evaluation in the future.

Each partition stores its set of cached data in backing files created
on local storage. When inserting new data into the cache, the data is
appended to the current backing file in use. The storage consumption
of each cache entry counts towards the quota of that partition. When a
partition reaches its capacity, the least recently used (LRU) data in
that partition is evicted. Evicted data is removed from the underlying
storage by punching holes in the backing file it's stored in. As a
backing file reaches a certain size (by default 4TB), new data will
stop being appended to it and a new file will be created instead. Note
that due to hole punching, the backing file is actually sparse.

Optionally, checksumming can be enabled to verify read from the cache
is consistent with what was inserted and to verify that multiple attempted
insertions with the same cache key have the same cache content.
Checksumming is enabled by default for debug builds.

To probe for cached data in the cache, the interface Lookup() is used;
To insert data into the cache, the interface Store() is used. Please note
that eviction happens inline currently during Store().

This patch also added two startup flags for start-impala-cluster.py:
'--data_cache_dir' specifies the base directory in which each Impalad
creates the caching directory
'--data_cache_size' specifies the capacity string for each cache directory.

Testing done: a new BE test was added; core test with cache enabled.

Perf:
- 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement
over runs without the cache. Each node has a cache size of 150GB per node.
The performance is at parity with a configuration of a HDFS cluster using
EBS as the storage.

Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/runtime/io/CMakeLists.txt
A be/src/runtime/io/data-cache-test.cc
A be/src/runtime/io/data-cache.cc
A be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-file-reader.h
M be/src/runtime/io/request-context.h
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M bin/start-impala-cluster.py
M common/thrift/metrics.json
18 files changed, 1,706 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/3
--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd 

[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-18 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
..


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.h
File be/src/runtime/io/data-cache.h:

http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.h@88
PS2, Line 88: consolidatin
> consolidating
Done


http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.h@89
PS2, Line 89: sparse
> sparse
Done


http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.h@166
PS2, Line 166:
 :   /// Utility function to verify that all partitions' 
consumption don't exceed their
 :   /// quotas. Retur
> clang-tidy failure: struct instead of class.
Done


http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.h@184
PS2, Line 184:
> empty
Done


http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.cc@296
PS2, Line 296:   meta_cache_->Erase(key);
 :   return true;
 : }
 :   }
> TODO: Add a metric for this.
Done


http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.cc@343
PS2, Line 343:
> insertion_offset + bytes_written
Done


http://gerrit.cloudera.org:8080/#/c/12987/2/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

http://gerrit.cloudera.org:8080/#/c/12987/2/bin/start-impala-cluster.py@117
PS2, Line 117:
> flake8: E703 statement ends with a semicolon
Done


http://gerrit.cloudera.org:8080/#/c/12987/2/bin/start-impala-cluster.py@120
PS2, Line 120:
> flake8: E703 statement ends with a semicolon
Done



--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 19 Apr 2019 02:20:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8138: Remove FAULT INJECTION RPC DELAY

2019-04-18 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13060 )

Change subject: IMPALA-8138: Remove FAULT_INJECTION_RPC_DELAY
..


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13060/1/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/13060/1/be/src/common/global-flags.cc@154
PS1, Line 154: DEFINE_string(debug_actions, "", "For testing only. Uses the 
same format as the debug "
 : "action query options, but allows for injection of debug 
actions in code paths where "
 : "query options are not available.");
> I could see the argument either way.
I think this is fine as-is. No point in dwelling on it for too much as long as 
it achieves the purpose.



--
To view, visit http://gerrit.cloudera.org:8080/13060
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I712b188e0cdf91f431c9b94052501e5411af407b
Gerrit-Change-Number: 13060
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Thu, 18 Apr 2019 22:26:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8414: Skip header when parsing /proc/net/dev

2019-04-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13016 )

Change subject: IMPALA-8414: Skip header when parsing /proc/net/dev
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13016/1/be/src/util/system-state-info.cc
File be/src/util/system-state-info.cc:

http://gerrit.cloudera.org:8080/#/c/13016/1/be/src/util/system-state-info.cc@174
PS1, Line 174: const StringPiece
const StringPiece&



--
To view, visit http://gerrit.cloudera.org:8080/13016
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ff931671050d44926d0baa77ec374afed1f8225
Gerrit-Change-Number: 13016
Gerrit-PatchSet: 1
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Wed, 17 Apr 2019 23:09:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.cc@343
PS2, Line 343: insertion_offset));
insertion_offset + bytes_written



--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 17 Apr 2019 18:35:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-16 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.cc@147
PS1, Line 147:   if (charge_len > capacity_) return false;
> Given that we won't actually insert into the cache until after trying to wr
This actually seems to be a problematic behavior. We may temporarily exceed the 
capacity limit as a result of this.



--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 17 Apr 2019 05:57:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-16 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.cc@296
PS2, Line 296: if (pending_insert_set_.size() >= 
FLAGS_data_cache_write_concurrency ||
 : pending_insert_set_.find(key.ToString()) != 
pending_insert_set_.end()) {
 :   return false;
 : }
TODO: Add a metric for this.



--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 16 Apr 2019 19:47:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-16 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
..


Patch Set 2:

TODO:

- add a test to rotate the files in data-cache-test.cc
- add a test for the changes in filesystem-util.cc


--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 16 Apr 2019 19:46:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] XXX

2019-04-16 Thread Michael Ho (Code Review)
Michael Ho has abandoned this change. ( http://gerrit.cloudera.org:8080/13026 )

Change subject: XXX
..


Abandoned
--
To view, visit http://gerrit.cloudera.org:8080/13026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Ia3ede43146006f7e76707f1b7c8c99aae6671db8
Gerrit-Change-Number: 13026
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-16 Thread Michael Ho (Code Review)
Hello Thomas Marshall, Lars Volker, David Rorke, Sahil Takiar, Todd Lipcon, Tim 
Armstrong, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12987

to look at the new patch set (#2).

Change subject: IMPALA-8341: Data cache for remote reads
..

IMPALA-8341: Data cache for remote reads

This is a patch based on PhilZ's prototype: 
https://gerrit.cloudera.org/#/c/12683/

This change implements an IO data cache which is backed by
local storage. It implicitly relies on the OS page cache
management to shuffle data between memory and the storage
device. This is useful for caching data read from remote
filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS).

A data cache is divided into one or more partitions based on
the configuration string which is a list of directories, separated
by comma, followed by the storage capacity per directory.
An example configuration string is like the following:
  --data_cache_config=/data/0,/data/1:150GB

In the configuration above, the cache may use up to 300GB of
storage space, with 150GB max for /data/0 and /data/1 respectively.

Each partition has a meta-data cache which tracks the mappings
of cache keys to the locations of the cached data. A cache key
is a tuple of (file's name, file's modification time, file offset)
and a cache entry is a tuple of (backing file, offset in the backing
file, length of the cached data, optional checksum). Note that the
cache currently doesn't support overlapping ranges. In other words,
if the cache contains an entry of a file for range [m, m+4MB), a lookup
for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen
this as a problem but this may require further evaluation in the future.

Each partition stores its set of cached data in backing files created
on local storage. When inserting new data into the cache, the data is
appended to the current backing file in use. The storage consumption
of each cache entry counts towards the quota of that partition. When a
partition reaches its capacity, the least recently used (LRU) data in
that partition is evicted. Evicted data is removed from the underlying
storage by punching holes in the backing file it's stored in. As a
backing file reaches a certain size (by default 4TB), new data will
stop being appended to it and a new file will be created instead. Note
that due to hole punching, the backing file is actually sparse.

Optionally, checksumming can be enabled to verify read from the cache
is consistent with what was inserted and to verify that multiple attempted
insertions with the same cache key have the same cache content.
Checksumming is enabled by default for debug builds.

To probe for cached data in the cache, the interface Lookup() is used;
To insert data into the cache, the interface Store() is used. Please note
that eviction happens inline currently during Store().

This patch also added two startup flags for start-impala-cluster.py:
'--data_cache_dir' specifies the base directory in which each Impalad
creates the caching directory
'--data_cache_size' specifies the capacity string for each cache directory.

Testing done: a new BE test was added; core test with cache enabled.

Perf:
- 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement
over runs without the cache. Each node has a cache size of 150GB per node.
The performance is at parity with a configuration of a HDFS cluster using
EBS as the storage.

Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/runtime/io/CMakeLists.txt
A be/src/runtime/io/data-cache-test.cc
A be/src/runtime/io/data-cache.cc
A be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-file-reader.h
M be/src/runtime/io/request-context.h
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M bin/start-impala-cluster.py
M common/thrift/metrics.json
17 files changed, 1,467 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/2
--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-16 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
..


Patch Set 1:

(71 comments)

This new patch changes the configuration string to use a uniform capacity quota 
for all directories.

http://gerrit.cloudera.org:8080/#/c/12987/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12987/1//COMMIT_MSG@24
PS1, Line 24: is a tuple of (file's name, file's modification time, file offset)
> So this means that you can get a partial cache hit if the offsets match but
Done


http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/exec/hdfs-scan-node-base.cc@948
PS1, Line 948: 
data_cache_miss_bytes_->Set(reader_context_->data_cache_miss_bytes());
> I think for the above counters we figured that this pattern of copying the
Done


http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc
File be/src/runtime/io/data-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc@52
PS1, Line 52:   // The callback is invoked by each thread in the multi-threaded 
tests below.
> callback reads like it's called when something is done, how about ThreadFn?
Done


http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc@157
PS1, Line 157:   // Read the same same range inserted previously and they 
should still all in the cache.
> nit: be
Done


http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc@196
PS1, Line 196:   // Create a buffer way larger than the cache.
> Why does it need to be larger?
Typo. Fixed.


http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc@201
PS1, Line 201:   ASSERT_TRUE(cache.Store("foobar", 12345, 0, 
large_buffer.get(), cache_size));
> Use variables instead of inline constants?
Done


http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc@220
PS1, Line 220:   const int64_t cache_size = 1 << 22;
> I find these easier to read in the form of 4 * 1024 * 1024
Done


http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc@248
PS1, Line 248: differen
> typo
Done


http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc@301
PS1, Line 301: ootprint) {
> Can you add a comment what this test does?
Done


http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc@330
PS1, Line 330: int main(int argc, char **argv) {
> This is not needed anymore if you add your test to the unified backend test
Prefer to keep this test as stand-alone to make development easier.


http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h
File be/src/runtime/io/data-cache.h:

http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h@18
PS1, Line 18: #ifndef IMPALA_RUNTIME_IO_DATA_CACHE_H
> Could use #pragma once instead of include guards
Done


http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h@53
PS1, Line 53: /// punching holes in the backing file it's stored in. As a 
backing file reaches a certain
> It's actually a TODO item to retire older files and copy what's left in the
Added a check for filesystem support for hole punching in the new patch.


http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h@55
PS1, Line 55: /// created instead. Note that due to hole punching, the backing 
file is actually sparse.
> This might be a case where a simple ASCII diagram would illustrate the conc
Done


http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h@56
PS1, Line 56: ///
> It's by design that we don't support overlapping range for the first implem
Added some explanation in the header comment.


http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h@63
PS1, Line 63: /// happens inline currently during Store().
> It's worth documenting the PAGE_SIZE behaviour since it implies that there'
Done


http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h@76
PS1, Line 76: class DataCache{
> style nit: missing space before {
Done


http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h@81
PS1, Line 81:   DataCache(const std::string& config) : config_(config) { }
> totally doesn't matter here, but general best practice in C++11 is to have
Thanks for the reminder. Almost every time, I have to look up in the internet 
for the advantage of pass-by-value-then-move over pass-by-reference. It's 
subtle but it makes sense. We should probably start sticking to this idiom more 
widely in Impala code base. May be a clang-tidy rule will help ?!

Also totally irrelevant but that's an area where I find passing by pointer in C 
is sometimes easier to use or understand than C++.



[Impala-ASF-CR] IMPALA-8138: Reintroduce rpc debugging options

2019-04-16 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12297 )

Change subject: IMPALA-8138: Reintroduce rpc debugging options
..


Patch Set 1:

(6 comments)

Thanks for working on this patch. My general comment is to consider moving away 
from the existing "fake" fault injection framework in Thrift and use debug 
actions to simulate scenarios in which we may actually fail to exercise the 
entire RPC stack better.

http://gerrit.cloudera.org:8080/#/c/12297/5/be/src/rpc/impala-control-service-proxy.h
File be/src/rpc/impala-control-service-proxy.h:

http://gerrit.cloudera.org:8080/#/c/12297/5/be/src/rpc/impala-control-service-proxy.h@44
PS5, Line 44:
As mentioned elsewhere, this kind of artificial fault injection doesn't seem to 
be too useful.


http://gerrit.cloudera.org:8080/#/c/12297/3/be/src/rpc/impala-control-service-proxy.h
File be/src/rpc/impala-control-service-proxy.h:

http://gerrit.cloudera.org:8080/#/c/12297/3/be/src/rpc/impala-control-service-proxy.h@43
PS3, Line 43:
Not needed. Same below.


http://gerrit.cloudera.org:8080/#/c/12297/5/be/src/rpc/rpc-mgr.inline.h
File be/src/rpc/rpc-mgr.inline.h:

http://gerrit.cloudera.org:8080/#/c/12297/5/be/src/rpc/rpc-mgr.inline.h@65
PS5, Line 65:
:
:
:
:
:
:
:
:
:
:
:
:
It appears that this send vs recv debug actions were carried over from Thrift 
implementation.

Retrospectively, the "fault injection" we did with Thrift was quite hacky (I am 
the culprit here) and it stemmed from the total lack of  fault injection 
testing back then for exercising the error paths in Thrift RPC.

As part of the KRPC development, we invested in proper fault injection testing 
by truly pausing the Impala and artificially creates various failure scenario. 
This allows a more extensive exercise across the entire RPC stack instead of 
just exercising the RPC handlers at the client and server sides.

With the fault injection framework, it seems to be not too useful to continue 
with this path of artificial fault injection via debug action we used to do 
with Thrift.

Instead, we may want to rethink the fault injection testing with KRPC. In 
particular, it may exercise the code better by doing some of the followings:

- use debug actions to inject random delays in the RPC handlers. This is 
particularly useful for RPCs with timeout

- use debug actions to randomly reject some of the incoming RPCs in 
ImpalaServicePool

- use debug actions to respond with error status in the RPC handlers. The 
errors will be specific to each RPC handler (e.g. deserialization error of 
Thrift profiles, deserialization errors of RowBatch)

- debug action to force some incoming RPCs to use deferred queue in 
KrpcDataStreamRecvr

- (experimental / dangerous) "randomly" corrupt the incoming RPC payloads  in 
ImpalaServicePool.

- inject delay in RPC callback in the client side to simulate an overloaded 
client

The above are some examples I can think of right now.

For other failures, we may need to rely on the fault injection framework:

- use iptables to drop all incoming packets to the RPC port from a particular 
host. This simulates a host which was powered off or network partitions

- Restart remote Impalad will trigger the behavior of broken connections (by 
sending a RST packet)

- Send SIGSTOP to remote Impalad (which we already do in the fault injection 
framework) to simulate non-responsive Impalad

- other ideas...

In general, my suggestion is to use debug actions to simulate failure which can 
actually happen instead of using this artificial fault injection which seems a 
bit meaningless at this point.


http://gerrit.cloudera.org:8080/#/c/12297/1/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/12297/1/be/src/runtime/query-state.h@314
PS1, Line 314: proxy_
> I assume that if I change the class name back to something ending in "Proxy
Yup.


http://gerrit.cloudera.org:8080/#/c/12297/5/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/12297/5/be/src/service/control-service.cc@171
PS5, Line 171:   if (qs.get() == nullptr) {
 : Status status(ErrorMsg(TErrorCode::INTERNAL_ERROR,
Should this be converted to debug action ?


http://gerrit.cloudera.org:8080/#/c/12297/5/be/src/service/control-service.cc@186
PS5, Line 186:
 :
Should this be converted to debug action ?



--
To view, visit http://gerrit.cloudera.org:8080/12297
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2046cb9dadf846ea90c04e95781b2bbde3325941
Gerrit-Change-Number: 12297
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Andrew 

[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-11 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
..


Patch Set 1:

(3 comments)

> Another option would be to do a global hashtable and have the
 > entries directly identify their partitions (in fact I think they
 > already do via the CacheFile pointer, right?). Eviction still needs
 > to be per-partition based on that partition's capacity, but
 > insertion can try to more smartly allocate across drives.
 >

Yes, I had a similar question when thinking over the patch over the weekend. I 
tried the approach of weighted sampling + a global LRU cache but stopped short 
of coming up with a way to maintain the quota per partition with a global LRU 
list so I abandoned it and kind of punted on this problem for now. May be it's 
okay to do LRU per partition. Should have documented my thought there in the 
code.

I think the _IO_ apportioned relative to capacity may be fine as that's a 
reasonable expectation if a user specifies more capacity on a slower device. 
One thing to consider may be to use multiple backing files for larger partition 
to avoid per-file lock problem.

 > Last thought: since there are some tricky design decisions above,
 > maybe we can simplify this and just support a single partition for
 > now, and come back to supporting multi-partition?

It may be reasonable to punt on it for now. Let me see how much work it is to 
switch over to the global hash-table + per-partition LRU approach.

http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h
File be/src/runtime/io/data-cache.h:

http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h@53
PS1, Line 53: /// punching holes in the backing file it's stored in. As a 
backing file reaches a certain
> This does mean that the number of backing files could grown unbounded right
It's actually a TODO item to retire older files and copy what's left in them to 
the current file. I didn't get around to doing that in this first version.


http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.h@56
PS1, Line 56: ///
> We should mention in the comment how lookup works, in particular what happe
It's by design that we don't support overlapping range for the first 
implementation. It seems to fare pretty well with the TPC-DS workload and 
parquet file format we are using.

One of the TODO in the future is to measure the overlapping case and consider 
handling them if they are actually common.


http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.cc@79
PS1, Line 79:   // Unlink the file so the disk space is recycled once the 
process exits.
> I'm not sure how I feel about this - it could cause a lot of confusion havi
Thanks for the feedback. I also considered the approach of cleaning up the 
directory on startup but this seems to make certain assumption about whether 
the directory specified by the user for caching is exclusively used by the 
Impalad process.

While this is not a common scenario right now, I can see there may be 
configuration in the future where one may run multiple Impala containers on the 
same host and these containers may belong to the same cluster. So, wiping out a 
particular directory on restart isn't exactly safe. Another alternative would 
be to include the hostname / IP address + port number as a unique name for the 
file so we will automatically truncate the file on restart. However, there is 
no guarantee that an Impalad will restart with the same IP address / hostname 
after a crash, esp. in a public cloud setting.

That said, one can also argue that this is a configuration issue and those 
Impala containers shouldn't be sharing directories etc. The feedback so far 
suggests that the "hidden" disk usage seems even more undesirable so may be 
wiping out the directory at startup is not as bad and we can just forbid 
sharing of caching directories by multiple Impalads on the same host.



--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 11 Apr 2019 17:13:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

2019-04-10 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
..


Patch Set 9:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.h
File be/src/rpc/rpc-mgr-test.h:

http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.h@134
PS9, Line 134: get_current_queue_size
Coding style suggests this should be "GetCurrentQueueSize()". With that said, 
it may not be needed after all if you take the suggsetion in rpc-mgr-test.cc.


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.h@295
PS9, Line 295: kudu::Status::IOError(
 : Substitute("ping failing, number of calls=$0.", 
number_of_calls_));
Instead of IOError, can we simulate the server busy error here ? Please see 
comments in rpc-mgr-test.cc for reason.


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.cc
File be/src/rpc/rpc-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.cc@269
PS9, Line 269:  until
typo


http://gerrit.cloudera.org:8080/#/c/12672/9/be/src/rpc/rpc-mgr-test.cc@285
PS9, Line 285:   // Find the counter which tracks the number of times the 
service queue is too full.
 :   const string& overflow_count = Substitute(
 :   ImpalaServicePool::RPC_QUEUE_OVERFLOW_METRIC_KEY, 
ping_service->service_name());
 :   IntCounter* overflow_metric =
 :   
ExecEnv::GetInstance()->rpc_metrics()->FindMetricForTesting(
 :   overflow_count);
 :   ASSERT_TRUE(overflow_metric != nullptr);
 :   // There has been no activity yet so this should be 0.
 :   EXPECT_EQ(overflow_metric->GetValue(), 0L);
 :
 :   // Make a proxy for Ping rpc calls.
 :   unique_ptr proxy;
 :   ASSERT_OK(ping_service->GetProxy(krpc_address_, 
FLAGS_hostname, ));
 :   TQueryCtx query_ctx;
 :
 :   // Do an async call that will run in the service's single 
thread.
 :   RpcController async_controller;
 :   CountingBarrier async1_done(1);
 :   PingRequestPB request1;
 :   PingResponsePB response1;
 :   ResponseCallback async1_callback = [&]() { 
async1_done.Notify(); };
 :   proxy->PingAsync(request1, , _controller, 
async1_callback);
 :
 :   // wait for the async call to be running.
 :   bool timed_out = false;
 :   call1_started.Wait(10 * MILLIS_PER_SEC, _out);
 :   ASSERT_FALSE(timed_out);
 :   ASSERT_EQ(0, get_current_queue_size(rpc_mgr_));
 :   // Now the async call is running in the service.
 :
 :   // Do a second async call, this will set in the queue, which 
will be full.
 :   RpcController async_controller2;
 :   CountingBarrier async2_done(1);
 :   // The second call only has a short sleep.
 :   PingRequestPB request2;
 :   PingResponsePB response2;
 :   // Set the deadline to something reasonable otherwise the 
pings from DoRpcWithRetry
 :   // will replace this async call in the queue.
 :   MonoTime deadline = MonoTime::Now() + 
MonoDelta::FromSeconds(10);
 :   async_controller2.set_deadline(deadline);
 :   ResponseCallback async2_callback = [&]() { 
async2_done.Notify(); };
 :   proxy->PingAsync(request2, , _controller2, 
async2_callback);
 :
 :   // Wait for the second async call to get queued.
 :   int num_sleeps = 20;
 :   while (get_current_queue_size(rpc_mgr_) != 1) {
 : SleepForMs(300);
 : if (--num_sleeps < 0) {
 :   FAIL() << "Waited too long for async message to be queued";
 : }
 :   }
 :   // Now the second async call is queued, and the queue is full.
 :
 :   // Assert that there have been no overflows yet.
 :   EXPECT_EQ(overflow_metric->GetValue(), 0L);
 :
 :   // Call DoRpcWithRetry with retry on our busy service.
 :   // This won't succeed, but we can observe that retries 
occurred.
 :   const int retries = 3;
 :   PingRequestPB request3;
 :   PingResponsePB response3;
 :   impala::Status rpc_status = RpcMgr::DoRpcWithRetry(proxy, 
::Ping,
 :   request3, , query_ctx, "ping failed", retries, 
100 * MILLIS_PER_SEC);
 :   EXPECT_ERROR(rpc_status, TErrorCode::GENERAL);
 :   EXPECT_STR_CONTAINS(rpc_status.msg().msg(),
 :   "dropped due to backpressure. The service queue contains 1 
items out 

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

2019-04-10 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in 
ThreadTokenAvailableCb
..


Patch Set 2: Code-Review+1

(4 comments)

I'll leave it to Lars or other reviewers to do the +2.

http://gerrit.cloudera.org:8080/#/c/12968/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12968/2//COMMIT_MSG@9
PS2, Line 9: and
nit: an ?


http://gerrit.cloudera.org:8080/#/c/12968/2//COMMIT_MSG@11
PS2, Line 11: cascade
: of blocked threads
Makes me wonder if there are actually other similar problematic sequence in the 
HdfsScanNode code ?!


http://gerrit.cloudera.org:8080/#/c/12968/2/be/src/exec/hdfs-scan-node.h
File be/src/exec/hdfs-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/12968/2/be/src/exec/hdfs-scan-node.h@117
PS2, Line 117:   /// together, this lock must be taken first.
Probably worth documenting briefly why this is a "timed_mutex" instead of a 
plain "mutex".


http://gerrit.cloudera.org:8080/#/c/12968/2/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/12968/2/be/src/exec/hdfs-scan-node.cc@292
PS2, Line 292: The lock can be held for significant periods of time if the scan
 : // node is being cancelled.
Would it be clearer to say another thread could be holding this lock for an 
extended period of time. When I first read it, I kind of interpret it as this 
lock is held by _this_ function for a long time ?

Also, may also help to call out that this function may be called under some 
other locks so that's why the remedies are being done here to break out of the 
loop earlier if the scan node is being cancelled.



--
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 10 Apr 2019 22:25:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7981: Add host disk statistics to profile

2019-04-10 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12822 )

Change subject: IMPALA-7981: Add host disk statistics to profile
..


Patch Set 6: Code-Review+2

(3 comments)

http://gerrit.cloudera.org:8080/#/c/12822/6/be/src/util/system-state-info.cc
File be/src/util/system-state-info.cc:

http://gerrit.cloudera.org:8080/#/c/12822/6/be/src/util/system-state-info.cc@57
PS6, Line 57: num_values
nit: expected_num_values


http://gerrit.cloudera.org:8080/#/c/12822/6/be/src/util/system-state-info.cc@110
PS6, Line 110:
 :
nit: unnecessary blank lines.


http://gerrit.cloudera.org:8080/#/c/12822/6/be/src/util/system-state-info.cc@129
PS6, Line 129: stat_string
Does it make sense to DCHECK(StringPiece(stat_string).starts_with('cpu  ')); ?



--
To view, visit http://gerrit.cloudera.org:8080/12822
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I373e7da47a0d722938e6ca1572c49a502951ed57
Gerrit-Change-Number: 12822
Gerrit-PatchSet: 6
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Wed, 10 Apr 2019 19:10:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-10 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12987


Change subject: IMPALA-8341: Data cache for remote reads
..

IMPALA-8341: Data cache for remote reads

This is a patch based on PhilZ's prototype: 
https://gerrit.cloudera.org/#/c/12683/

This change implements an IO data cache which is backed by
local storage. It implicitly relies on the OS page cache
management to shuffle data between memory and the storage
device. This is useful for caching data read from remote
filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS).

A data cache is divided into one or more partitions based on
the configuration string which is a list of :
pairs, separated by comma. An example configuration string:
  --data_cache_config=/data/0:150GB,/data/1:150GB

Each partition has a meta-data cache which tracks the mappings
of cache keys to the locations of the cached data. A cache key
is a tuple of (file's name, file's modification time, file offset)
and a cache entry is a tuple of (backing file, offset in the backing
file, length of the cached data, optional checksum).

Each partition stores its set of cached data in backing files created
on local storage. When inserting new data into the cache, the data is
appended to the current backing file in use. The storage consumption
of each cache entry counts towards the quota of that partition. When a
partition reaches its capacity, the least recently used (LRU) data in
that partition is evicted. Evicted data is removed from the underlying
storage by punching holes in the backing file it's stored in. As a
backing file reaches a certain size (by default 4TB), new data will
stop being appended to it and a new file will be created instead. Note
that due to hole punching, the backing file is actually sparse.

Optionally, checksumming can be enabled to verify read from the cache
is consistent with what was inserted and to verify that multiple attempted
insertions with the same cache key have the same cache content.

To probe for cached data in the cache, the interface Lookup() is used;
To insert data into the cache, the interface Store() is used. Please note
that eviction happens inline currently during Store().

Testing done: a new BE test was added; core test with cache enabled.

Perf:
- 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement
over runs without the cache. Each node has a cache size of 150GB per node.

Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/runtime/io/CMakeLists.txt
A be/src/runtime/io/data-cache-test.cc
A be/src/runtime/io/data-cache.cc
A be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-file-reader.h
M be/src/runtime/io/request-context.h
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M common/thrift/metrics.json
14 files changed, 1,115 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/1
--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 


[Impala-ASF-CR] IMPALA-8395: Parse older formats of /proc/net/dev correctly

2019-04-08 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12954 )

Change subject: IMPALA-8395: Parse older formats of /proc/net/dev correctly
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic804955d8e4269e787037a6dc68bef2d70382426
Gerrit-Change-Number: 12954
Gerrit-PatchSet: 4
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Tue, 09 Apr 2019 02:05:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7981: Add host disk statistics to profile

2019-04-08 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12822 )

Change subject: IMPALA-7981: Add host disk statistics to profile
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12822/4/be/src/util/system-state-info.h
File be/src/util/system-state-info.h:

http://gerrit.cloudera.org:8080/#/c/12822/4/be/src/util/system-state-info.h@255
PS4, Line 255:   for (int i = 0; i < num_values; ++i) {
 : int64_t v = -1;
 : (*ss) >> v;
Do you plan to choose 
https://gerrit.cloudera.org/#/c/12954/2/be/src/util/system-state-info.cc@169 
over the current implementation ?


http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.h
File be/src/util/system-state-info.h:

http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.h@183
PS2, Line 183:   /// The enum names correspond to the fields of /proc/diskstats
 :   /// https://www.kernel.org/doc/Documentation/iostats.txt
> These are the fields present in 2.6, while 4.8 added more fields that we're
Yes, I think the link to the doc should be sufficient.



--
To view, visit http://gerrit.cloudera.org:8080/12822
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I373e7da47a0d722938e6ca1572c49a502951ed57
Gerrit-Change-Number: 12822
Gerrit-PatchSet: 4
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Mon, 08 Apr 2019 20:56:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8395: Parse older formats of /proc/net/dev correctly

2019-04-08 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12954 )

Change subject: IMPALA-8395: Parse older formats of /proc/net/dev correctly
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12954/2/be/src/util/system-state-info.cc
File be/src/util/system-state-info.cc:

http://gerrit.cloudera.org:8080/#/c/12954/2/be/src/util/system-state-info.cc@168
PS2, Line 168: colon_idx = sp.find_first_of(':');
Should this also check for the case of colon_idx == npos and skip ?


http://gerrit.cloudera.org:8080/#/c/12954/2/be/src/util/system-state-info.cc@170
PS2, Line 170:
I wonder if we should just skip if counters.values() < NUM_NET_VALUES ? Will 
doing so cause us to always skip for certain versions of old kernels ?



--
To view, visit http://gerrit.cloudera.org:8080/12954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic804955d8e4269e787037a6dc68bef2d70382426
Gerrit-Change-Number: 12954
Gerrit-PatchSet: 2
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Mon, 08 Apr 2019 20:38:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7981: Add host disk usage to profile

2019-04-05 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12822 )

Change subject: IMPALA-7981: Add host disk usage to profile
..


Patch Set 2:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info-test.cc
File be/src/util/system-state-info-test.cc:

http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info-test.cc@105
PS2, Line 105: R"(   8   0 sda 17124835 222797 716029974 8414920 
43758807 38432095 7867287712 630179264 0 32547524 638999340
> line too long (117 > 90)
Do we also care about testing for random devices which show up in 
/proc/diskstats such as ram0 ? My /proc/diskstats has the followings:

   1   0 ram0 0 0 0 0 0 0 0 0 0 0 0
   1   1 ram1 0 0 0 0 0 0 0 0 0 0 0
   1   2 ram2 0 0 0 0 0 0 0 0 0 0 0
   1   3 ram3 0 0 0 0 0 0 0 0 0 0 0
   1   4 ram4 0 0 0 0 0 0 0 0 0 0 0
   1   5 ram5 0 0 0 0 0 0 0 0 0 0 0
   1   6 ram6 0 0 0 0 0 0 0 0 0 0 0
   1   7 ram7 0 0 0 0 0 0 0 0 0 0 0
   1   8 ram8 0 0 0 0 0 0 0 0 0 0 0
   1   9 ram9 0 0 0 0 0 0 0 0 0 0 0
   1  10 ram10 0 0 0 0 0 0 0 0 0 0 0
   1  11 ram11 0 0 0 0 0 0 0 0 0 0 0
   1  12 ram12 0 0 0 0 0 0 0 0 0 0 0
   1  13 ram13 0 0 0 0 0 0 0 0 0 0 0
   1  14 ram14 0 0 0 0 0 0 0 0 0 0 0
   1  15 ram15 0 0 0 0 0 0 0 0 0 0 0
   7   0 loop0 0 0 0 0 0 0 0 0 0 0 0
   7   1 loop1 0 0 0 0 0 0 0 0 0 0 0
   7   2 loop2 0 0 0 0 0 0 0 0 0 0 0
   7   3 loop3 0 0 0 0 0 0 0 0 0 0 0
   7   4 loop4 0 0 0 0 0 0 0 0 0 0 0
   7   5 loop5 0 0 0 0 0 0 0 0 0 0 0
   7   6 loop6 0 0 0 0 0 0 0 0 0 0 0
   7   7 loop7 0 0 0 0 0 0 0 0 0 0 0


http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info-test.cc@177
PS2, Line 177: // Tests the computation logic for disk usage. This test also 
makes sure that values of a
 : // partition are not accounted towards the disk it is on.
Is this comment meant for the earlier test at line 103 ?


http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.h
File be/src/util/system-state-info.h:

http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.h@183
PS2, Line 183:   /// The enum names correspond to the fields of /proc/diskstats
 :   /// https://www.kernel.org/doc/Documentation/iostats.txt
The followings are the metrics common to both 2.6 and 4.8+, right ?


http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.h@221
PS2, Line 221: Entries are
 :   /// set to 0 for invalid entries.
Invalid entries are set to 0.


http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.h@228
PS2, Line 228: /// Compute the disk usage.
Computes the read and write rate for the most recent epoch.


http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.h@233
PS2, Line 233: DiskUsage
nit: I kind of interpret it as disk space used on first read. DiskStats may be 
a slightly better name but then it may not be consistent with other metrics we 
are measuring.


http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.cc
File be/src/util/system-state-info.cc:

http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.cc@45
PS2, Line 45:   memset(_usage_, 0, sizeof(network_usage_));
memset(_usage_, 0, sizeof(disk_usage_));


http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.cc@81
PS2, Line 81: LOG(WARNING) << "Could not open " << path << ": " << 
GetStrErrMsg() << endl;
Will this lead to spam in the log if we periodically try reading the same path 
and fail or is it the responsibility of the caller to throttle calling this 
function ?


http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.cc@199
PS2, Line 199:
DCHECK(disk_val_idx_ == 0 || disk_val_idx_ == 1);


http://gerrit.cloudera.org:8080/#/c/12822/2/be/src/util/system-state-info.cc@232
PS2, Line 232: ==
<= 0



--
To view, visit http://gerrit.cloudera.org:8080/12822
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I373e7da47a0d722938e6ca1572c49a502951ed57
Gerrit-Change-Number: 12822
Gerrit-PatchSet: 2
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Sat, 06 Apr 2019 01:37:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

2019-04-01 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
..


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc
File be/src/rpc/rpc-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@333
PS5, Line 333:   // Call DoRpcWithRetry with no retry on our busy service.
 :   const int retries = 3;
 :   PingRequestPB request3;
 :   PingResponsePB response3;
 :   impala::Status rpc_status = RpcMgr::DoRpcWithRetry(proxy, 
::Ping,
 :   request3, , query_ctx, "ping failed", retries, 
100 * MILLIS_PER_SEC);
 :   EXPECT_ERROR(rpc_status, TErrorCode::GENERAL);
 :   EXPECT_STR_CONTAINS(rpc_status.msg().msg(),
 :   "dropped due to backpressure. The service queue contains 1 
items out of a maximum "
 :   "of 1; memory consumption is ");
 :   ASSERT_EQ(overflow_metric->GetValue(), retries);
 :
 :   // Call DoRpcWithRetry, but this time we do retey on a busy 
service, and this waiting
 :   // allows the outstanding async calls to complete, so that 
then this call will succeed.
 :   PingRequestPB request4;
 :   PingResponsePB response4;
 :   impala::Status rpc_status_backoff =
 :   RpcMgr::DoRpcWithRetry(proxy, ::Ping, 
request4, ,
 :   query_ctx, "ping failed", 10, 100 * MILLIS_PER_SEC, 2 
* MILLIS_PER_SEC);
 :   ASSERT_OK(rpc_status_backoff);
 :   ASSERT_GT(overflow_metric->GetValue(
> I reviewed the test and tried to make to more bulletproof. I think any test
How about the following idea:
- the RPC handler can share some sort of condition variable or barrier with the 
main test. The test can then indirectly control when the RPC handler will 
complete by signaling those condition variables.

In this way, as long as the main thread doesn't signal the CV, the first RPC 
will never finish and the second RPC will also block forever. This makes the 
test not timing dependent and less prone to failure due to random timing issue. 
What do you think ?


http://gerrit.cloudera.org:8080/#/c/12672/7/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/12672/7/be/src/rpc/rpc-mgr.h@178
PS7, Line 178:   // A no-op method that is used as part of overloading 
DoRpcWithRetry().
 :   static void logging_noop() {}
 :
 :   // The type of the log method passed to DoRpcWithRetry().
 :   typedef boost::function RpcLogFn;
Not needed.


http://gerrit.cloudera.org:8080/#/c/12672/7/be/src/rpc/rpc-mgr.h@186
PS7, Line 186: timeout
timeout_ms


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@178
PS5, Line 178:   // A no-op method that is used as part of overloading 
DoRpcWithRetry().
 :   static void logging_noop() {}
 :
 :   // The type of the log method passed to DoRpcWithRetry().
 :   typedef boost::function RpcLogFn;
> Removed
They still seem to be in PS7


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@201
PS5, Line 201:
 :  private:
 :   /// One pool per registered service. scoped_refptr<> is 
dictated by the Kudu interface.
 :   std::vector> service_pools_;
 :
> OK I'll leave this alone and Thomas will clean up the mess? :-)
Sounds good to me.



--
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Tue, 02 Apr 2019 02:08:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2990: timeout unresponsive queries in coordinator

2019-04-01 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12299 )

Change subject: IMPALA-2990: timeout unresponsive queries in coordinator
..


Patch Set 7:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/runtime/coordinator-backend-state.cc@174
PS7, Line 174:   const auto trigger = MakeScopeExitTrigger(
 :   [this]() { last_report_time_ms_ = 
GenerateReportTimestamp(); });
Does it implicitly rely on the LIFO order for local variables' destruction in 
C++ so "notifier" won't be destroyed before "trigger" ? Otherwise, we could 
have notified 'exec_complete_barrier' without setting 'last_report_time_ms'.

I may be paranoid but can you please add a comment about this so future changes 
won't accidentally swap their orders ? I suppose we can skip the use of 
NotifyBarrierOnExit here and just notify 'exec_complete_barrier' in 'trigger' 
after setting 'last_report_tims_ms_'.


http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/runtime/coordinator.cc@549
PS7, Line 549: DCHECK(
DCHECK_LE


http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/runtime/coordinator.cc@664
PS7, Line 664: DCHECK(
DCHECK_LE


http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc@398
PS4, Line 398:
> k. We could deprecate it in a separate patch I suppose. Fine with this as i
As Thomas said, this config option was useful when users faced instability with 
large Impala cluster due to IMPALA-2990 (which this patch is fixing). We can 
consider deprecating this flag once we are convinced that this patch makes 
enough improvement that the instability with large cluster is mostly mitigated.

The periodic reporting matters for long running queries. The default reporting 
interval of 5 seconds could be longer than most of the short running query's 
duration. For long running queries, it probably doesn't really matter to get a 
report every 5 seconds vs say 30s or 60s. If we ever deprecate this flag, 
please consider bumping the the default reporting interval to something larger.


http://gerrit.cloudera.org:8080/#/c/12299/4/be/src/service/impala-server.cc@2288
PS4, Line 2288:   return Status::OK();
  : }
  :
  : void ImpalaServer::ExpireQuery(ClientRequestState* crs, const 
Status& status) {
  :   DCHECK(!status.ok());
  :   cancellation_thread_pool_->Offer(
> k, would be interested what Michael thinks about that idea
The executor's reporting path currently can tolerate up to 
FLAGS_status_report_max_retries number of failures before giving up and 
declaring the coordinator as inaccessible. It's intentionally not a time-based 
approach and instead relies on the periodic reporting to handle the retry.

Stepping back a bit, my understanding for why we want to approximate the 
maximum retry time is that the coordinator may not want to prematurely cancel a 
query due to a missing report from an executor if it knows that the executor 
*may* still be retrying to send a report. However, as pointed out elsewhere by 
Todd, if an Impala can experience random pauses or random scheduling delay for 
the reporting thread, no amount of wait time will guarantee that an executor is 
not in the middle of retrying to send a report when the query is cancelled. I 
suppose that's not an end-goal we try to achieve.

This prompts me to think whether it's any different if we just do the following:
- expose a coordinator's maximum tolerable lag time as a knob set to arbitrary 
number
- expose an executor's maximum tolerable retry time as a knob set to some 
arbitrary number
- deprecate FLAGS_status_report_max_retries

An alternative would be to:
- expose an executor's maximum tolerable retry time as a knob set to some 
arbitrary number
- deprecate FLAGS_status_report_max_retries
- set coordinator's maximum tolerable lag time as (100 + x)% of the maximum 
tolerable retry time where x *may be* tunable in case the default value causes 
any trouble.

Todd / Thomas, what do you think ?


http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/12299/7/be/src/service/impala-server.cc@218
PS7, Line 218: DEFINE_int32(status_report_max_retries, 3,
> Is our timeout too aggressive here? If one of the backends goes into some k
I agree that the coordinator should be more lenient in handling lost reports to 
avoid false positive. On the other hand, no amount of timeout can save us from 
random pauses in Impalad so we should work on 

[Impala-ASF-CR] IMPALA-7800: Time out new connections after --fe service threads

2019-03-28 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12579 )

Change subject: IMPALA-7800: Time out new connections after --fe_service_threads
..


Patch Set 13: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12579/13/be/src/rpc/TAcceptQueueServer.cpp
File be/src/rpc/TAcceptQueueServer.cpp:

http://gerrit.cloudera.org:8080/#/c/12579/13/be/src/rpc/TAcceptQueueServer.cpp@192
PS13, Line 192:   int wait_result = 0;
nit: This can be in the while loop.



--
To view, visit http://gerrit.cloudera.org:8080/12579
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64
Gerrit-Change-Number: 12579
Gerrit-PatchSet: 13
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 28 Mar 2019 23:23:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8143: Enhance DoRpcWithRetry().

2019-03-27 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
..


Patch Set 2:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.h
File be/src/rpc/rpc-mgr-test.h:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.h@296
PS5, Line 296: ERR
typo


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc
File be/src/rpc/rpc-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@214
PS5, Line 214:   ASSERT_OK(rpc_mgr_.RegisterService(10, 10, scan_mem_impl,
 :   static_cast<
Not sure if it's output of clang-tidy or something but I find it a bit hard to 
ready to split it like this. Can't we keep these 2 lines unchanged ?


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@263
PS5, Line 263: ueue f
typo


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@291
PS5, Line 291:
nullptr;

Also do you want to assert that it's zero ?


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@298
PS5, Line 298: // Configure the service
not used ?


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@316
PS5, Line 316: ASSERT_OK(ping_
Instead of sharing "sleep_ms" between RPCs, it may be cleaner to pass the sleep 
time as RPC argument instead.


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@322
PS5, Line 322: // A lambda t
proxy->


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@333
PS5, Line 333:
 :   // wait for async call to be running
 :   sleep_started.Wait();
 :   ASSERT_EQ(0, get_current_queue_size(rpc_mgr_));
 :
 :   // Do a second async call that will fill up the queue
 :   RpcController async_controller2;
 :   CountingBarrier async2_done(1);
 :   // Reset the sleep time
 :   sleep_ms = 100;
 :   // Set the deadline to something reasonable otherwise the 
pings from DoRpcWithRetry
 :   // will replace this async call in the queue.
 :   MonoTime deadline = MonoTime::Now() + 
MonoDelta::FromSeconds(10);
 :   async_controller2.set_deadline(deadline);
 :   ResponseCallback async2_callback = [&]() { 
async2_done.Notify(); };
 :   proxy.get()->PingAsync(request, , _controller2, 
async2_callback);
 :
 :   // Wait for second async to get queued
 :   SleepForMs(300);
 :   // Check the queue size
 :   ASSERT_EQ(1, get_current_queue_size(
These set of tests seem rather timing dependent. For instance, it may occur 
that the first time DoRpcWithRetry() is called will succeed because the async 
call somehow finishes by then due to some weird scheduling order. May be better 
to rethink a better way to test it.


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@378
PS5, Line 378:   ASSERT_FALSE(async2_done.pending());
 :
 :   // Test injection of failures with DebugAction.
 :   
query_ctx.client_request.query_options.__set_debug_action("DoRpcWithRetry:FAIL");
 :   Status inject_status = RpcMgr::DoRpcWithRetry(ping_rpc, 
query_ctx, "ping failed", 1,
 :   MonoDelta::FromSeconds(100), 0, "DoRpcWithRetry");
 :   ASSERT_FALSE(inject_status.ok());
 :   EXPECT_ERROR(inject_status, TErrorCo
Please see comments in rpc-mgr.inline.h that we probably should only retry if 
the remote server is busy. For all other errors, we probably should report it 
right away to callers. We can consider transparently handling other types of 
errors in the future.


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@178
PS5, Line 178:   void ToJson(rapidjson::Document* document);
 :
 :   /// Retry the Rpc 'rpc_call' up to 'times_to_try' times.
 :   /// Each Rpc has a timeout of 'timeout'.
 :   /// If the service is busy then sleep 'se
Not needed as discussed in person.


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@189
PS5, Line 189: the R
typename


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@191
PS5, Line 191: n the 'l
It's important to highlight that this only works iff rpc_call is idempotent in 
the comments above. Any non-idempotent rpc_call using this interface may have 
undesired consequence.


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@193
PS5, Line 193: ebugAction() to potentially inject errors.
 :   template 
It seems inconsistent to pass the timeout in as MonoDelta while passing the 
backoff time 

[Impala-ASF-CR] IMPALA-7800: Time out new connections after --fe service threads

2019-03-27 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12579 )

Change subject: IMPALA-7800: Time out new connections after --fe_service_threads
..


Patch Set 11:

(2 comments)

Looking good. Please address the last two items and I can +2 it.

http://gerrit.cloudera.org:8080/#/c/12579/11/be/src/rpc/TAcceptQueueServer.cpp
File be/src/rpc/TAcceptQueueServer.cpp:

http://gerrit.cloudera.org:8080/#/c/12579/11/be/src/rpc/TAcceptQueueServer.cpp@130
PS11, Line 130: timeout
nit: timeout_ms


http://gerrit.cloudera.org:8080/#/c/12579/10/tests/custom_cluster/test_frontend_connection_limit.py
File tests/custom_cluster/test_frontend_connection_limit.py:

http://gerrit.cloudera.org:8080/#/c/12579/10/tests/custom_cluster/test_frontend_connection_limit.py@90
PS10, Line 90:  except Exception as e:
> The test will be marked as failed in that case (XFAIL). Added an else block
My understanding is that xfail will just mark it as "unexpected pass" without 
flagging any other problem if the test happens to pass.

Would it be more robust to set a flag in the exception handling block and 
assert that the flag is set at the end of the test ? In this way, there is no 
need to mark the test as xfail.



--
To view, visit http://gerrit.cloudera.org:8080/12579
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64
Gerrit-Change-Number: 12579
Gerrit-PatchSet: 11
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 27 Mar 2019 22:35:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads

2019-03-26 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12579 )

Change subject: IMPALA-7800: Reject new connections after --fe_service_threads
..


Patch Set 10:

(9 comments)

LGTM. Mostly nits.

http://gerrit.cloudera.org:8080/#/c/12579/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12579/10//COMMIT_MSG@7
PS10, Line 7: Reject new connections after --fe_service_threads
Does this need to be updated too ?


http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/TAcceptQueueServer.h
File be/src/rpc/TAcceptQueueServer.h:

http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/TAcceptQueueServer.h@65
PS10, Line 65: timeout
nit: timeout_ms


http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/TAcceptQueueServer.h@116
PS10, Line 116:   /// Amount of time after which a connection request will be 
timed out.
in milliseconds


http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/TAcceptQueueServer.cpp
File be/src/rpc/TAcceptQueueServer.cpp:

http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/TAcceptQueueServer.cpp@263
PS10, Line 263:   if (queue_timeout_ms_ != 0)
  : entry->expiration_time_ = MonotonicMillis() + 
queue_timeout_ms_;
May need to guard against negative value although it's user's fault in that 
case.

 if (queue_timeout_ms > 0) {

 }


http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/thrift-server.h
File be/src/rpc/thrift-server.h:

http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/thrift-server.h@149
PS10, Line 149: int64_t queue_timeout_ms = 0);
May make sense to document the meaning of this argument too.

nit: this line seems to fit into the previous line too.


http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/thrift-server.h@197
PS10, Line 197:   /// Amount of time an accepted client connection will be kept 
in the accept
  :   /// queue before it is timed out. Used in TAcceptQueueServer.
May want to document that if it's 0, it means there is no timeout (similar to 
max_concurrent_connections_ above).


http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/thrift-server.h@277
PS10, Line 277:   }
nit: missing blank line after this function.


http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/service/impala-server.cc@239
PS10, Line 239: accepted_cnxn_timeout
Should this have a more specific name such as "accepted_client_cnxn_timeout" as 
we use Thrift server for various internal services too ?


http://gerrit.cloudera.org:8080/#/c/12579/10/tests/custom_cluster/test_frontend_connection_limit.py
File tests/custom_cluster/test_frontend_connection_limit.py:

http://gerrit.cloudera.org:8080/#/c/12579/10/tests/custom_cluster/test_frontend_connection_limit.py@90
PS10, Line 90:  except Exception as e:
How do we catch cases in which the test passes without raising the exception ?



--
To view, visit http://gerrit.cloudera.org:8080/12579
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64
Gerrit-Change-Number: 12579
Gerrit-PatchSet: 10
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Tue, 26 Mar 2019 22:05:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add extra TRACE calls for cancellation code path

2019-03-21 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12832 )

Change subject: Add extra TRACE calls for cancellation code path
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12832/1/be/src/runtime/krpc-data-stream-mgr.cc
File be/src/runtime/krpc-data-stream-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/12832/1/be/src/runtime/krpc-data-stream-mgr.cc@215
PS1, Line 215:  TRACE_COUNTER_SCOPE_LATENCY_US("find_receiver_us");
Doing it here instead of FindRecvr() means you can also measure the time spent 
waiting for the lock. So, does it make sense to also sample similar code path 
in CloseSender() ?


http://gerrit.cloudera.org:8080/#/c/12832/1/be/src/runtime/krpc-data-stream-mgr.cc@378
PS1, Line 378: {
Do we care about sampling this scope too ?


http://gerrit.cloudera.org:8080/#/c/12832/1/be/src/runtime/krpc-data-stream-mgr.cc@406
PS1, Line 406: {
Same question here.


http://gerrit.cloudera.org:8080/#/c/12832/1/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/12832/1/be/src/service/control-service.cc@24
PS1, Line 24: #include "kudu/util/trace.h"
nit: is this new #include needed ?



--
To view, visit http://gerrit.cloudera.org:8080/12832
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifedd95098a011d7a3bb3ebb54bfb3cbb7e94fab4
Gerrit-Change-Number: 12832
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 22 Mar 2019 01:47:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8277: support ranges in CPU lists

2019-03-14 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12755 )

Change subject: IMPALA-8277: support ranges in CPU lists
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12755
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I97311bfbcf70bea069e941b6e7f4f015fb781b3f
Gerrit-Change-Number: 12755
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Fri, 15 Mar 2019 00:06:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8299: Fix crash in GroupingAggregator on uninited hash table

2019-03-14 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12746 )

Change subject: IMPALA-8299: Fix crash in GroupingAggregator on uninited hash 
table
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/12746
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381657c43f93eb4871b93d54532cb886198fd0b2
Gerrit-Change-Number: 12746
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 14 Mar 2019 17:47:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads

2019-03-11 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12579 )

Change subject: IMPALA-7800: Reject new connections after --fe_service_threads
..


Patch Set 6:

(6 comments)

Looking good. Please see some more comments.

http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.h
File be/src/rpc/TAcceptQueueServer.h:

http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.h@88
PS6, Line 88: void CleanupAndClose(const std::string& error,
Please add a brief comment about what this function does and what those 
parameters are.


http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp
File be/src/rpc/TAcceptQueueServer.cpp:

http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp@29
PS6, Line 29: DEFINE_int64(accepted_cnxn_timeout, 300,
: "(Advanced) The amount of time in seconds an accepted 
connection will wait in the "
: "queue before we time it out and reject the connection 
request. A value of 0 "
: "means there is no timeout.");
Sorry but now that I realize this may affect all existing Thrift based 
services, I would think it's safe to only enable this for external client 
facing Thrift services (e.g. HS2, Beeswax). For Catalog and statestore and 
Impala internal connections, it may be better to leave it as no timeout.

In other words, it may make sense to have this timeout as part of the 
constructor of TAcceptQueueServer and create flags only for HS2 and Beeswax 
related Thrift services.

Sorry for not noticing this earlier.


http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp@195
PS6, Line 195: wait_time
nit: coding convention in Impala usually keeps the unit as the suffix of the 
variable name so this can be wait_time_ms;


http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp@205
PS6, Line 205: msecs
nit: we either use ms or milliseconds.


http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp@207
PS6, Line 207: (wait_result == THRIFT_ETIMEDOUT)
Do we care about any potential non-zero result value ? Should we bail out in 
those cases too ?


http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp@265
PS6, Line 265:   if (FLAGS_accepted_cnxn_timeout != 0)
 : entry->expiration_time_ =
 : MonotonicMillis() + FLAGS_accepted_cnxn_timeout * 
MILLIS_PER_SEC;
nit styling:

  if (FLAGS_... != 0) {
 
  }



--
To view, visit http://gerrit.cloudera.org:8080/12579
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64
Gerrit-Change-Number: 12579
Gerrit-PatchSet: 6
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Tue, 12 Mar 2019 01:50:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8097: mt dop for all queries via hidden flag

2019-03-11 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12257 )

Change subject: IMPALA-8097: mt_dop for all queries via hidden flag
..


Patch Set 8: Code-Review+1

(2 comments)

Sorry again for the delay. LGTM. I added Bharath and Paul to the reviewer list 
so they can double check the logic in FE.

http://gerrit.cloudera.org:8080/#/c/12257/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12257/8//COMMIT_MSG@14
PS8, Line 14: until IMPALA-4224 is implemented. Inserts work without
Also, as the way it's implemented now, the memory consumption for broadcast 
join will scale with MT_DOP, right ?


http://gerrit.cloudera.org:8080/#/c/12257/8/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/12257/8/be/src/exec/partitioned-hash-join-builder.cc@108
PS8, Line 108: filter_ctxs_.back().filter = 
state->filter_bank()->RegisterFilter(filter_desc, true);
May want to add a reference to IMPALA-4400 which is abput aggregating the 
filter locally.



--
To view, visit http://gerrit.cloudera.org:8080/12257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72f0b02a005e8bf22fd17b8fb5aabf8c0d9b6b15
Gerrit-Change-Number: 12257
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 11 Mar 2019 22:44:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Add HADOOP-15720 to the list of Known Issues

2019-03-05 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12631 )

Change subject: [DOCS] Add HADOOP-15720 to the list of Known Issues
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12631
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9497d337cb79c38add85bb0de9d07660617c76b5
Gerrit-Change-Number: 12631
Gerrit-PatchSet: 5
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Comment-Date: Wed, 06 Mar 2019 00:11:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3816: (prep) Move TupleSorter to sorter-ir.cc

2019-03-05 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10679 )

Change subject: IMPALA-3816: (prep) Move TupleSorter to sorter-ir.cc
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaaf2b75c2f789002c42939865c018f728d29a113
Gerrit-Change-Number: 10679
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 05 Mar 2019 22:55:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8097: mt dop for all queries via hidden flag

2019-03-04 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12257 )

Change subject: IMPALA-8097: mt_dop for all queries via hidden flag
..


Patch Set 8:

Sorry, will get back to it this week.


--
To view, visit http://gerrit.cloudera.org:8080/12257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72f0b02a005e8bf22fd17b8fb5aabf8c0d9b6b15
Gerrit-Change-Number: 12257
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 04 Mar 2019 22:46:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()

2019-03-04 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12624 )

Change subject: IMPALA-8256: Better error message for 
ImpalaServicePool::RejectTooBusy()
..


Patch Set 5: Code-Review+2

Rebase. Carry +2.


--
To view, visit http://gerrit.cloudera.org:8080/12624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0297658acf2b23823dcb7d2bdff5d8e4475bb98
Gerrit-Change-Number: 12624
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Mon, 04 Mar 2019 22:42:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8274: Fix iteration of profiles in ApplyExecStatusReport()

2019-03-04 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12651 )

Change subject: IMPALA-8274: Fix iteration of profiles in 
ApplyExecStatusReport()
..


Patch Set 2:

Waiting for https://gerrit.cloudera.org/#/c/12649/ to merge.


--
To view, visit http://gerrit.cloudera.org:8080/12651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bce426c7d08ffbf0f8cd26889262243a52cc752
Gerrit-Change-Number: 12651
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Mon, 04 Mar 2019 19:35:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8274: Fix iteration of profiles in ApplyExecStatusReport()

2019-03-04 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12651 )

Change subject: IMPALA-8274: Fix iteration of profiles in 
ApplyExecStatusReport()
..


Patch Set 2: Code-Review+2

Added a line of comment. Carry Thomas' +2.


--
To view, visit http://gerrit.cloudera.org:8080/12651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bce426c7d08ffbf0f8cd26889262243a52cc752
Gerrit-Change-Number: 12651
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Mon, 04 Mar 2019 19:34:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8274: Fix iteration of profiles in ApplyExecStatusReport()

2019-03-04 Thread Michael Ho (Code Review)
Hello Thomas Marshall, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12651

to look at the new patch set (#2).

Change subject: IMPALA-8274: Fix iteration of profiles in 
ApplyExecStatusReport()
..

IMPALA-8274: Fix iteration of profiles in ApplyExecStatusReport()

The coordinator skips over any stale or duplicated status
reports of fragment instances. In the previous implementation,
the index pointing into the vector of Thrift profiles wasn't
updated when skipping over a status report. This breaks the
assumption that the status reports and thrift profiles vectors
have one-to-one correspondence. Consequently, we may end up
passing the wrong profile to InstanceStats::Update(), leading
to random crashes.

This change fixes the problem above by using iterators to
iterate through the status reports and thrift profiles vectors
and ensures that both iterators are updated on every iteration
of the loop.

Change-Id: I8bce426c7d08ffbf0f8cd26889262243a52cc752
---
M be/src/runtime/coordinator-backend-state.cc
M tests/custom_cluster/test_rpc_timeout.py
2 files changed, 33 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/12651/2
--
To view, visit http://gerrit.cloudera.org:8080/12651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8bce426c7d08ffbf0f8cd26889262243a52cc752
Gerrit-Change-Number: 12651
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 


[Impala-ASF-CR] IMPALA-8274: Fix iteration of profiles in ApplyExecStatusReport()

2019-03-04 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12651


Change subject: IMPALA-8274: Fix iteration of profiles in 
ApplyExecStatusReport()
..

IMPALA-8274: Fix iteration of profiles in ApplyExecStatusReport()

The coordinator skips over any stale or duplicated status
reports of fragment instances. In the previous implementation,
the index pointing into the vector of Thrift profiles wasn't
updated when skipping over a status report. This breaks the
assumption that the status reports and thrift profiles vectors
have one-to-one correspondence. Consequently, we may end up
passing the wrong profile to InstanceStats::Update(), leading
to random crashes.

This change fixes the problem above by using iterators to
iterate through the status reports and thrift profiles vectors
and ensures that both iterators are updated on every iteration
of the loop.

Change-Id: I8bce426c7d08ffbf0f8cd26889262243a52cc752
---
M be/src/runtime/coordinator-backend-state.cc
M tests/custom_cluster/test_rpc_timeout.py
2 files changed, 32 insertions(+), 17 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/12651/1
--
To view, visit http://gerrit.cloudera.org:8080/12651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8bce426c7d08ffbf0f8cd26889262243a52cc752
Gerrit-Change-Number: 12651
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 


[Impala-ASF-CR] [DOCS] Add HADOOP-15720 to the list of Known Issues

2019-03-01 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12631 )

Change subject: [DOCS] Add HADOOP-15720 to the list of Known Issues
..


Patch Set 4:

(2 comments)

LGTM. Please fix the other two usages of "host"

http://gerrit.cloudera.org:8080/#/c/12631/4/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

http://gerrit.cloudera.org:8080/#/c/12631/4/docs/topics/impala_known_issues.xml@555
PS4, Line 555: host
Impalad


http://gerrit.cloudera.org:8080/#/c/12631/4/docs/topics/impala_known_issues.xml@555
PS4, Line 555: host
Impalad



--
To view, visit http://gerrit.cloudera.org:8080/12631
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9497d337cb79c38add85bb0de9d07660617c76b5
Gerrit-Change-Number: 12631
Gerrit-PatchSet: 4
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Comment-Date: Fri, 01 Mar 2019 23:32:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads

2019-03-01 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12579 )

Change subject: IMPALA-7800: Reject new connections after --fe_service_threads
..


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/12579/5/be/src/rpc/TAcceptQueueServer.cpp
File be/src/rpc/TAcceptQueueServer.cpp:

http://gerrit.cloudera.org:8080/#/c/12579/5/be/src/rpc/TAcceptQueueServer.cpp@198
PS5, Line 198: if (entry->expiration_time_ != 0)
 :   wait_time = entry->expiration_time_ - 
MonotonicMillis();
styling nit:

  if (entry->expiration_time_ != 0) {
   wait_time = entry->expiration_time_ - MonotonicMillis();
  }


http://gerrit.cloudera.org:8080/#/c/12579/5/be/src/rpc/TAcceptQueueServer.cpp@200
PS5, Line 200: LOG(INFO) << "All " << maxTasks_ << " server threads are 
in use. "
 :   << "Waiting for " << wait_time << " msecs.";
Does it make sense to log only if we are timing out a request below ?


http://gerrit.cloudera.org:8080/#/c/12579/5/be/src/rpc/TAcceptQueueServer.cpp@263
PS5, Line 263:   MonotonicMillis() + FLAGS_accepted_cnxn_timeout * 
MILLIS_PER_SEC;
nit: indent 4


http://gerrit.cloudera.org:8080/#/c/12579/5/be/src/rpc/TAcceptQueueServer.cpp@267
PS5, Line 267: connection_setup_pool.Offer(
In theory, this could block too but it's still counted towards the 
expiration_time_. So, is it possible for the code to reach line 199 above and 
compute a negative wait_time value ?


http://gerrit.cloudera.org:8080/#/c/12579/5/tests/custom_cluster/test_frontend_connection_limit.py
File tests/custom_cluster/test_frontend_connection_limit.py:

http://gerrit.cloudera.org:8080/#/c/12579/5/tests/custom_cluster/test_frontend_connection_limit.py@1
PS5, Line 1:
nit: blank line



--
To view, visit http://gerrit.cloudera.org:8080/12579
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64
Gerrit-Change-Number: 12579
Gerrit-PatchSet: 5
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Fri, 01 Mar 2019 23:30:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Add HADOOP-15720 to the list of Known Issues

2019-03-01 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12631 )

Change subject: [DOCS] Add HADOOP-15720 to the list of Known Issues
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12631/3/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

http://gerrit.cloudera.org:8080/#/c/12631/3/docs/topics/impala_known_issues.xml@565
PS3, Line 565: host
Impalad



--
To view, visit http://gerrit.cloudera.org:8080/12631
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9497d337cb79c38add85bb0de9d07660617c76b5
Gerrit-Change-Number: 12631
Gerrit-PatchSet: 3
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Comment-Date: Fri, 01 Mar 2019 18:54:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()

2019-03-01 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12624 )

Change subject: IMPALA-8256: Better error message for 
ImpalaServicePool::RejectTooBusy()
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12624/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12624/3//COMMIT_MSG@18
PS3, Line 18: test_rpc_timeout.py
> Is that file missing from the change then?
There is no change in that test file. A pre-existing test already exercises 
that path.



--
To view, visit http://gerrit.cloudera.org:8080/12624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0297658acf2b23823dcb7d2bdff5d8e4475bb98
Gerrit-Change-Number: 12624
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Fri, 01 Mar 2019 18:59:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()

2019-02-28 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12624 )

Change subject: IMPALA-8256: Better error message for 
ImpalaServicePool::RejectTooBusy()
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12624/1/be/src/rpc/impala-service-pool.cc
File be/src/rpc/impala-service-pool.cc:

http://gerrit.cloudera.org:8080/#/c/12624/1/be/src/rpc/impala-service-pool.cc@119
PS1, Line 119:  "The service queue is full; it has $3 items; 
memory "
> If we want to keep the single message then maybe a message like
Done



--
To view, visit http://gerrit.cloudera.org:8080/12624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0297658acf2b23823dcb7d2bdff5d8e4475bb98
Gerrit-Change-Number: 12624
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Fri, 01 Mar 2019 00:41:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()

2019-02-28 Thread Michael Ho (Code Review)
Hello Andrew Sherman, Thomas Marshall, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12624

to look at the new patch set (#3).

Change subject: IMPALA-8256: Better error message for 
ImpalaServicePool::RejectTooBusy()
..

IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()

An incoming request to a RPC service can be rejected due to either
exceeding the memory limit or maximum allowed queue length.
It's unclear from the current error message which of those factors
contributes to the failure as neither the actual queue length nor
the memory consumption is printed.

This patch fixes the problem by printing the estimated queue length
and memory consumption when a RPC request is dropped.

Testing done: verified the new error message with test_rpc_timeout.py

Change-Id: If0297658acf2b23823dcb7d2bdff5d8e4475bb98
---
M be/src/rpc/impala-service-pool.cc
M be/src/rpc/impala-service-pool.h
2 files changed, 8 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/12624/3
--
To view, visit http://gerrit.cloudera.org:8080/12624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If0297658acf2b23823dcb7d2bdff5d8e4475bb98
Gerrit-Change-Number: 12624
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 


[Impala-ASF-CR] IMPALA-8264: Fix overflow in SystemStateInfo::ComputeCpuRatios

2019-02-28 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12638 )

Change subject: IMPALA-8264: Fix overflow in SystemStateInfo::ComputeCpuRatios
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12638/1/be/src/util/system-state-info-test.cc
File be/src/util/system-state-info-test.cc:

http://gerrit.cloudera.org:8080/#/c/12638/1/be/src/util/system-state-info-test.cc@63
PS1, Line 63: CPUY
typo ?



--
To view, visit http://gerrit.cloudera.org:8080/12638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46dfb7264e85a3da492e383caaa6853d8ef95f35
Gerrit-Change-Number: 12638
Gerrit-PatchSet: 1
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Thu, 28 Feb 2019 23:40:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads

2019-02-27 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12579 )

Change subject: IMPALA-7800: Reject new connections after --fe_service_threads
..


Patch Set 3:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp
File be/src/rpc/TAcceptQueueServer.cpp:

http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@28
PS3, Line 28: 120,
Will it be safer to have a higher default timeout (e.g. 300 seconds) to avoid 
users' complaint given the current behavior is that we don't have a timeout ?

Please also add a remark that setting it to 0 means there is no timeout.


http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@151
PS3, Line 151: const string& error)
nit: we tend to put the constant parameters followed by non-constant parameters.


http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@193
PS3, Line 193: 0LL
nit: 'LL' not needed.


http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@196
PS3, Line 196: if (entry->expiration_time_)
nit: if (entry->expiration_time_ != 0) {


http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@197
PS3, Line 197: wait_time = entry->expiration_time_ - MonotonicMillis();
Shouldn't this be inside the while loop below ?


http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@199
PS3, Line 199: LOG(INFO) << "All " << maxTasks_ << " server threads are 
in use. "
 :   << "Waiting for " << wait_time << " msecs.";
Will this lead to log spam ?


http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@204
PS3, Line 204: Timing out connection request.";
Any chance we can print some details (e.g. IP address, port) here ?


http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@239
PS3, Line 239: shared_ptr
This can be "unique_ptr" now that TTransport lives inside TAcceptQueueEntry. 
This makes ownership easier to understand.


http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@260
PS3, Line 260: if (FLAGS_accepted_cnxn_timeout)
nit: if (FLAGS_accepted_cnxn_timeout != 0) {


http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@261
PS3, Line 261: MILLIS_PER_SEC;
long line.



--
To view, visit http://gerrit.cloudera.org:8080/12579
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64
Gerrit-Change-Number: 12579
Gerrit-PatchSet: 3
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 27 Feb 2019 22:26:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()

2019-02-27 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12624 )

Change subject: IMPALA-8256: Better error message for 
ImpalaServicePool::RejectTooBusy()
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/12624/1/be/src/rpc/impala-service-pool.cc
File be/src/rpc/impala-service-pool.cc:

http://gerrit.cloudera.org:8080/#/c/12624/1/be/src/rpc/impala-service-pool.cc@119
PS1, Line 119:  "The service queue is full; it has $3 items; 
memory "
> Isn't the message "The service queue is full" misleading?
It's full when either the max queue length or the memory consumption limit is 
reached. Definitely open to better suggestion for the message. I considered 
"The service queue has reached its capacity limit" but that doesn't seem any 
different.


http://gerrit.cloudera.org:8080/#/c/12624/1/be/src/rpc/impala-service-pool.cc@120
PS1, Line 120:  "consumption is $4",
> Nit: needs terminating "."
Done


http://gerrit.cloudera.org:8080/#/c/12624/1/be/src/rpc/impala-service-pool.cc@125
PS1, Line 125:  
PrettyPrinter::Print(service_mem_tracker_->consumption(), TUnit::BYTES));
> There's a similar message in service-pool.cc
As you mentioned, that's the Kudu's implementation. We forked it to allow us to 
enforce memory limit too. So, in the Kudu's implementation, the queue is only 
full because it has reached the maximum queue length.



--
To view, visit http://gerrit.cloudera.org:8080/12624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0297658acf2b23823dcb7d2bdff5d8e4475bb98
Gerrit-Change-Number: 12624
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Wed, 27 Feb 2019 22:08:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()

2019-02-27 Thread Michael Ho (Code Review)
Hello Andrew Sherman, Thomas Marshall, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12624

to look at the new patch set (#2).

Change subject: IMPALA-8256: Better error message for 
ImpalaServicePool::RejectTooBusy()
..

IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()

An incoming request to a RPC service can be rejected due to either
exceeding the memory limit or maximum allowed queue length.
It's unclear from the current error message which of those factors
contributes to the failure as neither the actual queue length nor
the memory consumption is printed.

This patch fixes the problem by printing the estimated queue length
and memory consumption when a RPC request is dropped.

Testing done: verified the new error message with test_rpc_timeout.py

Change-Id: If0297658acf2b23823dcb7d2bdff5d8e4475bb98
---
M be/src/rpc/impala-service-pool.cc
M be/src/rpc/impala-service-pool.h
2 files changed, 7 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/12624/2
--
To view, visit http://gerrit.cloudera.org:8080/12624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If0297658acf2b23823dcb7d2bdff5d8e4475bb98
Gerrit-Change-Number: 12624
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 


[Impala-ASF-CR] IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()

2019-02-27 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12624


Change subject: IMPALA-8256: Better error message for 
ImpalaServicePool::RejectTooBusy()
..

IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()

An incoming request to a RPC service can be rejected due to either
exceeding the memory limit or maximum allowed queue length.
It's unclear from the current error message which of those factors
contributes to the failure as neither the actual queue length nor
the memory consumption is printed.

This patch fixes the problem by printing the estimated queue length
and memory consumption when a RPC request is dropped.

Testing done: verified the new error message with test_rpc_timeout.py

Change-Id: If0297658acf2b23823dcb7d2bdff5d8e4475bb98
---
M be/src/rpc/impala-service-pool.cc
M be/src/rpc/impala-service-pool.h
2 files changed, 7 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/12624/1
--
To view, visit http://gerrit.cloudera.org:8080/12624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If0297658acf2b23823dcb7d2bdff5d8e4475bb98
Gerrit-Change-Number: 12624
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 


[Impala-ASF-CR] IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()

2019-02-27 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12623


Change subject: IMPALA-8256: Better error message for 
ImpalaServicePool::RejectTooBusy()
..

IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()

An incoming request to a RPC service can be rejected either due
to exceeding the memory limit or maximum allowed queue length.
It's unclear from the current error message which of those factors
contributes to the failure as neither the actual queue length nor
the memory consumption is printed.

This patch fixes the problem by printing the estimated queue length
and memory consumption when a RPC request is dropped.

Change-Id: I5b40b8e2077b61e2f7e244b87c946f30803438bc
---
M be/src/rpc/impala-service-pool.cc
M be/src/rpc/impala-service-pool.h
2 files changed, 7 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/12623/1
--
To view, visit http://gerrit.cloudera.org:8080/12623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5b40b8e2077b61e2f7e244b87c946f30803438bc
Gerrit-Change-Number: 12623
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 


[Impala-ASF-CR] IMPALA-8256: Better error message for ImpalaServicePool::RejectTooBusy()

2019-02-27 Thread Michael Ho (Code Review)
Michael Ho has abandoned this change. ( http://gerrit.cloudera.org:8080/12623 )

Change subject: IMPALA-8256: Better error message for 
ImpalaServicePool::RejectTooBusy()
..


Abandoned
--
To view, visit http://gerrit.cloudera.org:8080/12623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I5b40b8e2077b61e2f7e244b87c946f30803438bc
Gerrit-Change-Number: 12623
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-4568: Cache parquet footer

2019-02-27 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12622 )

Change subject: IMPALA-4568: Cache parquet footer
..


Patch Set 1:

Sorry, pushing the wrong branch. This is WIP. Definitely not meant for review.


--
To view, visit http://gerrit.cloudera.org:8080/12622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib30dcc4a13050c11f011491dec3776ba9e402c78
Gerrit-Change-Number: 12622
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 27 Feb 2019 20:28:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4568: Cache parquet footer

2019-02-27 Thread Michael Ho (Code Review)
Michael Ho has abandoned this change. ( http://gerrit.cloudera.org:8080/12622 )

Change subject: IMPALA-4568: Cache parquet footer
..


Abandoned
--
To view, visit http://gerrit.cloudera.org:8080/12622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Ib30dcc4a13050c11f011491dec3776ba9e402c78
Gerrit-Change-Number: 12622
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-4568: Cache parquet footer

2019-02-27 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12622


Change subject: IMPALA-4568: Cache parquet footer
..

IMPALA-4568: Cache parquet footer

Change-Id: Ib30dcc4a13050c11f011491dec3776ba9e402c78
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-column-readers.cc
A be/src/exec/parquet/parquet-footer-cache.cc
A be/src/exec/parquet/parquet-footer-cache.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
15 files changed, 443 insertions(+), 126 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/12622/1
--
To view, visit http://gerrit.cloudera.org:8080/12622
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib30dcc4a13050c11f011491dec3776ba9e402c78
Gerrit-Change-Number: 12622
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 


[Impala-ASF-CR] IMPALA-8251: Run test exchange deferred batches in dev builds only

2019-02-26 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12610


Change subject: IMPALA-8251: Run test_exchange_deferred_batches in dev builds 
only
..

IMPALA-8251: Run test_exchange_deferred_batches in dev builds only

One of the stress related flag is not available in the development build.

Testing done: Ran the test with release build and verified that it's skipped.

Change-Id: I5caaa6fa39d6c97f313b675838c27740af9aa1d5
---
M tests/custom_cluster/test_exchange_deferred_batches.py
1 file changed, 2 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/12610/1
--
To view, visit http://gerrit.cloudera.org:8080/12610
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5caaa6fa39d6c97f313b675838c27740af9aa1d5
Gerrit-Change-Number: 12610
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 


[Impala-ASF-CR] IMPALA-8251: Run test exchange deferred batches in dev builds only

2019-02-26 Thread Michael Ho (Code Review)
Hello Thomas Marshall, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12610

to look at the new patch set (#2).

Change subject: IMPALA-8251: Run test_exchange_deferred_batches in dev builds 
only
..

IMPALA-8251: Run test_exchange_deferred_batches in dev builds only

The startup flag stress_datastream_recvr_delay_ms is not available in
development builds. Skip the test in non-developement builds.

Testing done: Ran the test with release build and verified that it's skipped.

Change-Id: I5caaa6fa39d6c97f313b675838c27740af9aa1d5
---
M tests/custom_cluster/test_exchange_deferred_batches.py
1 file changed, 2 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/12610/2
--
To view, visit http://gerrit.cloudera.org:8080/12610
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5caaa6fa39d6c97f313b675838c27740af9aa1d5
Gerrit-Change-Number: 12610
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Thomas Marshall 


[Impala-ASF-CR] IMPALA-2990: timeout unresponsive queries in coordinator

2019-02-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12299 )

Change subject: IMPALA-2990: timeout unresponsive queries in coordinator
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12299/2/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/12299/2/be/src/runtime/coordinator-backend-state.h@137
PS2, Line 137: boost::lock_guard l(lock_);
> Is that something we can rely on in all possible environments in which Impa
Yes, I suppose it may not hold true for all architecture. It generally holds 
for x86-64 architecture. That said, Impala is quite x86 centric although there 
were works in the past to port it to Power architecture.

Anyhow, this is not important unless we see any contention during perf testing.



--
To view, visit http://gerrit.cloudera.org:8080/12299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987
Gerrit-Change-Number: 12299
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Tue, 26 Feb 2019 00:57:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Noted IMPALA-8154 as fixed in Known Issues doc

2019-02-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12551 )

Change subject: [DOCS] Noted IMPALA-8154 as fixed in Known Issues doc
..


Patch Set 1:

Sorry for the delay. +2'ed


--
To view, visit http://gerrit.cloudera.org:8080/12551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4ecae87938cc2ee9ce5a2bed639d3719f9bd793e
Gerrit-Change-Number: 12551
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Tue, 26 Feb 2019 00:54:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Noted IMPALA-8154 as fixed in Known Issues doc

2019-02-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12551 )

Change subject: [DOCS] Noted IMPALA-8154 as fixed in Known Issues doc
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4ecae87938cc2ee9ce5a2bed639d3719f9bd793e
Gerrit-Change-Number: 12551
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Tue, 26 Feb 2019 00:54:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8138: Reintroduce rpc debugging options

2019-02-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12297 )

Change subject: IMPALA-8138: Reintroduce rpc debugging options
..


Patch Set 1:

Any update ?


--
To view, visit http://gerrit.cloudera.org:8080/12297
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2046cb9dadf846ea90c04e95781b2bbde3325941
Gerrit-Change-Number: 12297
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Mon, 25 Feb 2019 23:27:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2990: timeout unresponsive queries in coordinator

2019-02-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12299 )

Change subject: IMPALA-2990: timeout unresponsive queries in coordinator
..


Patch Set 2:

Any update ?


--
To view, visit http://gerrit.cloudera.org:8080/12299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987
Gerrit-Change-Number: 12299
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Mon, 25 Feb 2019 23:26:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8239: Fix handling of failed deserialization of row batch

2019-02-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12567 )

Change subject: IMPALA-8239: Fix handling of failed deserialization of row batch
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/12567/2/tests/custom_cluster/test_exchange_deferred_batches.py
File tests/custom_cluster/test_exchange_deferred_batches.py:

http://gerrit.cloudera.org:8080/#/c/12567/2/tests/custom_cluster/test_exchange_deferred_batches.py@21
PS2, Line 21:
> flake8: E302 expected 2 blank lines, found 1
Done


http://gerrit.cloudera.org:8080/#/c/12567/3/tests/custom_cluster/test_exchange_deferred_batches.py
File tests/custom_cluster/test_exchange_deferred_batches.py:

http://gerrit.cloudera.org:8080/#/c/12567/3/tests/custom_cluster/test_exchange_deferred_batches.py@21
PS3, Line 21:
> flake8: E302 expected 2 blank lines, found 1
Done


http://gerrit.cloudera.org:8080/#/c/12567/3/tests/custom_cluster/test_exchange_deferred_batches.py@24
PS3, Line 24: @
> flake8: E303 too many blank lines (2)
Done



--
To view, visit http://gerrit.cloudera.org:8080/12567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa74937b046d95484887533be548249e96078755
Gerrit-Change-Number: 12567
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Feb 2019 23:00:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8239: Fix handling of failed deserialization of row batch

2019-02-25 Thread Michael Ho (Code Review)
Hello Thomas Marshall, Lars Volker, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12567

to look at the new patch set (#3).

Change subject: IMPALA-8239: Fix handling of failed deserialization of row batch
..

IMPALA-8239: Fix handling of failed deserialization of row batch

Previously, when a row batch failed to be deserialized in the
data stream receiver, we will return the error status to the
sender of the row batch without inserting the row batch. The
receiver will continue to operate without flagging any error.
The assumption is that the sender will eventually cancel the
query upon receiving the failed status.

Normally, when a caller of GetBatch() successfully dequeues a row
batch from the batch queue, it will kick off the draining of the
row batches from the deferred queue into the normal batch queue,
which will further continue the cycle of draining the deferred queue
upon the next call to GetBatch() until the deferred queue becomes empty.

When an error is hit when deserializing a deferred batch to be inserted
into the batch queue, the existing code will simply not insert the row
batch or flag any error. This breaks the cycle of the deferred queue
draining as the batch queue may become empty forever. The caller of
GetBatch() will block indefinitely until the query is cancelled.

The existing code works fine as the expectation is that the query will
be cancelled once the sender receives the error status from the RPC
response. However, this behavior is still not ideal as it lets a query
which has hit a fatal error to hold on to resources for extended period
of time.

This patch fixes the problem by explicitly recording any error during
row batch insertion in an error status object. Callers of GetBatch()
will now also poll for this status object while waiting for row batch
to show up and bail out early if there is any error.

A new test case has been added to simulate the problematic case above.

Change-Id: Iaa74937b046d95484887533be548249e96078755
---
M be/src/exec/exchange-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/krpc-data-stream-mgr.cc
M be/src/runtime/krpc-data-stream-mgr.h
M be/src/runtime/krpc-data-stream-recvr.cc
M be/src/runtime/krpc-data-stream-recvr.h
A tests/custom_cluster/test_exchange_deferred_batches.py
7 files changed, 157 insertions(+), 47 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/12567/3
--
To view, visit http://gerrit.cloudera.org:8080/12567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa74937b046d95484887533be548249e96078755
Gerrit-Change-Number: 12567
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8239: Fix handling of failed deserialization of row batch

2019-02-25 Thread Michael Ho (Code Review)
Hello Thomas Marshall, Lars Volker, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12567

to look at the new patch set (#4).

Change subject: IMPALA-8239: Fix handling of failed deserialization of row batch
..

IMPALA-8239: Fix handling of failed deserialization of row batch

Previously, when a row batch failed to be deserialized in the
data stream receiver, we will return the error status to the
sender of the row batch without inserting the row batch. The
receiver will continue to operate without flagging any error.
The assumption is that the sender will eventually cancel the
query upon receiving the failed status.

Normally, when a caller of GetBatch() successfully dequeues a row
batch from the batch queue, it will kick off the draining of the
row batches from the deferred queue into the normal batch queue,
which will further continue the cycle of draining the deferred queue
upon the next call to GetBatch() until the deferred queue becomes empty.

When an error is hit when deserializing a deferred batch to be inserted
into the batch queue, the existing code will simply not insert the row
batch or flag any error. This breaks the cycle of the deferred queue
draining as the batch queue may become empty forever. The caller of
GetBatch() will block indefinitely until the query is cancelled.

The existing code works fine as the expectation is that the query will
be cancelled once the sender receives the error status from the RPC
response. However, this behavior is still not ideal as it lets a query
which has hit a fatal error to hold on to resources for extended period
of time.

This patch fixes the problem by explicitly recording any error during
row batch insertion in an error status object. Callers of GetBatch()
will now also poll for this status object while waiting for row batch
to show up and bail out early if there is any error.

A new test case has been added to simulate the problematic case above.

Change-Id: Iaa74937b046d95484887533be548249e96078755
---
M be/src/exec/exchange-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/krpc-data-stream-mgr.cc
M be/src/runtime/krpc-data-stream-mgr.h
M be/src/runtime/krpc-data-stream-recvr.cc
M be/src/runtime/krpc-data-stream-recvr.h
A tests/custom_cluster/test_exchange_deferred_batches.py
7 files changed, 157 insertions(+), 47 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/12567/4
--
To view, visit http://gerrit.cloudera.org:8080/12567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa74937b046d95484887533be548249e96078755
Gerrit-Change-Number: 12567
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8239: Fix handling of failed deserialization of row batch

2019-02-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12567 )

Change subject: IMPALA-8239: Fix handling of failed deserialization of row batch
..


Patch Set 2: Code-Review+1

Carry Lars' +1


--
To view, visit http://gerrit.cloudera.org:8080/12567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa74937b046d95484887533be548249e96078755
Gerrit-Change-Number: 12567
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Feb 2019 22:22:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8239: Fix handling of failed deserialization of row batch

2019-02-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12567 )

Change subject: IMPALA-8239: Fix handling of failed deserialization of row batch
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/12567/1/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/12567/1/be/src/runtime/krpc-data-stream-recvr.cc@118
PS1, Line 118:   // accounting for various counters.
> Maybe document that 'lock_' must be held via 'lock' here and below. Ok to i
Done


http://gerrit.cloudera.org:8080/#/c/12567/1/be/src/runtime/krpc-data-stream-recvr.cc@161
PS1, Line 161: inesrting
> inserting.
Done


http://gerrit.cloudera.org:8080/#/c/12567/1/tests/custom_cluster/test_exchange_deferred_batches.py
File tests/custom_cluster/test_exchange_deferred_batches.py:

http://gerrit.cloudera.org:8080/#/c/12567/1/tests/custom_cluster/test_exchange_deferred_batches.py@21
PS1, Line 21: class TestExchangeDeferredBatches(CustomClusterTestSuite):
> flake8: E302 expected 2 blank lines, found 1
Done



--
To view, visit http://gerrit.cloudera.org:8080/12567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa74937b046d95484887533be548249e96078755
Gerrit-Change-Number: 12567
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Feb 2019 22:22:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8239: Fix handling of failed deserialization of row batch

2019-02-25 Thread Michael Ho (Code Review)
Hello Thomas Marshall, Lars Volker, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12567

to look at the new patch set (#2).

Change subject: IMPALA-8239: Fix handling of failed deserialization of row batch
..

IMPALA-8239: Fix handling of failed deserialization of row batch

Previously, when a row batch failed to be deserialized in the
data stream receiver, we will return the error status to the
sender of the row batch without inserting the row batch. The
receiver will continue to operate without flagging any error.
The assumption is that the sender will eventually cancel the
query upon receiving the failed status.

Normally, when a caller of GetBatch() successfully dequeues a row
batch from the batch queue, it will kick off the draining of the
row batches from the deferred queue into the normal batch queue,
which will further continue the cycle of draining the deferred queue
upon the next call to GetBatch() until the deferred queue becomes empty.

When an error is hit when deserializing a deferred batch to be inserted
into the batch queue, the existing code will simply not insert the row
batch or flag any error. This breaks the cycle of the deferred queue
draining as the batch queue may become empty forever. The caller of
GetBatch() will block indefinitely until the query is cancelled.

The existing code works fine as the expectation is that the query will
be cancelled once the sender receives the error status from the RPC
response. However, this behavior is still not ideal as it lets a query
which has hit a fatal error to hold on to resources for extended period
of time.

This patch fixes the problem by explicitly recording any error during
row batch insertion in an error status object. Callers of GetBatch()
will now also poll for this status object while waiting for row batch
to show up and bail out early if there is any error.

A new test case has been added to simulate the problematic case above.

Change-Id: Iaa74937b046d95484887533be548249e96078755
---
M be/src/exec/exchange-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/krpc-data-stream-mgr.cc
M be/src/runtime/krpc-data-stream-mgr.h
M be/src/runtime/krpc-data-stream-recvr.cc
M be/src/runtime/krpc-data-stream-recvr.h
A tests/custom_cluster/test_exchange_deferred_batches.py
7 files changed, 156 insertions(+), 47 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/12567/2
--
To view, visit http://gerrit.cloudera.org:8080/12567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa74937b046d95484887533be548249e96078755
Gerrit-Change-Number: 12567
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8239: Fix handling of failed deserialization of row batch

2019-02-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12567 )

Change subject: IMPALA-8239: Fix handling of failed deserialization of row batch
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12567/1/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/12567/1/be/src/runtime/krpc-data-stream-recvr.cc@346
PS1, Line 346:   status_.MergeStatus(status);
> Should we signal data_arrival_cv_ here? Since the loop above depends on thi
Good point. I think that's a more general implementation than what's 
implemented in this patch.



--
To view, visit http://gerrit.cloudera.org:8080/12567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa74937b046d95484887533be548249e96078755
Gerrit-Change-Number: 12567
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Feb 2019 19:45:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8239: Fix handling of failed deserialization of row batch

2019-02-23 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12567


Change subject: IMPALA-8239: Fix handling of failed deserialization of row batch
..

IMPALA-8239: Fix handling of failed deserialization of row batch

Previously, when a row batch failed to be deserialized in the
data stream receiver, we will return the error status to the
sender of the row batch without inserting the row batch. The
receiver will continue to operate without flagging any error.
The assumption is that the sender will eventually cancel the
query upon receiving the failed status.

Normally, when a caller of GetBatch() successfully dequeues a row
batch from the batch queue, it will kick off the draining of the
row batches from the deferred queue into the normal batch queue,
which will further continue the cycle of draining the deferred queue
upon the next call to GetBatch() until the deferred queue becomes empty.

When an error is hit when deserializing a deferred batch to be inserted
into the batch queue, the existing code will simply not insert the row
batch or flag any error. This breaks the cycle of the deferred queue
draining as the batch queue may become empty forever. The caller of
GetBatch() will block indefinitely until the query is cancelled.

The existing code works fine as the expectation is that the query will
be cancelled once the sender receives the error status from the RPC
response. However, this behavior is still not ideal as it lets a query
which has hit a fatal error to hold on to resources for extended period
of time.

This patch fixes the problem by explicitly recording any error during
row batch insertion in an error status object. Callers of GetBatch()
will now also poll for this status object while waiting for row batch
to show up and bail out early if there is any error.

A new test case has been added to simulate the problematic case above.

Change-Id: Iaa74937b046d95484887533be548249e96078755
---
M be/src/exec/exchange-node.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/krpc-data-stream-mgr.cc
M be/src/runtime/krpc-data-stream-mgr.h
M be/src/runtime/krpc-data-stream-recvr.cc
M be/src/runtime/krpc-data-stream-recvr.h
A tests/custom_cluster/test_exchange_deferred_batches.py
7 files changed, 139 insertions(+), 41 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/12567/1
--
To view, visit http://gerrit.cloudera.org:8080/12567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iaa74937b046d95484887533be548249e96078755
Gerrit-Change-Number: 12567
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 


[Impala-ASF-CR] IMPALA-8212 / KUDU-2706: Work around the lack of thread safety in krb5 parse name()

2019-02-21 Thread Michael Ho (Code Review)
Michael Ho has removed Kudu Jenkins from this change.  ( 
http://gerrit.cloudera.org:8080/12553 )

Change subject: IMPALA-8212 / KUDU-2706: Work around the lack of thread safety 
in krb5_parse_name()
..


Removed reviewer Kudu Jenkins.
--
To view, visit http://gerrit.cloudera.org:8080/12553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I1bf9224516e2996f51f319088179727f76741ebe
Gerrit-Change-Number: 12553
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 


[Impala-ASF-CR] IMPALA-8239 / KUDU-2706: Work around the lack of thread safety in krb5 parse name()

2019-02-21 Thread Michael Ho (Code Review)
Hello Alexey Serbin, Kudu Jenkins,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/12553

to review the following change.


Change subject: IMPALA-8239 / KUDU-2706: Work around the lack of thread safety 
in krb5_parse_name()
..

IMPALA-8239 / KUDU-2706: Work around the lack of thread safety in 
krb5_parse_name()

krb5_init_context() sets the field 'default_realm' in a krb5_context
object to 0. Upon first call to krb5_parse_name() with a principal
without realm specified (e.g. foo/bar), 'default_realm' in the
krb5_context object is lazily initialized.

When more than one negotiation threads are configured, it's possible
for multiple threads to call CanonicalizeKrb5Principal() in parallel.
CanonicalizeKrb5Principal() in turn calls krb5_parse_name(g_krb5_ctx, ...)
with no lock held. In addition, krb5_parse_name() is not thread safe as
it lazily initializes 'context->default_realm' without holding lock.
Consequently, 'g_krb5_ctx' which is shared and not supposed to be modified
after initialization may be inadvertently modified concurrently by multiple
threads, leading to crashes (e.g. double free) or errors.

This change works around the problem by initializing 'g_krb5_ctx->default_realm'
once in InitKrb5Ctx() by calling krb5_get_default_realm().

Impala internally calls InitAuth() which calls InitKerberosForServer()
which ends up calling InitKrb5Ctx() in a single threaded environment.

Change-Id: I1bf9224516e2996f51f319088179727f76741ebe
Reviewed-on: http://gerrit.cloudera.org:8080/12545
Reviewed-by: Alexey Serbin 
Tested-by: Kudu Jenkins
---
M be/src/kudu/security/init.cc
1 file changed, 12 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/12553/1
--
To view, visit http://gerrit.cloudera.org:8080/12553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1bf9224516e2996f51f319088179727f76741ebe
Gerrit-Change-Number: 12553
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)


[Impala-ASF-CR] IMPALA-8239 / KUDU-2706: Work around the lack of thread safety in krb5 parse name()

2019-02-21 Thread Michael Ho (Code Review)
Michael Ho has removed Kudu Jenkins from this change.  ( 
http://gerrit.cloudera.org:8080/12553 )

Change subject: IMPALA-8239 / KUDU-2706: Work around the lack of thread safety 
in krb5_parse_name()
..


Removed reviewer Kudu Jenkins.
--
To view, visit http://gerrit.cloudera.org:8080/12553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I1bf9224516e2996f51f319088179727f76741ebe
Gerrit-Change-Number: 12553
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alexey Serbin 


[Impala-ASF-CR] IMPALA-8235: avoid TIME MS in runtime profile

2019-02-20 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12544 )

Change subject: IMPALA-8235: avoid TIME_MS in runtime profile
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12544/1/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/12544/1/be/src/scheduling/admission-controller.cc@1108
PS1, Line 1108: static_cast(
> Here's a minimal example of the problem: https://godbolt.org/z/cfFzvs
I see. Thanks for the explanation.



--
To view, visit http://gerrit.cloudera.org:8080/12544
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia6a0846f401ced1947187b8581563de9dbb4fe53
Gerrit-Change-Number: 12544
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 21 Feb 2019 02:33:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8235: avoid TIME MS in runtime profile

2019-02-20 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12544 )

Change subject: IMPALA-8235: avoid TIME_MS in runtime profile
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12544
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia6a0846f401ced1947187b8581563de9dbb4fe53
Gerrit-Change-Number: 12544
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 21 Feb 2019 02:33:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8235: avoid TIME MS in runtime profile

2019-02-20 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12544 )

Change subject: IMPALA-8235: avoid TIME_MS in runtime profile
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/12544/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12544/1//COMMIT_MSG@10
PS1, Line 10: areare
are


http://gerrit.cloudera.org:8080/#/c/12544/1//COMMIT_MSG@13
PS1, Line 13: Work around the issue by using the older TIME_NS type for this
: counter.
What's the plan going forward ? Do we need to identify a set of interfaces 
which need to be backward compatible ?


http://gerrit.cloudera.org:8080/#/c/12544/1/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/12544/1/be/src/scheduling/admission-controller.cc@1108
PS1, Line 1108: static_cast(
Why is this cast necessary ? Isn't time_since_update_ms already int64_t ?



--
To view, visit http://gerrit.cloudera.org:8080/12544
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia6a0846f401ced1947187b8581563de9dbb4fe53
Gerrit-Change-Number: 12544
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Thu, 21 Feb 2019 01:58:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8187: UDF samples hide symbols by default

2019-02-13 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12451 )

Change subject: IMPALA-8187: UDF samples hide symbols by default
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12451
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie17b74b9bce9cc6962393017879b65409ce23b28
Gerrit-Change-Number: 12451
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 13 Feb 2019 18:03:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8187: UDF samples hide symbols by default

2019-02-12 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12451 )

Change subject: IMPALA-8187: UDF samples hide symbols by default
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12451/3/be/src/udf_samples/CMakeLists.txt
File be/src/udf_samples/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/12451/3/be/src/udf_samples/CMakeLists.txt@27
PS3, Line 27: set(IR_COMPILE_FLAGS "-emit-llvm" "-O3" "-std=c++14" "-c" "-I../" 
${CLANG_BASE_FLAGS})
> Do we care about this case as well ?
Actually, since it's emitting IR, it's irrelevant.



--
To view, visit http://gerrit.cloudera.org:8080/12451
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie17b74b9bce9cc6962393017879b65409ce23b28
Gerrit-Change-Number: 12451
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 13 Feb 2019 00:23:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8187: UDF samples hide symbols by default

2019-02-12 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12451 )

Change subject: IMPALA-8187: UDF samples hide symbols by default
..


Patch Set 3: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/12451
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie17b74b9bce9cc6962393017879b65409ce23b28
Gerrit-Change-Number: 12451
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 13 Feb 2019 00:10:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8187: UDF samples hide symbols by default

2019-02-12 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12451 )

Change subject: IMPALA-8187: UDF samples hide symbols by default
..


Patch Set 3:

Zoram, can you please also take a look ?


--
To view, visit http://gerrit.cloudera.org:8080/12451
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie17b74b9bce9cc6962393017879b65409ce23b28
Gerrit-Change-Number: 12451
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 13 Feb 2019 00:10:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8187: UDF samples hide symbols by default

2019-02-12 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12451 )

Change subject: IMPALA-8187: UDF samples hide symbols by default
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12451/3/be/src/testutil/test-udas.cc
File be/src/testutil/test-udas.cc:

http://gerrit.cloudera.org:8080/#/c/12451/3/be/src/testutil/test-udas.cc@343
PS3, Line 343: }
nit: Please keep a blank line between functions


http://gerrit.cloudera.org:8080/#/c/12451/3/be/src/udf_samples/CMakeLists.txt
File be/src/udf_samples/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/12451/3/be/src/udf_samples/CMakeLists.txt@27
PS3, Line 27: set(IR_COMPILE_FLAGS "-emit-llvm" "-O3" "-std=c++14" "-c" "-I../" 
${CLANG_BASE_FLAGS})
Do we care about this case as well ?



--
To view, visit http://gerrit.cloudera.org:8080/12451
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie17b74b9bce9cc6962393017879b65409ce23b28
Gerrit-Change-Number: 12451
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 13 Feb 2019 00:10:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8183: fix test reportexecstatus retry flakiness

2019-02-12 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12461 )

Change subject: IMPALA-8183: fix test_reportexecstatus_retry flakiness
..


Patch Set 1: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12461/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12461/1//COMMIT_MSG@12
PS1, Line 12: 'report_status_retry_interval_ms'
Please also mention that this flag is removed after IMPALA-4555.


http://gerrit.cloudera.org:8080/#/c/12461/1//COMMIT_MSG@19
PS1, Line 19: Now, we wait 'status_report_interval_ms'. By default, this is 
5000ms,
between retries



--
To view, visit http://gerrit.cloudera.org:8080/12461
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7027a6e099c543705e5845ee0e5268f1f9a3fb05
Gerrit-Change-Number: 12461
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Tue, 12 Feb 2019 23:15:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add support for compiling using OpenSSL 1.1

2019-02-12 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12420 )

Change subject: Add support for compiling using OpenSSL 1.1
..


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12420/1/be/src/util/openssl-util.cc
File be/src/util/openssl-util.cc:

http://gerrit.cloudera.org:8080/#/c/12420/1/be/src/util/openssl-util.cc@76
PS1, Line 76: #else
:   // OpenSSL 1.1+ doesn't let us detect the supported TLS version 
at runtime. Assume
:   // that the OpenSSL library we're linked against supports only 
up to TLS1.2
:
> I guess OpenSSL is expecting that callers will pass in the version they wan
Fair enough. Agree that if we compile against OpenSSL 1.1, we would expect the 
OpenSSL version which can be linked against Impala at runtime should support 
TLS 1.2.



--
To view, visit http://gerrit.cloudera.org:8080/12420
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaccf1b2dedf0d957a2665df8f9afca4139754264
Gerrit-Change-Number: 12420
Gerrit-PatchSet: 3
Gerrit-Owner: hector.aco...@cloudera.com 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: hector.aco...@cloudera.com 
Gerrit-Comment-Date: Tue, 12 Feb 2019 18:33:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add support for compiling using OpenSSL 1.1

2019-02-11 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12420 )

Change subject: Add support for compiling using OpenSSL 1.1
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12420/1/be/src/util/openssl-util.cc
File be/src/util/openssl-util.cc:

http://gerrit.cloudera.org:8080/#/c/12420/1/be/src/util/openssl-util.cc@76
PS1, Line 76: #else
:   return TLS1_2_VERSION;
: #endif
: }
> I'm not sure I follow.
Sorry, I wasn't clear enough. Why cannot you do the following ?

#if OPENSSL_VERSION_NUMBER < 0x1010L
return SSLv23_method()->version;
#else
return TLS_method()->version;
#endif



--
To view, visit http://gerrit.cloudera.org:8080/12420
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaccf1b2dedf0d957a2665df8f9afca4139754264
Gerrit-Change-Number: 12420
Gerrit-PatchSet: 1
Gerrit-Owner: hector.aco...@cloudera.com 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: hector.aco...@cloudera.com 
Gerrit-Comment-Date: Tue, 12 Feb 2019 01:30:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add support for compiling using OpenSSL 1.1

2019-02-11 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12420 )

Change subject: Add support for compiling using OpenSSL 1.1
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12420/1/be/src/util/openssl-util.cc
File be/src/util/openssl-util.cc:

http://gerrit.cloudera.org:8080/#/c/12420/1/be/src/util/openssl-util.cc@76
PS1, Line 76: #else
:   return TLS1_2_VERSION;
: #endif
: }
> SSLv23_method() still works fine, but the returned object is now opaque.
May be replacing it with TLS_method() instead

According to https://www.openssl.org/docs/man1.1.0/man3/SSLv23_method.html:

Use of these functions is deprecated. They have been replaced with
the above TLS_method(), TLS_server_method() and TLS_client_method() 
respectively. New code should use those functions instead.



--
To view, visit http://gerrit.cloudera.org:8080/12420
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaccf1b2dedf0d957a2665df8f9afca4139754264
Gerrit-Change-Number: 12420
Gerrit-PatchSet: 1
Gerrit-Owner: hector.aco...@cloudera.com 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: hector.aco...@cloudera.com 
Gerrit-Comment-Date: Tue, 12 Feb 2019 01:08:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add support for compiling using OpenSSL 1.1

2019-02-11 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12420 )

Change subject: Add support for compiling using OpenSSL 1.1
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12420/1/be/src/util/openssl-util.cc
File be/src/util/openssl-util.cc:

http://gerrit.cloudera.org:8080/#/c/12420/1/be/src/util/openssl-util.cc@76
PS1, Line 76: #else
:   return TLS1_2_VERSION;
: #endif
: }
Why this change here ? Is there anything wrong with keeping the 
SSLv23_method()->version? Is it not defined or something ?

There seems to be some complication with the logic on RHEL related to detecting 
the version of the linked OpenSSL library: 
(https://github.com/apache/impala/commit/a3c0fffa1275c1d1d2770b9fc35885a22dc7edce)


http://gerrit.cloudera.org:8080/#/c/12420/1/be/src/util/openssl-util.cc@123
PS1, Line 123: EVP_CIPHER_CTX *
nit: EVP_CIPHER_CTX* ctx;



--
To view, visit http://gerrit.cloudera.org:8080/12420
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaccf1b2dedf0d957a2665df8f9afca4139754264
Gerrit-Change-Number: 12420
Gerrit-PatchSet: 1
Gerrit-Owner: hector.aco...@cloudera.com 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 11 Feb 2019 22:39:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7985: Port RemoteShutdown() to KRPC.

2019-02-08 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12260 )

Change subject: IMPALA-7985: Port RemoteShutdown() to KRPC.
..


Patch Set 8: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12260
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e
Gerrit-Change-Number: 12260
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Fri, 08 Feb 2019 22:02:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8154: Disable Kerberos auth to local setting

2019-02-08 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12405 )

Change subject: IMPALA-8154: Disable Kerberos auth_to_local setting
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12405/1/be/src/rpc/authentication.cc
File be/src/rpc/authentication.cc:

http://gerrit.cloudera.org:8080/#/c/12405/1/be/src/rpc/authentication.cc@791
PS1, Line 791: FLAGS_use_system_auth_to_local = false;
An alternative would be to change the default value at the definition of this 
flag. That said, we never really supported auth_to_local mapping and honestly 
speaking, don't see the need for it for internal communication between Impalads.



--
To view, visit http://gerrit.cloudera.org:8080/12405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0b0ad79b56cd5cdd3108c6f973e71a9416efbac8
Gerrit-Change-Number: 12405
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Fri, 08 Feb 2019 18:27:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8170: [DOCS] Added a section on load balancing proxy with TLS

2019-02-07 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12388 )

Change subject: IMPALA-8170: [DOCS] Added a section on load balancing proxy 
with TLS
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12388
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I92185b456623e08841bc27b5dbbe09ace99294aa
Gerrit-Change-Number: 12388
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Fri, 08 Feb 2019 07:45:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8064: Improve observability of wait times for runtime filters

2019-02-07 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12401 )

Change subject: IMPALA-8064: Improve observability of wait times for runtime 
filters
..


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/12401/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12401/1//COMMIT_MSG@10
PS1, Line 10: Previously, the wait time was calculated with
: respect to the ScanNode::Init() function while duration was logged
: with respect to the ScanNode::WaitForRuntimeFilters() function.
The filter wait time counts against the elapsed time since the filter's 
registration in ScanNode::Init() while duration logged in 
ScanNode::WaitForRuntimeFilters() is the time spent in the function waiting for 
all filters to arrive. This could be misleading as it doesn't account for the 
elapsed time spent between ScanNode::Init() and 
ScanNode::WaitForRuntimeFilters().


http://gerrit.cloudera.org:8080/#/c/12401/1//COMMIT_MSG@18
PS1, Line 18:
to


http://gerrit.cloudera.org:8080/#/c/12401/1/be/src/exec/scan-node.cc
File be/src/exec/scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/12401/1/be/src/exec/scan-node.cc@179
PS1, Line 179: max_wait_time = max(max_wait_time, ctx.filter->time_elapsed());
If I understand it correctly, the point of these changes is to reflect the fact 
that "FLAGS_runtime_filter_wait_time_ms" is essentially the maximum tolerable 
elapsed time between the filter's registration and arrival time, which 
potentially is longer than the elapsed time spent in this function.

While I agree it's useful to show the maximum elapsed time between registration 
and arrival for all filters, it seems that the original wait time here is also 
useful for helping us understand the actual wall clock time spent in this 
function waiting for filters to arrive. The time spent in this function 
directly adds to the delay on issuing the initial scan ranges and thus how fast 
we return rows.

So, what do you think about keeping the original wait time while calling the 
max elapsed time between registration and arrival "max arrival delay".


http://gerrit.cloudera.org:8080/#/c/12401/1/be/src/runtime/runtime-filter.h
File be/src/runtime/runtime-filter.h:

http://gerrit.cloudera.org:8080/#/c/12401/1/be/src/runtime/runtime-filter.h@79
PS1, Line 79:   /// Returns the amount of time waited since registration for 
the filter to
, in ms,


http://gerrit.cloudera.org:8080/#/c/12401/1/be/src/runtime/runtime-filter.h@81
PS1, Line 81: int32_t
Either set this to int64_t or convert time_elapsed() below to return int32_t. I 
suppose int32_t can tolerate delays up to 24 days.


http://gerrit.cloudera.org:8080/#/c/12401/1/be/src/runtime/runtime-filter.h@86
PS1, Line 86: time
in millisecond



--
To view, visit http://gerrit.cloudera.org:8080/12401
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28fd45e75c773bc01d424f5a179ae186ee9b7469
Gerrit-Change-Number: 12401
Gerrit-PatchSet: 1
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Comment-Date: Fri, 08 Feb 2019 07:37:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8154: Disable Kerberos auth to local setting

2019-02-07 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12405


Change subject: IMPALA-8154: Disable Kerberos auth_to_local setting
..

IMPALA-8154: Disable Kerberos auth_to_local setting

Before KRPC, the local name mapping was done from the principal name entirely.
With KRPC, Impala started to use the system auth_to_local rules as the Kudu
security code has "--use_system_auth_to_local=true" by default. This can cause
regression if local auth is configured in the krb5.conf (e.g. with  SSSD with 
AD)
as we started enforcing authorization based on Kerberos principal after this
commit 
(https://github.com/apache/impala/commit/5c541b960491ba91533712144599fb3b6d99521d)

This change fixes the problem by explicitly setting 
FLAGS_use_system_auth_to_local
to false during initialization.

Testing done: Enabled auth_to_local in a Kerberized cluster to map 
"impala/"
to foobar and verified queries still worked as expected.

Change-Id: I0b0ad79b56cd5cdd3108c6f973e71a9416efbac8
---
M be/src/rpc/authentication.cc
1 file changed, 5 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/12405/1
--
To view, visit http://gerrit.cloudera.org:8080/12405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0b0ad79b56cd5cdd3108c6f973e71a9416efbac8
Gerrit-Change-Number: 12405
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 


[Impala-ASF-CR] IMPALA-8170: [DOCS] Added a section on load balancing proxy with TLS

2019-02-07 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12388 )

Change subject: IMPALA-8170: [DOCS] Added a section on load balancing proxy 
with TLS
..


Patch Set 2:

Will do.


--
To view, visit http://gerrit.cloudera.org:8080/12388
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I92185b456623e08841bc27b5dbbe09ace99294aa
Gerrit-Change-Number: 12388
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Thu, 07 Feb 2019 21:34:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4555: Make QueryState's status reporting more robust

2019-02-06 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12049 )

Change subject: IMPALA-4555: Make QueryState's status reporting more robust
..


Patch Set 7: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6007013fc2c9e8eeba11b752ee58fb3038da971
Gerrit-Change-Number: 12049
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Thu, 07 Feb 2019 02:46:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4555: Make QueryState's status reporting more robust

2019-02-06 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12049 )

Change subject: IMPALA-4555: Make QueryState's status reporting more robust
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12049/5/tests/custom_cluster/test_rpc_timeout.py
File tests/custom_cluster/test_rpc_timeout.py:

http://gerrit.cloudera.org:8080/#/c/12049/5/tests/custom_cluster/test_rpc_timeout.py@158
PS5, Line 158: # Since the sleep time (1000ms) is much longer than the rpc 
timeout (100ms), all
 : # reports will appear to fail. The query is designed to 
result in many intermediate
 : # status reports but fewer than the max allowed failures, so 
the query should succeed.
 : query_options = {'debug_action': 
'REPORT_EXEC_STATUS_DELAY:SLEEP@1000'}
> Sure, happy to write such a test if you think there's additional value in i
Thanks for pointing out the insert errors below. LGTM.



--
To view, visit http://gerrit.cloudera.org:8080/12049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6007013fc2c9e8eeba11b752ee58fb3038da971
Gerrit-Change-Number: 12049
Gerrit-PatchSet: 5
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Thu, 07 Feb 2019 02:46:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7985: Port RemoteShutdown() to KRPC.

2019-02-06 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12260 )

Change subject: IMPALA-7985: Port RemoteShutdown() to KRPC.
..


Patch Set 7: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/service/client-request-state.cc@695
PS6, Line 695: This may be because the port specified is a thrift port, which"
 : " :shutdown() can no longer use.
> I'm not sure what you mean here. Yes we can't be sure that the reason we ca
"Using thrift port" is a subset of "using the wrong port".  The case of "using 
thrift port" is true iff port == FLAGS_be_port which apparently isn't part of 
the if-stmt here so it's not precise to say so in the error message.

Anyhow, it's just a minor point so it's not worth dwelling on it. Please feel 
free to ignore the suggestion.


http://gerrit.cloudera.org:8080/#/c/12260/7/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/12260/7/be/src/service/client-request-state.cc@691
PS7, Line 691: request.__isset.backend && request.backend.port
nit: factor this out common expression as it's used on line 639 above



--
To view, visit http://gerrit.cloudera.org:8080/12260
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e
Gerrit-Change-Number: 12260
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Thu, 07 Feb 2019 02:35:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4555: Make QueryState's status reporting more robust

2019-02-05 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12049 )

Change subject: IMPALA-4555: Make QueryState's status reporting more robust
..


Patch Set 5: Code-Review+2

(7 comments)

LGTM. Please address the remaining comments.

http://gerrit.cloudera.org:8080/#/c/12049/5/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/12049/5/be/src/runtime/fragment-instance-state.cc@259
PS5, Line 259: instance_status->set_report_seq_no(report_seq_no_);
Can you please a comment on why we don't advance the seq number here to help 
readers understand it ?


http://gerrit.cloudera.org:8080/#/c/12049/5/be/src/runtime/fragment-instance-state.cc@300
PS5, Line 300: on in
typo ?


http://gerrit.cloudera.org:8080/#/c/12049/5/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/12049/5/be/src/runtime/query-state.h@421
PS5, Line 421:   /// Returns the amount of time to wait before sending the next 
status report, calculated
in milliseconds


http://gerrit.cloudera.org:8080/#/c/12049/5/be/src/runtime/query-state.h@422
PS5, Line 422:  exponential
This isn't correct, right ?


http://gerrit.cloudera.org:8080/#/c/12049/5/be/src/runtime/query-state.h@424
PS5, Line 424: int64_t GetReportWaitTimeMs();
int64_t GetReportWaitTimeMs() const;


http://gerrit.cloudera.org:8080/#/c/12049/5/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/12049/5/be/src/runtime/query-state.cc@415
PS5, Line 415: FLAGS_status_report_interval_ms
A minor quirk I just noticed. If the user sets FLAGS_status_report_interval_ms 
to 0, we may want to have a minimum sleep time. Otherwise, if we failed to send 
the last report, we may retry without sleeping.

I guess my earlier suggestion to use the same timeout may have led to this 
quirk but the current timeout logic still seems simpler overall.


http://gerrit.cloudera.org:8080/#/c/12049/5/tests/custom_cluster/test_rpc_timeout.py
File tests/custom_cluster/test_rpc_timeout.py:

http://gerrit.cloudera.org:8080/#/c/12049/5/tests/custom_cluster/test_rpc_timeout.py@158
PS5, Line 158: # Since the sleep time (1000ms) is much longer than the rpc 
timeout (100ms), all
 : # reports will appear to fail. The query is designed to 
result in many intermediate
 : # status reports but fewer than the max allowed failures, so 
the query should succeed.
 : query_options = {'debug_action': 
'REPORT_EXEC_STATUS_DELAY:SLEEP@1000'}
This is a good test. Can you please add a test which exercises the stateful 
reporting part by running some UDFs which may generate errors 
probabilistically. (e.g. as a scan predicate which calls the UDF). In this 
case, we will also start accumulating errors in the fragment instance state.



--
To view, visit http://gerrit.cloudera.org:8080/12049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6007013fc2c9e8eeba11b752ee58fb3038da971
Gerrit-Change-Number: 12049
Gerrit-PatchSet: 5
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Tue, 05 Feb 2019 23:11:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7985: Port RemoteShutdown() to KRPC.

2019-02-05 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12260 )

Change subject: IMPALA-7985: Port RemoteShutdown() to KRPC.
..


Patch Set 4:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/runtime/coordinator-backend-state.cc@445
PS6, Line 445: 10.0L
10 should be sufficient. It will be implicitly casted to double.


http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/service/client-request-state.cc@692
PS6, Line 692: msg.find("RemoteShutdown() RPC failed: Timed out: connection 
negotiation to")
 : != string::npos
Do you think it makes sense to spill out this message iff 
"request.__isset.backend && request.backend.port != 0" ?


http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/service/client-request-state.cc@695
PS6, Line 695: This may be because the port specified is a thrift port, which"
 : " :shutdown() can no longer use.
Will it be more precise to say as we didn't match the port specified (if any) 
is actually a thrift port number, right ?

"This may be because the port specified is wrong."


http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/service/control-service.h
File be/src/service/control-service.h:

http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/service/control-service.h@71
PS6, Line 71: class
?


http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/service/control-service.h@72
PS6, Line 72: class
?


http://gerrit.cloudera.org:8080/#/c/12260/4/be/src/service/control-service.h
File be/src/service/control-service.h:

http://gerrit.cloudera.org:8080/#/c/12260/4/be/src/service/control-service.h@84
PS4, Line 84: MonoDelta krpc_timeout
> My experience in the Java world with TimeUnit is that using a type like Mon
Fair point.

The coding convention in Impala code (e.g. see StartShutdown()) tends to be 
that we specific time with a suffix of the time unit instead of passing 
MonoDelta object around. It seems more consistent with the existing code to use 
timeout_ms or timeout_s instead.


http://gerrit.cloudera.org:8080/#/c/12260/4/be/src/service/control-service.h@92
PS4, Line 92:   if (!rpc_status.ok()) continue;
> I added the TODO, thanks. I didn't add the actual sleep as that was what we
Sure, please address those questions in IMPALA-8143.


http://gerrit.cloudera.org:8080/#/c/12260/3/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/12260/3/be/src/service/control-service.cc@170
PS3, Line 170:   FAULT_INJECTION_RPC_DELAY(RPC_CANCELQUERYFINSTANCES);
> OK, how about I promise to do this in IMPALA-8143? I can see some other pro
Please add a TODO IMPALA-8143 as a reminder.


http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/12260/6/be/src/service/control-service.cc@98
PS6, Line 98: kudu::rpc::RpcContext
Not your change but can you please replace "kudu::rpc::RpcContext" to 
RpcContext given the "use kudu::rpc::RpcContext" above ?



--
To view, visit http://gerrit.cloudera.org:8080/12260
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e
Gerrit-Change-Number: 12260
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Tue, 05 Feb 2019 19:40:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7565: Set TAcceptQueueServer connection setup pool to be multi-threaded by default

2019-02-01 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12249 )

Change subject: IMPALA-7565: Set TAcceptQueueServer connection_setup_pool to be 
multi-threaded by default
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I053120d4c3153ddbe5261acd28388be6cd191908
Gerrit-Change-Number: 12249
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Sat, 02 Feb 2019 01:32:22 +
Gerrit-HasComments: No


<    1   2   3   4   5   6   7   8   9   10   >