[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2842/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 19 Apr 2019 06:10:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-18 Thread Michael Ho (Code Review)
Hello Thomas Marshall, Lars Volker, David Rorke, Sahil Takiar, Todd Lipcon, Tim 
Armstrong, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12987

to look at the new patch set (#4).

Change subject: IMPALA-8341: Data cache for remote reads
..

IMPALA-8341: Data cache for remote reads

This is a patch based on PhilZ's prototype: 
https://gerrit.cloudera.org/#/c/12683/

This change implements an IO data cache which is backed by
local storage. It implicitly relies on the OS page cache
management to shuffle data between memory and the storage
device. This is useful for caching data read from remote
filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS).

A data cache is divided into one or more partitions based on
the configuration string which is a list of directories, separated
by comma, followed by the storage capacity per directory.
An example configuration string is like the following:
  --data_cache_config=/data/0,/data/1:150GB

In the configuration above, the cache may use up to 300GB of
storage space, with 150GB max for /data/0 and /data/1 respectively.

Each partition has a meta-data cache which tracks the mappings
of cache keys to the locations of the cached data. A cache key
is a tuple of (file's name, file's modification time, file offset)
and a cache entry is a tuple of (backing file, offset in the backing
file, length of the cached data, optional checksum). Note that the
cache currently doesn't support overlapping ranges. In other words,
if the cache contains an entry of a file for range [m, m+4MB), a lookup
for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen
this as a problem but this may require further evaluation in the future.

Each partition stores its set of cached data in backing files created
on local storage. When inserting new data into the cache, the data is
appended to the current backing file in use. The storage consumption
of each cache entry counts towards the quota of that partition. When a
partition reaches its capacity, the least recently used (LRU) data in
that partition is evicted. Evicted data is removed from the underlying
storage by punching holes in the backing file it's stored in. As a
backing file reaches a certain size (by default 4TB), new data will
stop being appended to it and a new file will be created instead. Note
that due to hole punching, the backing file is actually sparse.

Optionally, checksumming can be enabled to verify read from the cache
is consistent with what was inserted and to verify that multiple attempted
insertions with the same cache key have the same cache content.
Checksumming is enabled by default for debug builds.

To probe for cached data in the cache, the interface Lookup() is used;
To insert data into the cache, the interface Store() is used. Please note
that eviction happens inline currently during Store().

This patch also added two startup flags for start-impala-cluster.py:
'--data_cache_dir' specifies the base directory in which each Impalad
creates the caching directory
'--data_cache_size' specifies the capacity string for each cache directory.

Testing done: a new BE test was added; core test with cache enabled.

Perf:
- 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement
over runs without the cache. Each node has a cache size of 150GB per node.
The performance is at parity with a configuration of a HDFS cluster using
EBS as the storage.

Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/runtime/io/CMakeLists.txt
A be/src/runtime/io/data-cache-test.cc
A be/src/runtime/io/data-cache.cc
A be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-file-reader.h
M be/src/runtime/io/request-context.h
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M bin/start-impala-cluster.py
M common/thrift/metrics.json
18 files changed, 1,706 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/4
--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipco

[Impala-ASF-CR] IMPALA-8401: SIGRTMIN initiates the graceful shutdown process

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/12973 )

Change subject: IMPALA-8401: SIGRTMIN initiates the graceful shutdown process
..

IMPALA-8401: SIGRTMIN initiates the graceful shutdown process

This patch enables a user that has access to the impalad process,
to initiate the graceful shutdown process with a deadline of one year
by sending SIGRTMIN signal to it.
Sample usage: "kill -SIGRTMIN "

Testing:
Added relevant e2e tests.
Tested on CentOS 6, CentOS 7, Ubuntu 16.04, Ubuntu 18.04 and SLES 12

Change-Id: I521ffd7526ac9a8a5c4996994eb68d6a855aef86
Reviewed-on: http://gerrit.cloudera.org:8080/12973
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/common/init.cc
M be/src/common/init.h
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M tests/custom_cluster/test_restart_services.py
5 files changed, 124 insertions(+), 11 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/12973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I521ffd7526ac9a8a5c4996994eb68d6a855aef86
Gerrit-Change-Number: 12973
Gerrit-PatchSet: 11
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8401: SIGRTMIN initiates the graceful shutdown process

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12973 )

Change subject: IMPALA-8401: SIGRTMIN initiates the graceful shutdown process
..


Patch Set 10: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/12973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I521ffd7526ac9a8a5c4996994eb68d6a855aef86
Gerrit-Change-Number: 12973
Gerrit-PatchSet: 10
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 19 Apr 2019 05:00:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4865: Reject Expr Rewrite When Appropriate

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12814 )

Change subject: IMPALA-4865: Reject Expr Rewrite When Appropriate
..


Patch Set 9:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/4042/


--
To view, visit http://gerrit.cloudera.org:8080/12814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8b078113ccc1aa49b0cea0c86dff2e02e1dd0e23
Gerrit-Change-Number: 12814
Gerrit-PatchSet: 9
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 19 Apr 2019 03:30:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2841/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 19 Apr 2019 03:04:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-18 Thread Michael Ho (Code Review)
Hello Thomas Marshall, Lars Volker, David Rorke, Sahil Takiar, Todd Lipcon, Tim 
Armstrong, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12987

to look at the new patch set (#3).

Change subject: IMPALA-8341: Data cache for remote reads
..

IMPALA-8341: Data cache for remote reads

This is a patch based on PhilZ's prototype: 
https://gerrit.cloudera.org/#/c/12683/

This change implements an IO data cache which is backed by
local storage. It implicitly relies on the OS page cache
management to shuffle data between memory and the storage
device. This is useful for caching data read from remote
filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS).

A data cache is divided into one or more partitions based on
the configuration string which is a list of directories, separated
by comma, followed by the storage capacity per directory.
An example configuration string is like the following:
  --data_cache_config=/data/0,/data/1:150GB

In the configuration above, the cache may use up to 300GB of
storage space, with 150GB max for /data/0 and /data/1 respectively.

Each partition has a meta-data cache which tracks the mappings
of cache keys to the locations of the cached data. A cache key
is a tuple of (file's name, file's modification time, file offset)
and a cache entry is a tuple of (backing file, offset in the backing
file, length of the cached data, optional checksum). Note that the
cache currently doesn't support overlapping ranges. In other words,
if the cache contains an entry of a file for range [m, m+4MB), a lookup
for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen
this as a problem but this may require further evaluation in the future.

Each partition stores its set of cached data in backing files created
on local storage. When inserting new data into the cache, the data is
appended to the current backing file in use. The storage consumption
of each cache entry counts towards the quota of that partition. When a
partition reaches its capacity, the least recently used (LRU) data in
that partition is evicted. Evicted data is removed from the underlying
storage by punching holes in the backing file it's stored in. As a
backing file reaches a certain size (by default 4TB), new data will
stop being appended to it and a new file will be created instead. Note
that due to hole punching, the backing file is actually sparse.

Optionally, checksumming can be enabled to verify read from the cache
is consistent with what was inserted and to verify that multiple attempted
insertions with the same cache key have the same cache content.
Checksumming is enabled by default for debug builds.

To probe for cached data in the cache, the interface Lookup() is used;
To insert data into the cache, the interface Store() is used. Please note
that eviction happens inline currently during Store().

This patch also added two startup flags for start-impala-cluster.py:
'--data_cache_dir' specifies the base directory in which each Impalad
creates the caching directory
'--data_cache_size' specifies the capacity string for each cache directory.

Testing done: a new BE test was added; core test with cache enabled.

Perf:
- 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement
over runs without the cache. Each node has a cache size of 150GB per node.
The performance is at parity with a configuration of a HDFS cluster using
EBS as the storage.

Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/runtime/io/CMakeLists.txt
A be/src/runtime/io/data-cache-test.cc
A be/src/runtime/io/data-cache.cc
A be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-file-reader.h
M be/src/runtime/io/request-context.h
M be/src/util/filesystem-util-test.cc
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M bin/start-impala-cluster.py
M common/thrift/metrics.json
18 files changed, 1,706 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12987/3
--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipco

[Impala-ASF-CR] IMPALA-8341: Data cache for remote reads

2019-04-18 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
..


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.h
File be/src/runtime/io/data-cache.h:

http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.h@88
PS2, Line 88: consolidatin
> consolidating
Done


http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.h@89
PS2, Line 89: sparse
> sparse
Done


http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.h@166
PS2, Line 166:
 :   /// Utility function to verify that all partitions' 
consumption don't exceed their
 :   /// quotas. Retur
> clang-tidy failure: struct instead of class.
Done


http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.h@184
PS2, Line 184:
> empty
Done


http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.cc@296
PS2, Line 296:   meta_cache_->Erase(key);
 :   return true;
 : }
 :   }
> TODO: Add a metric for this.
Done


http://gerrit.cloudera.org:8080/#/c/12987/2/be/src/runtime/io/data-cache.cc@343
PS2, Line 343:
> insertion_offset + bytes_written
Done


http://gerrit.cloudera.org:8080/#/c/12987/2/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

http://gerrit.cloudera.org:8080/#/c/12987/2/bin/start-impala-cluster.py@117
PS2, Line 117:
> flake8: E703 statement ends with a semicolon
Done


http://gerrit.cloudera.org:8080/#/c/12987/2/bin/start-impala-cluster.py@120
PS2, Line 120:
> flake8: E703 statement ends with a semicolon
Done



--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 19 Apr 2019 02:20:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8270: fix MemTracker teardown in FeSupport

2019-04-18 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13057 )

Change subject: IMPALA-8270: fix MemTracker teardown in FeSupport
..


Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13057/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

http://gerrit.cloudera.org:8080/#/c/13057/1/be/src/runtime/runtime-state.h@377
PS1, Line 377: this
nit: this is



--
To view, visit http://gerrit.cloudera.org:8080/13057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If815130cd4db00917746f10b28514f779ee254f0
Gerrit-Change-Number: 13057
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 19 Apr 2019 01:42:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8270: fix MemTracker teardown in FeSupport

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13057 )

Change subject: IMPALA-8270: fix MemTracker teardown in FeSupport
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2840/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/13057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If815130cd4db00917746f10b28514f779ee254f0
Gerrit-Change-Number: 13057
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 19 Apr 2019 01:34:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8270: fix MemTracker teardown in FeSupport

2019-04-18 Thread Tim Armstrong (Code Review)
Hello Thomas Marshall, Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13057

to look at the new patch set (#3).

Change subject: IMPALA-8270: fix MemTracker teardown in FeSupport
..

IMPALA-8270: fix MemTracker teardown in FeSupport

This patch tries to simplify and standardise the order
in which control structures are torn down. As a
consequence the bug is fixed. I've described the bug
below.

The changes are:
* Make more control structures owned directly by
  QueryState::obj_pool_, so that they are all
  destroyed at the same time via ~QueryState.
* Tear down local_query_state_ explicitly before
  other destructors run.

Either change is sufficient to fix the bug, but
I preferred to do both  to reduce the chances
of similar bugs in future.

Description of bug:
===
In the normal query execution flow:
- RuntimeState is in QueryState::obj_pool_
- RuntimeState owns RuntimeState::instance_mem_tracker_ via scoped_ptr
- QueryState::query_mem_tracker_ is in QueryState::obj_pool_
- QueryState::query_mem_tracker_ has a reference to
  RuntimeState::instance_mem_tracker_
The tear-down works because ~QueryState unregisters query_mem_tracker_
from its parent, making the whole subtree unreachable before destroying
QueryState::obj_pool_. It is thus safe to destroy instance_mem_tracker_
along with the rest of obj_pool_.

FeSupport messes this up by having RuntimeState own the QueryState
RuntimeState::local_query_state_ via a scoped_ptr, and the implied
destructor order means that RuntimeState::instance_mem_tracker_ is
destroyed before RuntimeState::local_query_state_, which breaks the
above flow and the destroyed instance_mem_tracker_ is reachable from
the process MemTracker via QueryState::query_mem_tracker_ for a
small window until it is unregistered.

Testing:
Added a backend test that reproduced the ASAN use-after-free
failure when run against unmodified RuntimeState code. I did
not make it a unified backend test so that it would be easier
to backport this fix to older versions that don't have unified
tests.

Change-Id: If815130cd4db00917746f10b28514f779ee254f0
---
M be/src/runtime/CMakeLists.txt
A be/src/runtime/runtime-state-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
4 files changed, 92 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/13057/3
--
To view, visit http://gerrit.cloudera.org:8080/13057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If815130cd4db00917746f10b28514f779ee254f0
Gerrit-Change-Number: 13057
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8270: fix MemTracker teardown in FeSupport

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13057 )

Change subject: IMPALA-8270: fix MemTracker teardown in FeSupport
..


Patch Set 2:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/2839/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/13057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If815130cd4db00917746f10b28514f779ee254f0
Gerrit-Change-Number: 13057
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 19 Apr 2019 01:07:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7971: Add support for insert events in event processor.

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12889 )

Change subject: IMPALA-7971: Add support for insert events in event processor.
..


Patch Set 22: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/4044/


--
To view, visit http://gerrit.cloudera.org:8080/12889
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c48c5ca4bde18d532c582980aebbc25f1bf1c52
Gerrit-Change-Number: 12889
Gerrit-PatchSet: 22
Gerrit-Owner: Anurag Mantripragada 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Fri, 19 Apr 2019 00:45:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8270: fix MemTracker teardown in FeSupport

2019-04-18 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13057 )

Change subject: IMPALA-8270: fix MemTracker teardown in FeSupport
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13057/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13057/1//COMMIT_MSG@42
PS1, Line 42: local_query_state_
> Should this be instance_mem_tracker_?
Done


http://gerrit.cloudera.org:8080/#/c/13057/1/be/src/runtime/runtime-state-test.cc
File be/src/runtime/runtime-state-test.cc:

http://gerrit.cloudera.org:8080/#/c/13057/1/be/src/runtime/runtime-state-test.cc@43
PS1, Line 43: TEST
> I think this should be TEST_F? (Or use TEST and remove the RuntimeStateTest
Done



--
To view, visit http://gerrit.cloudera.org:8080/13057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If815130cd4db00917746f10b28514f779ee254f0
Gerrit-Change-Number: 13057
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 19 Apr 2019 00:24:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8270: fix MemTracker teardown in FeSupport

2019-04-18 Thread Tim Armstrong (Code Review)
Hello Thomas Marshall, Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13057

to look at the new patch set (#2).

Change subject: IMPALA-8270: fix MemTracker teardown in FeSupport
..

IMPALA-8270: fix MemTracker teardown in FeSupport

This patch tries to simplify and standardise the order
in which control structures are torn down. As a
consequence the bug is fixed. I've described the bug
below.

The changes are:
* Make more control structures owned directly by
  QueryState::obj_pool_, so that they are all
  destroyed at the same time via ~QueryState.
* Tear down local_query_state_ explicitly before
  other destructors run.

Either change is sufficient to fix the bug, but
I preferred to do both  to reduce the chances
of similar bugs in future.

Description of bug:
===
In the normal query execution flow:
- RuntimeState is in QueryState::obj_pool_
- RuntimeState owns RuntimeState::instance_mem_tracker_ via scoped_ptr
- QueryState::query_mem_tracker_ is in QueryState::obj_pool_
- QueryState::query_mem_tracker_ has a reference to
  RuntimeState::instance_mem_tracker_
The tear-down works because ~QueryState unregisters query_mem_tracker_
from its parent, making the whole subtree unreachable before destroying
QueryState::obj_pool_. It is thus safe to destroy instance_mem_tracker_
along with the rest of obj_pool_.

FeSupport messes this up by having RuntimeState own the QueryState
RuntimeState::local_query_state_ via a scoped_ptr, and the implied
destructor order means that RuntimeState::instance_mem_tracker_ is
destroyed before RuntimeState::local_query_state_, which breaks the
above flow and the destroyed instance_mem_tracker_ is reachable from
the process MemTracker via QueryState::query_mem_tracker_ for a
small window until it is unregistered.

Testing:
Added a backend test that reproduced the ASAN use-after-free
failure when run against unmodified RuntimeState code. I did
not make it a unified backend test so that it would be easier
to backport this fix to older versions that don't have unified
tests.

Change-Id: If815130cd4db00917746f10b28514f779ee254f0
---
M be/src/runtime/CMakeLists.txt
A be/src/runtime/runtime-state-test.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
4 files changed, 92 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/13057/2
--
To view, visit http://gerrit.cloudera.org:8080/13057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If815130cd4db00917746f10b28514f779ee254f0
Gerrit-Change-Number: 13057
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Thomas Marshall 


[Impala-ASF-CR] IMPALA-8401: SIGRTMIN initiates the graceful shutdown process

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12973 )

Change subject: IMPALA-8401: SIGRTMIN initiates the graceful shutdown process
..


Patch Set 10: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I521ffd7526ac9a8a5c4996994eb68d6a855aef86
Gerrit-Change-Number: 12973
Gerrit-PatchSet: 10
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 19 Apr 2019 00:06:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8401: SIGRTMIN initiates the graceful shutdown process

2019-04-18 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12973 )

Change subject: IMPALA-8401: SIGRTMIN initiates the graceful shutdown process
..


Patch Set 9: Code-Review+2

Updated Commit message with list of OSes that I tested this change on. Carrying 
over Lars's +2


--
To view, visit http://gerrit.cloudera.org:8080/12973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I521ffd7526ac9a8a5c4996994eb68d6a855aef86
Gerrit-Change-Number: 12973
Gerrit-PatchSet: 9
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 19 Apr 2019 00:06:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8401: SIGRTMIN initiates the graceful shutdown process

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12973 )

Change subject: IMPALA-8401: SIGRTMIN initiates the graceful shutdown process
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4045/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/12973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I521ffd7526ac9a8a5c4996994eb68d6a855aef86
Gerrit-Change-Number: 12973
Gerrit-PatchSet: 10
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 19 Apr 2019 00:06:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8401: SIGRTMIN initiates the graceful shutdown process

2019-04-18 Thread Bikramjeet Vig (Code Review)
Hello Lars Volker, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12973

to look at the new patch set (#9).

Change subject: IMPALA-8401: SIGRTMIN initiates the graceful shutdown process
..

IMPALA-8401: SIGRTMIN initiates the graceful shutdown process

This patch enables a user that has access to the impalad process,
to initiate the graceful shutdown process with a deadline of one year
by sending SIGRTMIN signal to it.
Sample usage: "kill -SIGRTMIN "

Testing:
Added relevant e2e tests.
Tested on CentOS 6, CentOS 7, Ubuntu 16.04, Ubuntu 18.04 and SLES 12

Change-Id: I521ffd7526ac9a8a5c4996994eb68d6a855aef86
---
M be/src/common/init.cc
M be/src/common/init.h
M be/src/service/impala-server.cc
M be/src/service/impalad-main.cc
M tests/custom_cluster/test_restart_services.py
5 files changed, 124 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/12973/9
--
To view, visit http://gerrit.cloudera.org:8080/12973
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I521ffd7526ac9a8a5c4996994eb68d6a855aef86
Gerrit-Change-Number: 12973
Gerrit-PatchSet: 9
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8270: fix MemTracker teardown in FeSupport

2019-04-18 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13057 )

Change subject: IMPALA-8270: fix MemTracker teardown in FeSupport
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13057/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13057/1//COMMIT_MSG@42
PS1, Line 42: local_query_state_
Should this be instance_mem_tracker_?


http://gerrit.cloudera.org:8080/#/c/13057/1/be/src/runtime/runtime-state-test.cc
File be/src/runtime/runtime-state-test.cc:

http://gerrit.cloudera.org:8080/#/c/13057/1/be/src/runtime/runtime-state-test.cc@43
PS1, Line 43: TEST
I think this should be TEST_F? (Or use TEST and remove the RuntimeStateTest 
class)



--
To view, visit http://gerrit.cloudera.org:8080/13057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If815130cd4db00917746f10b28514f779ee254f0
Gerrit-Change-Number: 13057
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Thu, 18 Apr 2019 23:56:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8407: Warn when Impala shell fails to connect due to tlsv1.2

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13003 )

Change subject: IMPALA-8407: Warn when Impala shell fails to connect due to 
tlsv1.2
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/13003
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3feddaccb9be3a15220ce9e59aa7ed41d41b8ab6
Gerrit-Change-Number: 13003
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 18 Apr 2019 23:19:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8407: Warn when Impala shell fails to connect due to tlsv1.2

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/13003 )

Change subject: IMPALA-8407: Warn when Impala shell fails to connect due to 
tlsv1.2
..

IMPALA-8407: Warn when Impala shell fails to connect due to tlsv1.2

When impala-shell is used to connect to an impala cluster with
--ssl_minimum_version=tlsv1.2, if the Python version being used is
< 2.7.9 the connection will fail due to a limitation of TSSLSocket.
See IMPALA-6990 for more details.

Currently, when this occurs, the error that gets printed is "EOF
occurred in violation of protocol", which is not very helpful. This
patch detect this situation and prints a more informative warning.

Testing:
- Updated test_tls_v12 so that instead of being skipped on affected
  platforms, it runs and checks for the presence of the warning.

Change-Id: I3feddaccb9be3a15220ce9e59aa7ed41d41b8ab6
Reviewed-on: http://gerrit.cloudera.org:8080/13003
Reviewed-by: Thomas Marshall 
Tested-by: Impala Public Jenkins 
---
M shell/impala_shell.py
M tests/custom_cluster/test_client_ssl.py
2 files changed, 10 insertions(+), 3 deletions(-)

Approvals:
  Thomas Marshall: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/13003
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3feddaccb9be3a15220ce9e59aa7ed41d41b8ab6
Gerrit-Change-Number: 13003
Gerrit-PatchSet: 4
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7971: Add support for insert events in event processor.

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12889 )

Change subject: IMPALA-7971: Add support for insert events in event processor.
..


Patch Set 22: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12889
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c48c5ca4bde18d532c582980aebbc25f1bf1c52
Gerrit-Change-Number: 12889
Gerrit-PatchSet: 22
Gerrit-Owner: Anurag Mantripragada 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 18 Apr 2019 23:06:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7971: Add support for insert events in event processor.

2019-04-18 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12889 )

Change subject: IMPALA-7971: Add support for insert events in event processor.
..


Patch Set 21: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12889
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c48c5ca4bde18d532c582980aebbc25f1bf1c52
Gerrit-Change-Number: 12889
Gerrit-PatchSet: 21
Gerrit-Owner: Anurag Mantripragada 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 18 Apr 2019 23:06:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7971: Add support for insert events in event processor.

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12889 )

Change subject: IMPALA-7971: Add support for insert events in event processor.
..


Patch Set 22:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4044/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/12889
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c48c5ca4bde18d532c582980aebbc25f1bf1c52
Gerrit-Change-Number: 12889
Gerrit-PatchSet: 22
Gerrit-Owner: Anurag Mantripragada 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 18 Apr 2019 23:06:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7971: Add support for insert events in event processor.

2019-04-18 Thread Anurag Mantripragada (Code Review)
Anurag Mantripragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12889 )

Change subject: IMPALA-7971: Add support for insert events in event processor.
..


Patch Set 21:

This python file generates the templates for hive-site.xml - Every mini-cluster 
will have this config set. This is just like the event processing flags are set.


--
To view, visit http://gerrit.cloudera.org:8080/12889
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c48c5ca4bde18d532c582980aebbc25f1bf1c52
Gerrit-Change-Number: 12889
Gerrit-PatchSet: 21
Gerrit-Owner: Anurag Mantripragada 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 18 Apr 2019 23:03:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7971: Add support for insert events in event processor.

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12889 )

Change subject: IMPALA-7971: Add support for insert events in event processor.
..


Patch Set 21:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2838/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12889
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c48c5ca4bde18d532c582980aebbc25f1bf1c52
Gerrit-Change-Number: 12889
Gerrit-PatchSet: 21
Gerrit-Owner: Anurag Mantripragada 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 18 Apr 2019 22:56:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7971: Add support for insert events in event processor.

2019-04-18 Thread Bharath Krishna (Code Review)
Bharath Krishna has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12889 )

Change subject: IMPALA-7971: Add support for insert events in event processor.
..


Patch Set 21:

How does this work for the tests in the MetastoreEventsProcessorTest , does it 
need dml.events = true as well?


--
To view, visit http://gerrit.cloudera.org:8080/12889
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c48c5ca4bde18d532c582980aebbc25f1bf1c52
Gerrit-Change-Number: 12889
Gerrit-PatchSet: 21
Gerrit-Owner: Anurag Mantripragada 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 18 Apr 2019 22:41:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8138: Remove FAULT INJECTION RPC DELAY

2019-04-18 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13060 )

Change subject: IMPALA-8138: Remove FAULT_INJECTION_RPC_DELAY
..


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13060/1/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/13060/1/be/src/common/global-flags.cc@154
PS1, Line 154: DEFINE_string(debug_actions, "", "For testing only. Uses the 
same format as the debug "
 : "action query options, but allows for injection of debug 
actions in code paths where "
 : "query options are not available.");
> I could see the argument either way.
I think this is fine as-is. No point in dwelling on it for too much as long as 
it achieves the purpose.



--
To view, visit http://gerrit.cloudera.org:8080/13060
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I712b188e0cdf91f431c9b94052501e5411af407b
Gerrit-Change-Number: 13060
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Thu, 18 Apr 2019 22:26:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7971: Add support for insert events in event processor.

2019-04-18 Thread Anurag Mantripragada (Code Review)
Anurag Mantripragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12889 )

Change subject: IMPALA-7971: Add support for insert events in event processor.
..


Patch Set 21:

Build failed as the HMS flag that enables insert notification was not set by 
default. Added hive.metastore.dml.events=true config in hive-site.xml.py


--
To view, visit http://gerrit.cloudera.org:8080/12889
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c48c5ca4bde18d532c582980aebbc25f1bf1c52
Gerrit-Change-Number: 12889
Gerrit-PatchSet: 21
Gerrit-Owner: Anurag Mantripragada 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 18 Apr 2019 22:13:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7971: Add support for insert events in event processor.

2019-04-18 Thread Anurag Mantripragada (Code Review)
Anurag Mantripragada has uploaded a new patch set (#21). ( 
http://gerrit.cloudera.org:8080/12889 )

Change subject: IMPALA-7971: Add support for insert events in event processor.
..

IMPALA-7971: Add support for insert events in event processor.

This patch adds support for detecting and processing insert events
triggered by impala as well as external engines (eg.Hive).

Inserts from Impala will fire an insert event notification.
Using this event, event-processor will refresh table/partition.
Both insert into and overwrite are supported for tables/partitions.

Known Issues:
1. Inserts into tables from Hive are ignored by the event processor
   as these inserts create an ALTER event first followed by an
   INSERT event. The alter will invalidate table making the refresh
   a no-op. Insert into partitions from hive will create an INSERT
   event first followed by an ALTER event. In this case, there is
   an unnecessary table invalidate after a refresh.
2. Existing self-events logic cannot be used for insert events since
   firing insert event does not allow us to modify table parameters in
   HMS. This means we cannot get the CatalogServiceIdentifiers in insert
   events. Therefore, the event-processor will also refresh the tables
   for which insert operation is performed through Impala.

Testing:
1. Added new custom cluster tests to run different insert commands from
hive and verified new data is available in Impala without invalidate
metadata.

2. Added a test in MetastoreEventsProcessor for testing insert events.

Change-Id: I7c48c5ca4bde18d532c582980aebbc25f1bf1c52
---
M be/src/service/client-request-state.cc
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/resources/hive-site.xml.py
A tests/custom_cluster/test_event_processing.py
10 files changed, 576 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/12889/21
--
To view, visit http://gerrit.cloudera.org:8080/12889
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c48c5ca4bde18d532c582980aebbc25f1bf1c52
Gerrit-Change-Number: 12889
Gerrit-PatchSet: 21
Gerrit-Owner: Anurag Mantripragada 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8419 : Fetch metastore configuration values to detect misconfigured setups

2019-04-18 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13019 )

Change subject: IMPALA-8419 : Fetch metastore configuration values to   
detect misconfigured setups
..


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/13019/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
File 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:

http://gerrit.cloudera.org:8080/#/c/13019/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@222
PS4, Line 222: FIRE_EVENTS_FOR_DML
We probably also need to make sure that the parameters filter config does not 
filter out keys like impala.events.catalogVersion, 
impala.events.catalogServiceId and impala.disableHmsSync


http://gerrit.cloudera.org:8080/#/c/13019/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@315
PS4, Line 315: try {
> Do you mean to use try-with-resource in the above method which calls using
I see. May be you can then create another method called 
getMetastoreConfig(String key, String defaultval) here. This method can 
implement it using try-with-resource as I suggested above. In the test, you can 
then use Mockito.spy to return the dummy values when this method is called. 
Something like

MetastoreStoreEventsProcessor spyEventsProcessor = 
Mockito.spy(eventsProcessor_);
doReturn("testValue").when(spyProcessor.getMetastoreConfig());


http://gerrit.cloudera.org:8080/#/c/13019/4/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
File fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java:

http://gerrit.cloudera.org:8080/#/c/13019/4/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@123
PS4, Line 123:   public static String getMetastoreConfigValue(
> I feel this is useful too when users can just compare the return value with
A more common pattern (and generic too) is to let the consumers decide what 
should be the default value if the configuration is not present. For example, 
for a boolean configuration users can choose to do getConfig(key, "false) 
whereas for a String configuration they can do getConfig(key, ""); If you avoid 
hard-coding the default value assumption it make it more usable for all the 
consumers.


http://gerrit.cloudera.org:8080/#/c/13019/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/13019/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@245
PS4, Line 245: toggleBooleanValueString
code assumes that config values will always be booleans. Its probably easier to 
just pass some dummy. Also since there are only a few configurations, it 
probably easier to just have multiple statements of

when(getConfig(key1)).thenReturn(val1);
when(getConfig(key1)).thenReturn(val1);
when(getConfig(key1)).thenReturn(val1);

and get rid of the loop this way there is no need to have the cleanup logic 
too. All this can be done by creating a simple util method which takes in 
validateConfig(key, mockValue, shouldSucceed). You can test both the positive 
and negative cases using this util method.



--
To view, visit http://gerrit.cloudera.org:8080/13019
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I94c2783e36287a65122003aa55d8075a806bc606
Gerrit-Change-Number: 13019
Gerrit-PatchSet: 4
Gerrit-Owner: Bharath Krishna 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 18 Apr 2019 21:48:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8293 (Part 1): Move SentryProxy out of CatalogServiceCatalog

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13065 )

Change subject: IMPALA-8293 (Part 1): Move SentryProxy out of 
CatalogServiceCatalog
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2837/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/13065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3fa14abb09abfb4aaf6231d35114c0e121d6e568
Gerrit-Change-Number: 13065
Gerrit-PatchSet: 3
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Austin Nobis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 18 Apr 2019 20:55:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8293 (Part 1): Move SentryProxy out of CatalogServiceCatalog

2019-04-18 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/13065 )

Change subject: IMPALA-8293 (Part 1): Move SentryProxy out of 
CatalogServiceCatalog
..

IMPALA-8293 (Part 1): Move SentryProxy out of CatalogServiceCatalog

The patch refactors the SentryProxy, which does Sentry cache
invalidation from CatalogServiceCatalog into a Sentry-specific
implementation, i.e.  SentryCatalogdAuthorizationManager. This patch
also adds a new method in the AuthorizationManager interface to allow
refreshing authorization for any authorization provider that caches the
authorization metadata.

This patch also contains minor clean-up to address comments in this
abandoned CR: https://gerrit.cloudera.org/c/12748

This patch has no functionality change.

Testing:
- Ran all FE tests
- Ran all E2E authorization tests

Change-Id: I3fa14abb09abfb4aaf6231d35114c0e121d6e568
---
A fe/src/main/java/org/apache/impala/authorization/AuthorizationDelta.java
M fe/src/main/java/org/apache/impala/authorization/AuthorizationFactory.java
M fe/src/main/java/org/apache/impala/authorization/AuthorizationManager.java
M fe/src/main/java/org/apache/impala/authorization/AuthorizationProvider.java
R fe/src/main/java/org/apache/impala/authorization/NoopAuthorizationFactory.java
M 
fe/src/main/java/org/apache/impala/authorization/ranger/RangerCatalogdAuthorizationManager.java
M 
fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationFactory.java
M 
fe/src/main/java/org/apache/impala/authorization/sentry/SentryCatalogdAuthorizationManager.java
M 
fe/src/main/java/org/apache/impala/authorization/sentry/SentryImpaladAuthorizationManager.java
M fe/src/main/java/org/apache/impala/authorization/sentry/SentryProxy.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeAuthStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AuditingTest.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M fe/src/test/java/org/apache/impala/analysis/StmtMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/authorization/sentry/SentryProxyTest.java
M fe/src/test/java/org/apache/impala/catalog/AlterDatabaseTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/common/FrontendFixture.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/common/QueryFixture.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M fe/src/test/java/org/apache/impala/testutil/PlannerTestCaseLoader.java
29 files changed, 329 insertions(+), 188 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/13065/3
--
To view, visit http://gerrit.cloudera.org:8080/13065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3fa14abb09abfb4aaf6231d35114c0e121d6e568
Gerrit-Change-Number: 13065
Gerrit-PatchSet: 3
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Austin Nobis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8149 : Add support for alter database events

2019-04-18 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13049 )

Change subject: IMPALA-8149 : Add support for alter_database events
..


Patch Set 1:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/13049/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13049/1//COMMIT_MSG@9
PS1, Line 9: add
typo: adds


http://gerrit.cloudera.org:8080/#/c/13049/1//COMMIT_MSG@8
PS1, Line 8:
   : This change add support for alter_database events in two parts:
   : One is adding catalogServiceId and catalogVersion in db parameters 
when alter database.
   : The other is adding alter database event, check if it's self event 
during process, if true do nothing, if false replace caralog cached db with 
event db.
nit: try to keep it within 72 characters


http://gerrit.cloudera.org:8080/#/c/13049/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/13049/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@767
PS1, Line 767:* @param tblName table name
> We need to update the comment to explain that it can remove version for bot
+1. I'm also not quite clear why we need to remove the version for in-flight 
events when tblName == null. Same as above for getInFlightVersionsForEvents.


http://gerrit.cloudera.org:8080/#/c/13049/1/fe/src/main/java/org/apache/impala/catalog/Db.java
File fe/src/main/java/org/apache/impala/catalog/Db.java:

http://gerrit.cloudera.org:8080/#/c/13049/1/fe/src/main/java/org/apache/impala/catalog/Db.java@521
PS1, Line 521: ==
nit: >= is probably better


http://gerrit.cloudera.org:8080/#/c/13049/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/13049/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@263
PS1, Line 263: protected List pendingVersionNumbersFromCatalog_ = 
Collections.EMPTY_LIST;
make it final?


http://gerrit.cloudera.org:8080/#/c/13049/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@272
PS1, Line 272: event.getTableName()
can event.getTableName() be  null? If not, we should do 
Preconditions.checkNotNull(event.getTableName()) similar to L271


http://gerrit.cloudera.org:8080/#/c/13049/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@405
PS1, Line 405:   if (versionNumberFromEvent_ == -1 || 
pendingVersionNumbersFromCatalog_.isEmpty())
 : return false;
nit: if it spans across multiple lines, use {}


http://gerrit.cloudera.org:8080/#/c/13049/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@408
PS1, Line 408: if (catalog_.getCatalogServiceId().equals(serviceIdFromEvent_)) {
 : // service id is a match. Now check if the event version 
is what we expect
 : // in the list
 : if 
(pendingVersionNumbersFromCatalog_.get(0).equals(versionNumberFromEvent_)) {
can be combined with && to reduce the nestedness


http://gerrit.cloudera.org:8080/#/c/13049/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@439
PS1, Line 439: protected void initSelfEventIdentifiersFromEvent() {
 :   throw new UnsupportedOperationException("Please override 
this method in subclass");
 : }
making it an abstract is better since we get compile-time vs runtime error.


http://gerrit.cloudera.org:8080/#/c/13049/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@445
PS1, Line 445: if (params == null)
This code is a bit weird that "params" can be null. Usually it should be an 
empty map instead.


http://gerrit.cloudera.org:8080/#/c/13049/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/13049/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@370
PS1, Line 370: break;
nit: put it in the new line


http://gerrit.cloudera.org:8080/#/c/13049/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3805
PS1, Line 3805: Preconditions.checkState(dbName != null && 
!dbName.isEmpty(),
> I think this check is redundant, as it is done inside getDb() call below?
checkState is also not quite correct. It's more like checkArgument.



--
To view, visit http://gerrit.cloudera.org:8080/13049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaf020e85cae04163bf32e31363eb4119d624640b
Gerrit-Change-Number: 13049
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Revie

[Impala-ASF-CR] IMPALA-8293 (Part 1): Move SentryProxy out of CatalogServiceCatalog

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13065 )

Change subject: IMPALA-8293 (Part 1): Move SentryProxy out of 
CatalogServiceCatalog
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2836/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/13065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3fa14abb09abfb4aaf6231d35114c0e121d6e568
Gerrit-Change-Number: 13065
Gerrit-PatchSet: 2
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Austin Nobis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 18 Apr 2019 18:50:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8138: Remove FAULT INJECTION RPC DELAY

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13060 )

Change subject: IMPALA-8138: Remove FAULT_INJECTION_RPC_DELAY
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2835/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/13060
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I712b188e0cdf91f431c9b94052501e5411af407b
Gerrit-Change-Number: 13060
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Thu, 18 Apr 2019 18:35:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8293 (Part 1): Move SentryProxy out of CatalogServiceCatalog

2019-04-18 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/13065


Change subject: IMPALA-8293 (Part 1): Move SentryProxy out of 
CatalogServiceCatalog
..

IMPALA-8293 (Part 1): Move SentryProxy out of CatalogServiceCatalog

The patch refactors the SentryProxy, which does Sentry cache
invalidation from CatalogServiceCatalog into a Sentry-specific
implementation, i.e.  SentryCatalogdAuthorizationManager. This patch
also adds a new method in the AuthorizationManager interface to allow
refreshing authorization for any authorization provider that caches the
authorization metadata.

This patch also contains minor clean-up to address comments in this
abandoned CR: https://gerrit.cloudera.org/c/12748

This patch has no functionality change.

Testing:
- Ran all FE tests
- Ran all E2E authorization tests

Change-Id: I3fa14abb09abfb4aaf6231d35114c0e121d6e568
---
A fe/src/main/java/org/apache/impala/authorization/AuthorizationDelta.java
M fe/src/main/java/org/apache/impala/authorization/AuthorizationFactory.java
M fe/src/main/java/org/apache/impala/authorization/AuthorizationManager.java
M fe/src/main/java/org/apache/impala/authorization/AuthorizationProvider.java
R fe/src/main/java/org/apache/impala/authorization/NoopAuthorizationFactory.java
M 
fe/src/main/java/org/apache/impala/authorization/ranger/RangerCatalogdAuthorizationManager.java
M 
fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationFactory.java
M 
fe/src/main/java/org/apache/impala/authorization/sentry/SentryCatalogdAuthorizationManager.java
M 
fe/src/main/java/org/apache/impala/authorization/sentry/SentryImpaladAuthorizationManager.java
M fe/src/main/java/org/apache/impala/authorization/sentry/SentryProxy.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeAuthStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AuditingTest.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M fe/src/test/java/org/apache/impala/analysis/StmtMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/authorization/sentry/SentryProxyTest.java
M fe/src/test/java/org/apache/impala/catalog/AlterDatabaseTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/common/FrontendFixture.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/common/QueryFixture.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M fe/src/test/java/org/apache/impala/testutil/PlannerTestCaseLoader.java
29 files changed, 323 insertions(+), 186 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/13065/2
--
To view, visit http://gerrit.cloudera.org:8080/13065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3fa14abb09abfb4aaf6231d35114c0e121d6e568
Gerrit-Change-Number: 13065
Gerrit-PatchSet: 2
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Austin Nobis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 


[Impala-ASF-CR] IMPALA-8293: Support for Ranger cache invalidation

2019-04-18 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12748 )

Change subject: IMPALA-8293: Support for Ranger cache invalidation
..


Patch Set 6:

I'm abandoning this CR because it contains refactor + new feature, which makes 
the review difficult. I'll split this work into 2 CRs instead. Some of the 
comments will be addressed in the new CRs.


-- 
To view, visit http://gerrit.cloudera.org:8080/12748
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I524ee93d09077dd4ff3d18fe517739b7776d01d7
Gerrit-Change-Number: 12748
Gerrit-PatchSet: 6
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Austin Nobis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 18 Apr 2019 18:00:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8293: Support for Ranger cache invalidation

2019-04-18 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has abandoned this change. ( http://gerrit.cloudera.org:8080/12748 
)

Change subject: IMPALA-8293: Support for Ranger cache invalidation
..


Abandoned
--
To view, visit http://gerrit.cloudera.org:8080/12748
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I524ee93d09077dd4ff3d18fe517739b7776d01d7
Gerrit-Change-Number: 12748
Gerrit-PatchSet: 6
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Austin Nobis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-8407: Warn when Impala shell fails to connect due to tlsv1.2

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13003 )

Change subject: IMPALA-8407: Warn when Impala shell fails to connect due to 
tlsv1.2
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4043/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/13003
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3feddaccb9be3a15220ce9e59aa7ed41d41b8ab6
Gerrit-Change-Number: 13003
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 18 Apr 2019 18:01:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8407: Warn when Impala shell fails to connect due to tlsv1.2

2019-04-18 Thread Thomas Marshall (Code Review)
Thomas Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13003 )

Change subject: IMPALA-8407: Warn when Impala shell fails to connect due to 
tlsv1.2
..


Patch Set 3: Code-Review+2

carrying forward


--
To view, visit http://gerrit.cloudera.org:8080/13003
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3feddaccb9be3a15220ce9e59aa7ed41d41b8ab6
Gerrit-Change-Number: 13003
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 18 Apr 2019 18:00:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8138: Remove FAULT INJECTION RPC DELAY

2019-04-18 Thread Thomas Marshall (Code Review)
Thomas Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13060 )

Change subject: IMPALA-8138: Remove FAULT_INJECTION_RPC_DELAY
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/13060/1/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/13060/1/be/src/common/global-flags.cc@154
PS1, Line 154: DEFINE_string(debug_actions, "", "For testing only. Uses the 
same format as the debug "
 : "action query options, but allows for injection of debug 
actions in code paths where "
 : "query options are not available.");
> If set, should this also affect the default value of debug action query opt
I could see the argument either way.

I chose to do it this way because this way the flag and the query option are 
used for disjoint sets of labels - it seems like it would be more confusing to 
me if some debug actions can be set either with the flag or the query option 
but others can only be set with the flag. But admittedly its confusing this way 
as well.

Another option I considered was naming the flag something like 
"rpc_debug_actions", though I decided against it in case we want to use this 
functionality to add more non-rpc debug actions without query options in the 
future. Of course, if that happens we could just deal with it then.

Maybe we can come up with a name to the effect of 
"non_query_specific_debug_actions" but less clunky, like "system_debug_actions"?


http://gerrit.cloudera.org:8080/#/c/13060/1/be/src/service/data-stream-service.cc
File be/src/service/data-stream-service.cc:

http://gerrit.cloudera.org:8080/#/c/13060/1/be/src/service/data-stream-service.cc@97
PS1, Line 97: EndDataStreamResponsePB* response, RpcContext* rpc_context) {
> May wanna add a delay here too ?
Done


http://gerrit.cloudera.org:8080/#/c/13060/1/be/src/util/debug-util.h
File be/src/util/debug-util.h:

http://gerrit.cloudera.org:8080/#/c/13060/1/be/src/util/debug-util.h@144
PS1, Line 144: Status DebugActionImpl(const string& debug_action, const char* 
label) WARN_UNUSED_RESULT;
> or FLAGS_debug_action
Done


http://gerrit.cloudera.org:8080/#/c/13060/1/tests/custom_cluster/test_rpc_timeout.py
File tests/custom_cluster/test_rpc_timeout.py:

http://gerrit.cloudera.org:8080/#/c/13060/1/tests/custom_cluster/test_rpc_timeout.py@129
PS1, Line 129:
> flake8: E502 the backslash is redundant between brackets
Done



--
To view, visit http://gerrit.cloudera.org:8080/13060
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I712b188e0cdf91f431c9b94052501e5411af407b
Gerrit-Change-Number: 13060
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Thu, 18 Apr 2019 17:59:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8138: Remove FAULT INJECTION RPC DELAY

2019-04-18 Thread Thomas Marshall (Code Review)
Hello Michael Ho, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13060

to look at the new patch set (#2).

Change subject: IMPALA-8138: Remove FAULT_INJECTION_RPC_DELAY
..

IMPALA-8138: Remove FAULT_INJECTION_RPC_DELAY

This patch removes the FAULT_INJECTION_RPC_DELAY macro and replaces
its uses with DebugAction which is more flexible. For example, it
supports JITTER which injects random delays.

Every backend rpc has a debug action of the form RPC_NAME_DELAY.

DebugAction has previously always been used via query options.
However, for the rpcs considered here there is not always a query with
an accessible TQUeryOptions available (for example, we do not send any
query info with the RemoteShutdown rpc), so this patch introduces a
flag, '--debug_actions', which is used to control these rpc delay
debug actions.

Testing:
- Updated existing tests to use the new mechanism.

Change-Id: I712b188e0cdf91f431c9b94052501e5411af407b
---
M be/src/common/global-flags.cc
M be/src/service/control-service.cc
M be/src/service/data-stream-service.cc
M be/src/service/impala-internal-service.cc
M be/src/testutil/fault-injection-util.cc
M be/src/testutil/fault-injection-util.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M tests/custom_cluster/test_rpc_timeout.py
9 files changed, 54 insertions(+), 83 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/13060/2
--
To view, visit http://gerrit.cloudera.org:8080/13060
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I712b188e0cdf91f431c9b94052501e5411af407b
Gerrit-Change-Number: 13060
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 


[native-toolchain-CR] Enable reusing ccache directories.

2019-04-18 Thread Hector Acosta (Code Review)
Hector Acosta has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12996 )

Change subject: Enable reusing ccache directories.
..


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/12996/1/functions.sh
File functions.sh:

http://gerrit.cloudera.org:8080/#/c/12996/1/functions.sh@543
PS1, Line 543:   local EXPIRES="$(date -d '+3 months' --utc 
+'%Y-%m-%dT%H:%M:%SZ')"
> wrap at 90 chars, if possible
Done


http://gerrit.cloudera.org:8080/#/c/12996/1/functions.sh@545
PS1, Line 545: ON="
> nit: whitespace
Done


http://gerrit.cloudera.org:8080/#/c/12996/3/functions.sh
File functions.sh:

http://gerrit.cloudera.org:8080/#/c/12996/3/functions.sh@519
PS3, Line 519: local TAR=ccache.tar
> Handling of the tarball filename is asymmetric between download_ccache() an
This was done following the same logic for the download_dependency function and 
the publishing logic in the build_dist_package function.


It looks like $S3_BUCKET is only used for uploads, which makes sense in some 
scenarios: This code expects a very specific layout in some urls which would 
need to be replicated even if we simply want to upload packages to an 
alternative bucket. The same can be said about the ccache tarball. Tying the 
bucket that gets used to upload artifacts to the bucket that's used to download 
dependencies makes things harder in some, admittedly specific use cases.


http://gerrit.cloudera.org:8080/#/c/12996/1/init.sh
File init.sh:

http://gerrit.cloudera.org:8080/#/c/12996/1/init.sh@34
PS1, Line 34: #  - DOWNLOAD_CCACHE
> DOWNLOAD_CCACHE
Done


http://gerrit.cloudera.org:8080/#/c/12996/1/init.sh@133
PS1, Line 133:
> Could you add a (brief) comment about this, including mentioning UPLOAD_CCA
Done



--
To view, visit http://gerrit.cloudera.org:8080/12996
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I482aa13e833d4680efe7cab98aad7f4fb998bfc0
Gerrit-Change-Number: 12996
Gerrit-PatchSet: 3
Gerrit-Owner: Hector Acosta 
Gerrit-Reviewer: Hector Acosta 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Thu, 18 Apr 2019 17:42:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4865: Reject Expr Rewrite When Appropriate

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12814 )

Change subject: IMPALA-4865: Reject Expr Rewrite When Appropriate
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4042/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/12814
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8b078113ccc1aa49b0cea0c86dff2e02e1dd0e23
Gerrit-Change-Number: 12814
Gerrit-PatchSet: 9
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 18 Apr 2019 17:30:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5843: Use page index in Parquet files to skip pages

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12065 )

Change subject: IMPALA-5843: Use page index in Parquet files to skip pages
..


Patch Set 15:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2834/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a
Gerrit-Change-Number: 12065
Gerrit-PatchSet: 15
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 18 Apr 2019 15:42:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5843: Use page index in Parquet files to skip pages

2019-04-18 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12065 )

Change subject: IMPALA-5843: Use page index in Parquet files to skip pages
..


Patch Set 14:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12065/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12065/14//COMMIT_MSG@30
PS14, Line 30: Testing
> I looked into test_scanners_fuzz.py, and noticed there is no query with WHE
Do you want to open a Jira for that?


http://gerrit.cloudera.org:8080/#/c/12065/14/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12065/14/be/src/exec/parquet/hdfs-parquet-scanner.cc@639
PS14, Line 639: if (state_->query_options().parquet_read_page_index) {
> It is not useful to read the page index if there are no suitable predicates
Good catch. Done.



--
To view, visit http://gerrit.cloudera.org:8080/12065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a
Gerrit-Change-Number: 12065
Gerrit-PatchSet: 14
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 18 Apr 2019 15:05:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5843: Use page index in Parquet files to skip pages

2019-04-18 Thread Zoltan Borok-Nagy (Code Review)
Hello Michael Ho, Lars Volker, Pooja Nilangekar, Tim Armstrong, Csaba 
Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12065

to look at the new patch set (#15).

Change subject: IMPALA-5843: Use page index in Parquet files to skip pages
..

IMPALA-5843: Use page index in Parquet files to skip pages

This commit implements page filtering based on the Parquet page index.

The read and evaluation of the page index is done by the
HdfsParquetScanner. At first, we determine the row ranges we are
interested in, and based on the row ranges we determine the candidate
pages for each column that we are reading.

We still issue one ScanRange per column chunk, but we specify
sub-ranges that store the candidate pages, i.e. we don't read
the whole column chunk, but only fractions of it.

Pages are not aligned across column chunks, i.e. page #2 of column A
might store completely different rows than page #2 of column B.
It means we need to implement some kind of row-skipping logic
when we read the data pages. This logic is implemented in
BaseScalarColumnReader and ScalarColumnReader. Collection column
readers know nothing about page filtering.

Page filtering can be turned off by setting the query option
'read_parquet_page_index' to false.

Testing:
 * added some unit tests for the row range and
   page selection logic
 * generated various Parquet files with Parquet-MR
 * enabled Page index writing and wrote selective queries against
   tables written by Impala. Current tests are likely to use page
   filtering transparently.

Performance:
 * Measured locally, observed 3x to 20x speedup for selective queries.
   The speedup was proportional to the IO operations need to be done.

 * The TPCH benchmark didn't show a significant performance change. It
   is not a suprise since the data is not being sorted in any useful
   way. So the main goal was to not introduce perf regression.

TODO:
   * measure performance for remote reads

Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a
---
M be/src/common/global-flags.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-readers.h
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
A be/src/exec/parquet/parquet-common-test.cc
M be/src/exec/parquet/parquet-common.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exec/parquet/parquet-level-decoder.h
A be/src/exec/parquet/parquet-page-index-test.cc
A be/src/exec/parquet/parquet-page-index.cc
A be/src/exec/parquet/parquet-page-index.h
M be/src/exprs/literal.cc
M be/src/runtime/scoped-buffer.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M testdata/data/README
A testdata/data/alltypes_tiny_pages.parquet
A testdata/data/alltypes_tiny_pages_plain.parquet
A testdata/data/decimals_1_10.parquet
A testdata/data/double_nested_decimals.parquet
A testdata/data/nested_decimals.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-page-index.test
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-alltypes-tiny-pages-plain.test
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-alltypes-tiny-pages.test
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-large.test
A testdata/workloads/functional-query/queries/QueryTest/parquet-page-index.test
M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
M tests/query_test/test_parquet_stats.py
36 files changed, 3,392 insertions(+), 96 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/12065/15
--
To view, visit http://gerrit.cloudera.org:8080/12065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a
Gerrit-Change-Number: 12065
Gerrit-PatchSet: 15
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-5843: Use page index in Parquet files to skip pages

2019-04-18 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12065 )

Change subject: IMPALA-5843: Use page index in Parquet files to skip pages
..


Patch Set 14:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12065/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12065/14//COMMIT_MSG@30
PS14, Line 30: Testing
I looked into test_scanners_fuzz.py, and noticed there is no query with WHERE 
clause at all. This means that we can be sure that some parts of the page index 
logic are not tested with corrupted parquet files. This also means holes in the 
testing of existing logic, e.g. row group level min/max stats were also not 
exercised.

I am ok with moving this task to a follow up Jira.


http://gerrit.cloudera.org:8080/#/c/12065/14/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12065/14/be/src/exec/parquet/hdfs-parquet-scanner.cc@639
PS14, Line 639: if (state_->query_options().parquet_read_page_index) {
It is not useful to read the page index if there are no suitable predicates for 
min/max filtering ( == if min_max_conjunct_evals_ is empty ).



--
To view, visit http://gerrit.cloudera.org:8080/12065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a
Gerrit-Change-Number: 12065
Gerrit-PatchSet: 14
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 18 Apr 2019 10:27:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8430: Fix flakiness in testCreateDropCreateDatabase

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/13058 )

Change subject: IMPALA-8430: Fix flakiness in testCreateDropCreateDatabase
..

IMPALA-8430: Fix flakiness in testCreateDropCreateDatabase

The test fails because of two Databases getting created with
same CREATION_TIME. Hence, adding a sleep of 2 seconds to
avoid this case. Also fixing other tests with similar use-case.

Testing
 - Fixed the unit tests

Change-Id: I30bf4535d54c9cd8d257b528dc7a1b42f106800d
Reviewed-on: http://gerrit.cloudera.org:8080/13058
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
2 files changed, 40 insertions(+), 5 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/13058
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I30bf4535d54c9cd8d257b528dc7a1b42f106800d
Gerrit-Change-Number: 13058
Gerrit-PatchSet: 6
Gerrit-Owner: Bharath Krishna 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8430: Fix flakiness in testCreateDropCreateDatabase

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13058 )

Change subject: IMPALA-8430: Fix flakiness in testCreateDropCreateDatabase
..


Patch Set 5: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/13058
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30bf4535d54c9cd8d257b528dc7a1b42f106800d
Gerrit-Change-Number: 13058
Gerrit-PatchSet: 5
Gerrit-Owner: Bharath Krishna 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 18 Apr 2019 10:05:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7971: Add support for insert events in event processor.

2019-04-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12889 )

Change subject: IMPALA-7971: Add support for insert events in event processor.
..


Patch Set 20: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/4040/


--
To view, visit http://gerrit.cloudera.org:8080/12889
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c48c5ca4bde18d532c582980aebbc25f1bf1c52
Gerrit-Change-Number: 12889
Gerrit-PatchSet: 20
Gerrit-Owner: Anurag Mantripragada 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 18 Apr 2019 07:19:56 +
Gerrit-HasComments: No