[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads

2016-09-10 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new patch set (#2).

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
..

IMPALA-3823: Add timer to measure Parquet footer reads

It's been observed that Parquet footer reads perform poorly especially
when reading from S3. This patch adds a timer "FooterProcessingTimer"
which keeps a track of the average time each split of each scan node
spends in reading and processing the parquet footer.

Added a new utility counter called MinMaxAvgValueCounter which keeps
a track of the min, max and average values seen so far from a set of
values. This counter is used to calculate the min, max and average
time taken to scan and process Parquet footers per query per node.
This is also displayed in the RuntimeProfile.

The RuntimeProfile has also been updated to keep a track of, display
and move this new MinMaxAvgValueCounter between nodes through Thrift.

A test has been added to test that this counter works fine when there
are multiple blocks to scan per node.

Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M common/thrift/RuntimeProfile.thrift
M tests/query_test/test_scanners.py
7 files changed, 216 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/4371/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads

2016-09-10 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4371

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
..

IMPALA-3823: Add timer to measure Parquet footer reads

It's been observed that Parquet footer reads perform poorly especially
when reading from S3. This patch adds a timer "FooterProcessingTimer"
which keeps a track of the average time each split of each scan node
spends in reading and processing the parquet footer.

Added a new utility counter called MinMaxAvgValueCounter which keeps
a track of the min, max and average values seen so far from a set of
values. This counter is used to calculate the min, max and average
time taken to scan and process Parquet footers per query per node.
This is also displayed in the RuntimeProfile.

The RuntimeProfile has also been updated to keep a track of, display
and move this new MinMaxAvgValueCounter between nodes through Thrift.

A test has been added to test that this counter works fine when there
are multiple blocks to scan per node.

Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M common/thrift/RuntimeProfile.thrift
M tests/query_test/test_scanners.py
7 files changed, 216 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/4371/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-3201: reservation implementation for new buffer pool

2016-09-10 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3201: reservation implementation for new buffer pool
..


Patch Set 18: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/3993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I35cc89e863efb4cc506657bfdaaaf633a10bbab6
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3201: reservation implementation for new buffer pool

2016-09-10 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3201: reservation implementation for new buffer pool
..


IMPALA-3201: reservation implementation for new buffer pool

This patch implements the reservation bookkeeping logic for the new
buffer pool. The reservations are always guaranteed and any buffer/page
allocations require a reservation. Reservations are tracked via a
hierarchy of ReservationTrackers.

Eventually ReservationTrackers should become the main mechanism for
memory accounting, once all runtime memory is handled by the buffer
pool. In the meantime, query/process limits are enforced and memory
is reported through the preexisting MemTracker hierarchy.

Reservations are accounted as consumption against a MemTracker because
the memory is committed and cannot be used for other purposes. The
MemTracker then uses the counters from the ReservationTracker to log
information about buffer-pool memory down to the operator level.

Testing:
Includes basic tests for buffer pool client registration and various
reservation operations.

Change-Id: I35cc89e863efb4cc506657bfdaaaf633a10bbab6
Reviewed-on: http://gerrit.cloudera.org:8080/3993
Reviewed-by: Tim Armstrong 
Tested-by: Internal Jenkins
---
M be/CMakeLists.txt
A be/src/bufferpool/CMakeLists.txt
A be/src/bufferpool/buffer-pool-test.cc
A be/src/bufferpool/buffer-pool.cc
M be/src/bufferpool/buffer-pool.h
A be/src/bufferpool/reservation-tracker-counters.h
A be/src/bufferpool/reservation-tracker-test.cc
A be/src/bufferpool/reservation-tracker.cc
A be/src/bufferpool/reservation-tracker.h
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
A be/src/util/dummy-runtime-profile.h
M be/src/util/runtime-profile-counters.h
13 files changed, 1,439 insertions(+), 26 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/3993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I35cc89e863efb4cc506657bfdaaaf633a10bbab6
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

2016-09-10 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in 
run-hbase.sh
..


Patch Set 5:

(9 comments)

Under what circumstances is more than one zookeeper needed?

http://gerrit.cloudera.org:8080/#/c/4348/5//COMMIT_MSG
Commit Message:

PS5, Line 11: The original script was problematic for a couple of reasons (most
: notably because it queried the wrong Zookeeper node) so we stopped
: using it during our mini-cluster HBase start up procedure.
Note quite right. I didn't use it because it wasn't clear it was needed 
anymore, the so-called "HBase race" had been fixed. I didn't notice the master 
vs. rs discrepancy at that time.


http://gerrit.cloudera.org:8080/#/c/4348/5/testdata/bin/check-hbase-nodes.py
File testdata/bin/check-hbase-nodes.py:

PS5, Line 50: parser.add_argument('--timeout', '-t', action='store'
I find action='store' to be redundant since that's the default.


PS5, Line 52:   'Default is 30 seconds.'))
Don't hardcode 30. Use a format string and read from TIMEOUT_SECONDS.


PS5, Line 56:   'e.g, 0.0.0.0:5070. Default is 
localhost:5070.'))
Use a format string and read from HDFS_HOST.


PS5, Line 60:   'e.g, 0.0.0.0:2181. Default is 
localhost:2181.'))
Use a format string and read from ZK_HOSTS.


PS5, Line 66:   '/hbase/master and /hbase/rs.')
Use a format string. Maybe use '-n '.join(HBASE_NODES)  as part of it?


PS5, Line 127: hdfs_client.list('/')
What does this do?


PS5, Line 125: try:
 : hdfs_client = InsecureClient('http://' + args.hdfs_host)
 : hdfs_client.list('/')
 : except requests.exceptions.ConnectionError as e:
 : msg = 'Could not connect to HDFS web host http://{0} - 
{1}'.format(args.hdfs_host, e)
 : LOGGER.error(msg)
 : sys.exit(1)
 : 
 : zk_client = connect_to_zookeeper(args.zookeeper_hosts, 
args.timeout)
 : errors = sum([check_hbase_node(zk_client, node, 
args.timeout) for node in args.nodes])
I suggest you make this a method. The __main__ can get the arguments, the 
errors, and print the error and exit accordingly. It would be better to have a 
separate method to make a connection and do the query work.


PS5, Line 135: if errors:
It would be better to have the explicit numerical > 0 comparison.


-- 
To view, visit http://gerrit.cloudera.org:8080/4348
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Ishaan Joshi 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3201: reservation implementation for new buffer pool

2016-09-10 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3201: reservation implementation for new buffer pool
..


Patch Set 18: Code-Review+2

rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/3993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I35cc89e863efb4cc506657bfdaaaf633a10bbab6
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4100,4112: Qgen: Replace EXTRACT UDF + IS [NOT] DISTINCT FROM in HiveSqlWriter

2016-09-10 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4100,4112: Qgen: Replace EXTRACT UDF + IS [NOT] DISTINCT 
FROM in HiveSqlWriter
..


Patch Set 2:

Note to committer: there's no GVO path for this, so I suggest you submit along 
with +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/4357
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3922ca61af59ecd2899c911b1a03e11ab5c26e11
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4100,4112: Qgen: Replace EXTRACT UDF + IS [NOT] DISTINCT FROM in HiveSqlWriter

2016-09-10 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4100,4112: Qgen: Replace EXTRACT UDF + IS [NOT] DISTINCT 
FROM in HiveSqlWriter
..


Patch Set 2: Code-Review+1

This needs a committer's +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/4357
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3922ca61af59ecd2899c911b1a03e11ab5c26e11
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4028: Trim sentry config file path spaces while impala start.

2016-09-10 Thread Anonymous Coward (Code Review)
Hello Lars Volker,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4309

to look at the new patch set (#3).

Change subject: IMPALA-4028: Trim sentry config file path spaces while impala 
start.
..

IMPALA-4028: Trim sentry config file path spaces while impala start.

When sentry config file path is wrong input which end contains spaces, 
impala start up failed.

Trimming sentry config file path spaces and checking file again while
the original file has been checked failed will solve this problem.

Change-Id: I3a76b9e4236caa3f2088fba8a9cf0236fced2634
---
M fe/src/main/java/com/cloudera/impala/authorization/SentryConfig.java
1 file changed, 7 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/4309/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3a76b9e4236caa3f2088fba8a9cf0236fced2634
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: davy...@163.com
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: davy...@163.com


[Impala-ASF-CR] IMPALA-4111: backend death tests should not produce minidumps

2016-09-10 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4111: backend death tests should not produce minidumps
..


Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4353/3/be/src/util/minidump.h
File be/src/util/minidump.h:

PS3, Line 29:  for death test
nit: maybe remove or reword like "for example during death test" or "during 
tests"


-- 
To view, visit http://gerrit.cloudera.org:8080/4353
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I33037f4f77c8188fc2ec46a77083a0a64f6ea404
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes