[Impala-ASF-CR] IMPALA-5307: Part 2: copy out strings in uncompressed Avro

2017-10-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8146 )

Change subject: IMPALA-5307: Part 2: copy out strings in uncompressed Avro
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8146/13/be/src/runtime/tuple.h
File be/src/runtime/tuple.h:

http://gerrit.cloudera.org:8080/#/c/8146/13/be/src/runtime/tuple.h@175
PS13, Line 175:   /// Materialize the var-len string data in this tuple into 
the provided memory
  :   /// pool.
> the name and the comment didn't make it clear that this also rewrites this
Renamed to CopyStrings(). Agree it's more accurate.



--
To view, visit http://gerrit.cloudera.org:8080/8146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If1fc78790d778c874f5aafa5958c3c045a88d233
Gerrit-Change-Number: 8146
Gerrit-PatchSet: 13
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Oct 2017 05:22:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5307: Part 2: copy out strings in uncompressed Avro

2017-10-26 Thread Tim Armstrong (Code Review)
Hello Michael Ho, Thomas Tauber-Marshall, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8146

to look at the new patch set (#14).

Change subject: IMPALA-5307: Part 2: copy out strings in uncompressed Avro
..

IMPALA-5307: Part 2: copy out strings in uncompressed Avro

The approach is to re-materialize strings in those tuples that
survive conjunct evaluation and may reference disk I/O buffers
directly. This means that perf should not regress for the
following cases:
* Compressed Avro files.
* Non-string columns.
* Selective scans where the majority of tuples are filtered out.

This approach will also work for the Sequence and Text scanners.

Includes some improvements to Avro codegen to replace more constants to
help win back some performance (with limited success): replaced
InitTuple() with an optimised version and substituted
tuple_byte_size() with a constant.

Removes dead code for handling CHAR(n) - CHAR(n) is now always fixed
length.

Perf:
Did microbenchmarks on uncompressed Avro files, one with all columns
from lineitem and one with only l_comment. Tests were run with:
  set num_scanner_threads=1;

I ran the query 5 times and extracted MaterializeTupleTime from the
profile to measure CPU cost of materialization. Overall string
materialization got significantly slower, mainly because of the
extra memcpy() calls required.

Selecting one string from a table with multiple columns:

  select min(l_comment) from biglineitem_avro
  1.814 -> 2.096

Selecting one string from a table with one column:

  select min(l_comment) from biglineitem_comment; profile;
  1.708 -> 3.7

Selecting one string from a table with one column with predicate:

  select min(l_comment) from biglineitem_comment where length(l_comment) > 
1;
  1.691 -> 1.449

Selecting all columns:

  select min(l_orderkey), min(l_partkey), min(l_suppkey), min(l_linenumber),
min(l_quantity), min(l_extendedprice), min(l_discount), min(l_tax),
min(l_returnflag), min(l_linestatus), min(l_shipdate),
min(l_commitdate), min(l_receiptdate), min(l_shipinstruct),
min(l_shipmode), min(l_comment) from biglineitem_avro; profile;
  2.335 -> 3.711

Selecting an int column (no strings):
  select min(l_linenumber) from biglineitem_avro
  1.806 -> 1.819

Testing:
Ran exhaustive tests.

Change-Id: If1fc78790d778c874f5aafa5958c3c045a88d233
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/common/status.cc
M be/src/common/status.h
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
A be/src/runtime/tuple-ir.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
19 files changed, 337 insertions(+), 57 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/8146/14
--
To view, visit http://gerrit.cloudera.org:8080/8146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If1fc78790d778c874f5aafa5958c3c045a88d233
Gerrit-Change-Number: 8146
Gerrit-PatchSet: 14
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources

2017-10-26 Thread Tim Armstrong (Code Review)
Hello Sailesh Mukil, Joe McDonnell, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8303

to look at the new patch set (#13).

Change subject: IMPALA-1575: Part 1: eagerly release query exec resources
..

IMPALA-1575: Part 1: eagerly release query exec resources

Release of backend resources for query execution on the coordinator
daemon should occur eagerly as soon as query execution is finished
or cancelled. Before this patch some resources managed by QueryState,
like scratch files, were only released when the query was closed
and all of the query's control structures were torn down.

These resources are referred to as "ExecResources" in various places to
distinguish them from resources associated with the client request (like
the result cache) that are still required after the query finishes
executing.

This first change does not solve the admission control problem for two
reasons:
* We don't release the "admitted" memory on the coordinator until
  the query is unregistered.
* Admission control still considers the memory reserved until the
  query memtracker is unregistered, which happens only when the
  QueryState is destroyed: see MemTracker::GetPoolMemReserved().

The flow is mostly similar to initial_reservation_refcnt_, except the
coordinator also holds onto a reference count, which is released
when either the final row is returned or cancellation is initiated.
After the coordinator releases its refcount, the resources can be
freed as soon as local fragments also release their refcounts.

Also clean up Coordinator slightly by preventing runtime_state() from
leaking out to ClientRequestState - instead it's possible to log
the query MemTracker by following parent links in the MemTracker.

This patch is partially based on Joe McDonnell's IMPALA-1575 patch.

Testing:
Ran core tests.

Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341
---
M be/src/common/status.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/test-env.cc
M be/src/service/client-request-state.cc
11 files changed, 154 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/8303/13
--
To view, visit http://gerrit.cloudera.org:8080/8303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341
Gerrit-Change-Number: 8303
Gerrit-PatchSet: 13
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources

2017-10-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8303 )

Change subject: IMPALA-1575: Part 1: eagerly release query exec resources
..


Patch Set 11:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/mem-tracker.cc
File be/src/runtime/mem-tracker.cc:

http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/mem-tracker.cc@328
PS11, Line 328: RuntimeState* state
> definitely not this change, but should we just get rid of this parameter. d
I think the status propagation is a lot more reliable that it used to be. One 
potential advantage of logging it is if we hit the memory limit in multiple 
places around the same time we would get more diagnostics in the log instead of 
whichever error won the relate. Unclear if that's useful or important.

We could also consider making the memtracker dump in the status more concise - 
it's pretty verbose now.


http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/mem-tracker.cc@359
PS11, Line 359: was not available
> does that comment still make sense? isn't the query_tracker==nullptr case n
We do call it on the process MemTracker in QueryState::Init().


http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.h@205
PS11, Line 205: backend
> what do we mean by "backend" here?  is that the same thing as "execution"?
Yeah I don't think that word added anything. Removed.


http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.h@206
PS11, Line 206: Resources for this query
> How about: Query wide resources ...
Done


http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.h@258
PS11, Line 258: backend
> same question
Done


http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.cc@95
PS11, Line 95:   if (!released_exec_resources_) ReleaseExecResources();
> we've been trying to move away from doing resource releasing from destructo
My initial concern was that handling it in Init() would get messy in the case 
when the coordinator was already holding a resource refcount. I came up with a 
reasonable solution - acquiring the refcount always and then releasing it on 
the error path.


http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.cc@122
PS11, Line 122:   TExecQueryFInstancesParams& non_const_params =
I needed to remove this declaration because of the goto crossing the 
declaration.


http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/runtime-state.cc@88
PS11, Line 88:   local_query_state_->AcquireExecResourceRefcount(); // 
Decremented in ReleaseResources().
> why is that needed? what resource needs to be held by the runtime state in
The resources used for evaluating exprs, etc - this is used in various test 
environments and in fe-support. We could also expose 
QueryState::ReleaseExecResources() and call it directly, but it seemed better 
to use the same API here as in the main query execution flow.



--
To view, visit http://gerrit.cloudera.org:8080/8303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341
Gerrit-Change-Number: 8303
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Oct 2017 04:39:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources

2017-10-26 Thread Tim Armstrong (Code Review)
Hello Sailesh Mukil, Joe McDonnell, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8303

to look at the new patch set (#12).

Change subject: IMPALA-1575: Part 1: eagerly release query exec resources
..

IMPALA-1575: Part 1: eagerly release query exec resources

Release of backend resources for query execution on the coordinator
daemon should occur eagerly as soon as query execution is finished
or cancelled. Before this patch some resources managed by QueryState,
like scratch files, were only released when the query was closed
and all of the query's control structures were torn down.

These resources are referred to as "ExecResources" in various places to
distinguish them from resources associated with the client request (like
the result cache) that are still required after the query finishes
executing.

This first change does not solve the admission control problem for two
reasons:
* We don't release the "admitted" memory on the coordinator until
  the query is unregistered.
* Admission control still considers the memory reserved until the
  query memtracker is unregistered, which happens only when the
  QueryState is destroyed: see MemTracker::GetPoolMemReserved().

The flow is mostly similar to initial_reservation_refcnt_, except the
coordinator also holds onto a reference count, which is released
when either the final row is returned or cancellation is initiated.
After the coordinator releases its refcount, the resources can be
freed as soon as local fragments also release their refcounts.

Also clean up Coordinator slightly by preventing runtime_state() from
leaking out to ClientRequestState - instead it's possible to log
the query MemTracker by following parent links in the MemTracker.

This patch is partially based on Joe McDonnell's IMPALA-1575 patch.

Testing:
Ran core tests.

Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341
---
M be/src/common/status.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/test-env.cc
M be/src/service/client-request-state.cc
11 files changed, 150 insertions(+), 97 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/8303/12
--
To view, visit http://gerrit.cloudera.org:8080/8303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341
Gerrit-Change-Number: 8303
Gerrit-PatchSet: 12
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5017: Error on decimal overflow

2017-10-26 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8404 )

Change subject: IMPALA-5017: Error on decimal overflow
..


Patch Set 1:

Ooops, canceled


--
To view, visit http://gerrit.cloudera.org:8080/8404
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019
Gerrit-Change-Number: 8404
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Fri, 27 Oct 2017 03:32:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5017: Error on decimal overflow

2017-10-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8404 )

Change subject: IMPALA-5017: Error on decimal overflow
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1402/


--
To view, visit http://gerrit.cloudera.org:8080/8404
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019
Gerrit-Change-Number: 8404
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Fri, 27 Oct 2017 03:32:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2758: Remove BufferedTupleStream::GetRows

2017-10-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8226 )

Change subject: IMPALA-2758: Remove BufferedTupleStream::GetRows
..


Patch Set 5: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1400/


--
To view, visit http://gerrit.cloudera.org:8080/8226
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414
Gerrit-Change-Number: 8226
Gerrit-PatchSet: 5
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Oct 2017 02:44:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5017: Error on decimal overflow

2017-10-26 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8404


Change subject: IMPALA-5017: Error on decimal overflow
..

IMPALA-5017: Error on decimal overflow

Before this patch, decimal operations would either silently overflow (in
the case of sum() and avg()), or produce a warning.

In this patch, the beharior is changed so that an error is produced in
the case of overflow when DECIMAL_v2 is enabled. Decimal v1 behavior is
unchanged.

We introduce overflow checks when computing sum() and avg(). This
results in a ~30% performance regression when we are in decimal v2 mode
compared to decimal v1.

Benchmarks:
  Query:
  select sum(dec_38_19) from decimal_tbl
Decimal v1: 13.09s
Decimal v2: 17.10s

  Query:
  select avg(dec_38_19) from decimal_tbl
Decimal v1: 13.60s
Decimal v2: 19.10s

Testing:
- Added several end to end tests.
- Updated Expr tests to check for error in case of overflow.

Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-codegen-test.cc
M be/src/exprs/expr-test.cc
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
M testdata/workloads/functional-query/queries/QueryTest/decimal.test
6 files changed, 186 insertions(+), 69 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/8404/1
--
To view, visit http://gerrit.cloudera.org:8080/8404
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019
Gerrit-Change-Number: 8404
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner

2017-10-26 Thread Bikramjeet Vig (Code Review)
Hello Lars Volker, Matthew Jacobs, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7822

to look at the new patch set (#7).

Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet 
scanner
..

IMPALA-2494: Support for byte array encoded decimals in Parquet scanner

Extendes parquet column reader and associated classes to allow for more
than one possible physical type for a given logical type. This patch
only adds support for variable sized byte array encoded decimals and
more will be added in upcoming commits.
Also, column level metadata verification which was currently being
done per row group will now only be done once per column per file.

Testing:
Added backend test for verifying newly added decimal types are decoded
correctly.
Added Query test that decodes both plain and dictionary-encoded
decimals using binary encoding.

Performance:
Initial perf testing using tpcds_1000 shows no regression.

Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
---
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M be/src/exec/parquet-common.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M be/src/exec/parquet-plain-test.cc
M be/src/util/dict-encoding.h
M be/src/util/dict-test.cc
A testdata/data/binary_decimal_dictionary.parquet
A testdata/data/binary_decimal_no_dictionary.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-formats.test
M tests/query_test/test_scanners.py
18 files changed, 633 insertions(+), 283 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7822/7
--
To view, visit http://gerrit.cloudera.org:8080/7822
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
Gerrit-Change-Number: 7822
Gerrit-PatchSet: 7
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner

2017-10-26 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7822 )

Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet 
scanner
..


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-column-readers.cc@207
PS6, Line 207: InternalType, parquet::Type::type PARQUET_TYPE
> you should explain these template parameters
Done


http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-common.h
File be/src/exec/parquet-common.h:

http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-common.h@40
PS6, Line 40: IMPALA_TO_PARQUET_TYPES
I think it makes sense to rename this to INTERNAL_TO_PARQUET_TYPES to be 
consistent with our naming of template parameters.


http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-common.h@88
PS6, Line 88: InternalType
> let's explain what that is
Done


http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-common.h@177
PS6, Line 177:   static int DecodeByParquetType(const uint8_t* buffer, const 
uint8_t* buffer_end, int fixed_len_size,
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-common.h@333
PS6, Line 333: inline int EncodeDecimal(const T& v, int fixed_len_size, 
uint8_t* buffer){
> nit: missing space
Done


http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-plain-test.cc
File be/src/exec/parquet-plain-test.cc:

http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-plain-test.cc@34
PS6, Line 34: int
> int32_t for consistency?
Done


http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-plain-test.cc@79
PS6, Line 79: LOGICAL_TYPE
> LogicalType here and below
Done



--
To view, visit http://gerrit.cloudera.org:8080/7822
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
Gerrit-Change-Number: 7822
Gerrit-PatchSet: 6
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Oct 2017 00:32:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork

2017-10-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7938 )

Change subject: IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
..


Patch Set 14: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/7938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7
Gerrit-Change-Number: 7938
Gerrit-PatchSet: 14
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Fri, 27 Oct 2017 00:19:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork

2017-10-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/7938 )

Change subject: IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
..

IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork

Impala currently kinits by forking off a child process. This
has proved to be expensive in many cases since the subprocess
tries to reserve as much memory as Impala is currently using
which can be quite a lot.

This patch adds a flag called 'use_kudu_kinit' that defaults to
true. When it's true, it uses the Kudu security library's kinit code
that programatically uses the krb5 library to kinit.
When it's false, we run our current path which kicks off the
kinit-thread and forks off a kinit process periodically to reacquire
tickets based on FLAGS_kerberos_reinit_interval.

Converted existing tests in thrift-server-test to run with and
without kerberos. We now run this BE test with kerberos by using
Kudu's MiniKdc utility. This introduces a new dependency on some
kerberos binaries that are checked through FindKerberosPrograms.cmake.
Note that this is only a test dependency and not a dependency for
the impalad binaries and friends. Compilation will still succeed if
the kerberos binaries for the MiniKdc are not found, however, the
thrift-server-test will fail. We run with and without the
'use_kudu_kinit' flag.

TODO: Since the setting up and tearing down of our security code
isn't idempotent, we can run only any one test in a process with
Kerberos now (IMPALA-6085).

Updated bin/bootstrap_system.sh to install new sasl-gssapi
modules and the kerberos binaries required for the MiniKdc.
Also fixed a bug that didn't transfer the environment into 'sudo'
in bin/bootstrap_system.sh.

Testing: Verified with thrift-server-test and also manually on a
live kerberized cluster.

Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7
Reviewed-on: http://gerrit.cloudera.org:8080/7938
Reviewed-by: Sailesh Mukil 
Tested-by: Impala Public Jenkins
---
M CMakeLists.txt
M be/src/exec/kudu-util.h
M be/src/kudu/security/CMakeLists.txt
M be/src/kudu/security/test/mini_kdc.cc
M be/src/rpc/CMakeLists.txt
M be/src/rpc/auth-provider.h
M be/src/rpc/authentication.cc
M be/src/rpc/thrift-server-test.cc
M bin/bootstrap_system.sh
A cmake_modules/FindKerberosPrograms.cmake
10 files changed, 234 insertions(+), 21 deletions(-)

Approvals:
  Sailesh Mukil: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/7938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7
Gerrit-Change-Number: 7938
Gerrit-PatchSet: 15
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-5307: Part 2: copy out strings in uncompressed Avro

2017-10-26 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8146 )

Change subject: IMPALA-5307: Part 2: copy out strings in uncompressed Avro
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8146/13/be/src/runtime/tuple.h
File be/src/runtime/tuple.h:

http://gerrit.cloudera.org:8080/#/c/8146/13/be/src/runtime/tuple.h@175
PS13, Line 175:   /// Materialize the var-len string data in this tuple into 
the provided memory
  :   /// pool.
the name and the comment didn't make it clear that this also rewrites this 
tuple. i.e. in that sense, this is different than MaterializeExprs(). In this 
case, the slots (and strings) are already materialized -- it's re-materializing 
(or copying) the strings, right?



--
To view, visit http://gerrit.cloudera.org:8080/8146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If1fc78790d778c874f5aafa5958c3c045a88d233
Gerrit-Change-Number: 8146
Gerrit-PatchSet: 13
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 23:46:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources

2017-10-26 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8303 )

Change subject: IMPALA-1575: Part 1: eagerly release query exec resources
..


Patch Set 11:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/mem-tracker.cc
File be/src/runtime/mem-tracker.cc:

http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/mem-tracker.cc@328
PS11, Line 328: RuntimeState* state
definitely not this change, but should we just get rid of this parameter. do we 
really need to explicitly do the LogError() below? If we still rely on that 
then it seems we should fix status propagation.


http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/mem-tracker.cc@359
PS11, Line 359: was not available
does that comment still make sense? isn't the query_tracker==nullptr case now 
mean that it's part of the query hierarchy? though should we even call this 
function in that case?


http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.h@205
PS11, Line 205: backend
what do we mean by "backend" here?  is that the same thing as "execution"?


http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.h@206
PS11, Line 206: Resources for this query
How about: Query wide resources ...

since some FIS-local resources are also resources used for this query but those 
should be owned and released solely by the FIS thread.


http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.h@258
PS11, Line 258: backend
same question


http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.cc@95
PS11, Line 95:   if (!released_exec_resources_) ReleaseExecResources();
we've been trying to move away from doing resource releasing from destructors, 
to make the lifetimes super clear.  shouldn't this case just be handled in the 
caller of Init() when !status.ok() (which already has to handle releasing the 
QS ref count, and the thread creation path already handles this case. we could 
similarly release the resource ref count if we acquired that sooner before Init 
can fail).


http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/runtime-state.cc@88
PS11, Line 88:   local_query_state_->AcquireExecResourceRefcount(); // 
Decremented in ReleaseResources().
why is that needed? what resource needs to be held by the runtime state in this 
case?



--
To view, visit http://gerrit.cloudera.org:8080/8303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341
Gerrit-Change-Number: 8303
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 23:26:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5607: Add additional units to EXTRACT, DATE PART, TRUNC

2017-10-26 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8311 )

Change subject: IMPALA-5607: Add additional units to EXTRACT, DATE_PART, TRUNC
..


Patch Set 2:

Looks like we want NANOSECONDS after all, so the BIGINT change is needed.


--
To view, visit http://gerrit.cloudera.org:8080/8311
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If23ea310ddcc5c1c806b2b1ac24d24a4bd6aa4d9
Gerrit-Change-Number: 8311
Gerrit-PatchSet: 2
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Thu, 26 Oct 2017 23:24:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2758: Remove BufferedTupleStream::GetRows

2017-10-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8226 )

Change subject: IMPALA-2758: Remove BufferedTupleStream::GetRows
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1400/


--
To view, visit http://gerrit.cloudera.org:8080/8226
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414
Gerrit-Change-Number: 8226
Gerrit-PatchSet: 5
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 23:15:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2758: Remove BufferedTupleStream::GetRows

2017-10-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8226 )

Change subject: IMPALA-2758: Remove BufferedTupleStream::GetRows
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8226
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414
Gerrit-Change-Number: 8226
Gerrit-PatchSet: 5
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 23:15:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5607: Add additional units to EXTRACT, DATE PART, TRUNC

2017-10-26 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8311 )

Change subject: IMPALA-5607: Add additional units to EXTRACT, DATE_PART, TRUNC
..


Patch Set 2:

(6 comments)

Overall this looks good - and thank you for the tests.  It is worth debating 
whether NANOSECONDs should be supported at all - let's do that in the JIRA.

http://gerrit.cloudera.org:8080/#/c/8311/2/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/8311/2/be/src/exprs/expr-test.cc@5959
PS2, Line 5959:   TestStringValue(
I don't think the additional test cases add any value unless you add greater 
precision, i.e.

"cast(trunc(cast('2012-09-10 07:02:03.1234' as timestamp), 'SS') as string)"


http://gerrit.cloudera.org:8080/#/c/8311/2/be/src/exprs/udf-builtins-ir.cc
File be/src/exprs/udf-builtins-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8311/2/be/src/exprs/udf-builtins-ir.cc@120
PS2, Line 120:   return time.fractional_seconds() + time.seconds() * 1000 * 
1000 * 1000;
> The seconds field, including fractional parts, multiplied by 1 000 000 000;
Use NANOS_PER_SEC


http://gerrit.cloudera.org:8080/#/c/8311/2/be/src/exprs/udf-builtins-ir.cc@125
PS2, Line 125:   return ExtractNanosecond(time) / 1000;
Instead of dividing the result of another function, these should be implement 
directly, i.e.

  time.fractional_seconds() / NANOS_PER_MICRO + time.seconds() * MICROS_PER_SEC


http://gerrit.cloudera.org:8080/#/c/8311/2/be/src/exprs/udf-builtins-ir.cc@130
PS2, Line 130:   return ExtractMicrosecond(time) / 1000;
same comment applies


http://gerrit.cloudera.org:8080/#/c/8311/2/be/src/exprs/udf-builtins-ir.cc@154
PS2, Line 154: BigIntVal
> Return type should be changed from INT to BIGINT because INT data type's ra
We only need to change this if we actually implement nanosecond units for 
EXTRACT; no other referenced system in the JIRA currently supports that, so it 
is worth debating whether we should or not.


http://gerrit.cloudera.org:8080/#/c/8311/2/common/thrift/Exprs.thrift
File common/thrift/Exprs.thrift:

http://gerrit.cloudera.org:8080/#/c/8311/2/common/thrift/Exprs.thrift@92
PS2, Line 92:   MILLISECONDS,
We shouldn't have redundant field definitions here; instead we should be 
lenient with parsing and convert MILISECONDS to MILISECOND.



--
To view, visit http://gerrit.cloudera.org:8080/8311
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If23ea310ddcc5c1c806b2b1ac24d24a4bd6aa4d9
Gerrit-Change-Number: 8311
Gerrit-PatchSet: 2
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Thu, 26 Oct 2017 23:13:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2758: Remove BufferedTupleStream::GetRows

2017-10-26 Thread Tianyi Wang (Code Review)
Tianyi Wang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8226 )

Change subject: IMPALA-2758: Remove BufferedTupleStream::GetRows
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.h@482
PS4, Line 482:   bool output_null_aware_probe_rows_running_;
> Longer-term we should think about ways to reduce the sheer number of statef
Yeah many of the members are just used by one state. I think we should 
summarize the states into enums or put the stateful members into 
std::variant/boost::variant altogether.


http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.cc@949
PS4, Line 949: RowBatch null_build_batch(child(1)->row_desc(), 
state->batch_size(), mem_tracker());
> This can fit on one line.
Done


http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.cc@962
PS4, Line 962:   RETURN_IF_CANCELLED(state);
> We should add a RETURN_IF_CANCELLED() check here, similar to the one below
Done


http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.cc@1074
PS4, Line 1074:   RowBatch build_batch(child(1)->row_desc(), 
state->batch_size(), mem_tracker());
> This can fit on one line.
Done



--
To view, visit http://gerrit.cloudera.org:8080/8226
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414
Gerrit-Change-Number: 8226
Gerrit-PatchSet: 5
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 22:55:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2758: Remove BufferedTupleStream::GetRows

2017-10-26 Thread Tianyi Wang (Code Review)
Hello Thomas Tauber-Marshall, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8226

to look at the new patch set (#5).

Change subject: IMPALA-2758: Remove BufferedTupleStream::GetRows
..

IMPALA-2758: Remove BufferedTupleStream::GetRows

This patch removes BufferedTupleStream::GetRows. This function pins a
stream and reads all the rows into a single batch. It is not a good API
since it creates an arbitrarily large row batch. In this patch the call
sites pin the stream and then directly use GetNext to retrieve a single
batch at a time.

Testing: It passes existing tests. A test case for GetRows is removed.

Change-Id: I3831c38994da2b69775a9809ff01de5d23584414
---
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
5 files changed, 63 insertions(+), 99 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/8226/5
--
To view, visit http://gerrit.cloudera.org:8080/8226
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414
Gerrit-Change-Number: 8226
Gerrit-PatchSet: 5
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources

2017-10-26 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8303 )

Change subject: IMPALA-1575: Part 1: eagerly release query exec resources
..


Patch Set 11: Code-Review+1

(1 comment)

Thanks for the explanations. This LGTM.

http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/coordinator.cc@1095
PS10, Line 1095: query_state_ != nullptr)
> Sure, so what you are saying is that query_state_ being not null is not an
Makes sense. Better not to make decisions based on non-contractual current 
implementation details that may change at any point. Seems like we have the 
same pattern for a lot of other objects too.



--
To view, visit http://gerrit.cloudera.org:8080/8303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341
Gerrit-Change-Number: 8303
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 22:49:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes

2017-10-26 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8196 )

Change subject: IMPALA-4236: Codegen CopyRows() for select nodes
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8196/9/tests/query_test/test_codegen.py
File tests/query_test/test_codegen.py:

http://gerrit.cloudera.org:8080/#/c/8196/9/tests/query_test/test_codegen.py@52
PS9, Line 52: assert_codegen_enabled(result.runtime_profile, [1])
> Do we already have tests that check for codegen failed?
I will try something which involves a char column.



--
To view, visit http://gerrit.cloudera.org:8080/8196
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e
Gerrit-Change-Number: 8196
Gerrit-PatchSet: 11
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 22:48:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2758: Remove BufferedTupleStream::GetRows

2017-10-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8226 )

Change subject: IMPALA-2758: Remove BufferedTupleStream::GetRows
..


Patch Set 4: Code-Review+2

(4 comments)

Looks good, just did a final pass and caught a couple more things. If you fix 
those and rebase then I can start the merge.

http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.h@482
PS4, Line 482:   bool output_null_aware_probe_rows_running_;
Longer-term we should think about ways to reduce the sheer number of stateful 
member variables in this class and make the state machine in GetNext() easier 
to understand.


http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.cc@949
PS4, Line 949: RowBatch null_build_batch(child(1)->row_desc(), 
state->batch_size(),
This can fit on one line.


http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.cc@962
PS4, Line 962:   null_build_batch.Reset();
We should add a RETURN_IF_CANCELLED() check here, similar to the one below (I 
think this was an oversight in an earlier patch).


http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.cc@1074
PS4, Line 1074:   RowBatch build_batch(child(1)->row_desc(), 
state->batch_size(),
This can fit on one line.



--
To view, visit http://gerrit.cloudera.org:8080/8226
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414
Gerrit-Change-Number: 8226
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 22:22:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources

2017-10-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8303 )

Change subject: IMPALA-1575: Part 1: eagerly release query exec resources
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/coordinator.cc@1095
PS10, Line 1095: query_state_ != nullptr)
> But if the query_state_ were null, then we wouldn't release the resources,
That would require that query_state_ was not initialised but  we acquired a 
resource refcount. With the code as it is now, it's trivial to convince 
ourselves that it's not possible.

I can remove the null check but my concern was that it's a brittle assumption 
that there are no failure paths where a reference to the query_state_ is not 
obtained. I had to look at the implementation of two functions across two files 
to convince myself that it was true.

It seems more like to me that code would move around to invalidate the second 
assumption than the first.



--
To view, visit http://gerrit.cloudera.org:8080/8303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341
Gerrit-Change-Number: 8303
Gerrit-PatchSet: 10
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 22:14:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5473: [DOCS] Document TLS min version & cipher options

2017-10-26 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8401 )

Change subject: IMPALA-5473: [DOCS] Document TLS min version & cipher options
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8401/1/docs/topics/impala_ssl.xml
File docs/topics/impala_ssl.xml:

http://gerrit.cloudera.org:8080/#/c/8401/1/docs/topics/impala_ssl.xml@158
PS1, Line 158: tlsv1: Allow any TLS version of 1.0 
or higher.
mention this the default may be?



--
To view, visit http://gerrit.cloudera.org:8080/8401
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1705262f8c01e38c616541d1c48f5d0cad5498e
Gerrit-Change-Number: 8401
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 22:06:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources

2017-10-26 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8303 )

Change subject: IMPALA-1575: Part 1: eagerly release query exec resources
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/coordinator.cc@1095
PS10, Line 1095: query_state_ != nullptr)
> Your reasoning appears to be correct. My tendency is to keep the null check
But if the query_state_ were null, then we wouldn't release the resources, 
which in itself is a bug. So, wouldn't it be better to know that such a bug 
existed?



--
To view, visit http://gerrit.cloudera.org:8080/8303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341
Gerrit-Change-Number: 8303
Gerrit-PatchSet: 10
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 22:05:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources

2017-10-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8303 )

Change subject: IMPALA-1575: Part 1: eagerly release query exec resources
..


Patch Set 11: Code-Review+1

(4 comments)

Thanks for the comments - I had to think hard about each one of them.

http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/coordinator.cc@1095
PS10, Line 1095: query_state_ != nullptr)
> How would query_state_ be NULL if the coordinator hasn't release its refere
Your reasoning appears to be correct. My tendency is to keep the null check 
though - usually unnecessary checks like this just add confusion but on cleanup 
logic I find it's a good practice to write defensive code like this since code 
tends to change, it's hard to reason about the different possible failure modes 
and test coverage isn't always complete.


I'm open to removing it though, I think it would need a comment to explain why 
it's valid though.


http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/mem-tracker.cc
File be/src/runtime/mem-tracker.cc:

http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/mem-tracker.cc@322
PS10, Line 322: while (tracker != nullptr && !tracker->is_query_mem_tracker_) {
  : tracker = tracker->parent_;
  :   }
> What's the thread safety story here? What if a any one of the trackers touc
The parent_ pointer is never modified so the only race possible is if 
intervening MemTrackers are destroyed. This is not possible since the 
MemTracker hierarchy is now all torn down at once when the query ends.

The thing about the parent pointer isn't documented anywhere so I made the 
pointer const and added a comment.


http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/query-state.cc@96
PS10, Line 96:   if (query_mem_tracker_ != nullptr) {
 : // Disconnect the query MemTracker hierarchy from the global 
hierarchy. After this
 : // point nothing must touch this query's MemTracker and all 
tracked memory associated
 : // with the query must be released. The whole query subtree 
of MemTrackers can
 : // therefore be safely destroyed.
 :
> I'm still not clear why this moved from ReleaseExecResources() to ~QuerySta
It happens at the same time as before - previously ReleaseResources() ran 
immediately before the destructor. After this patch everything in 
ReleaseExecResources() can happen much sooner before the destructor. However 
this one action of disconnecting the query MemTracker couldn't be moved earlier.

I tried to improve the comment a bit to emphasise that this was disconnecting a 
control structure.

I also thought about making a separate Close() method that did this kind of 
teardown, but it seemed unnecessary given that it would run immediately before 
the destructor. I.e.

  query_state->Close();
  delete query_state


http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/query-state.cc@135
PS10, Line 135: initial_reservations_->Init(query_id(), 
rpc_params.min_reservation_bytes));
> Can we move this to the start of Init() and make L95 a DCHECK?
I like the idea but after trying it out I think it's significantly complicated 
by the fact that the coordinator grabs a resource refcount without calling 
Init() - in some cases when Init() fails the refcount is already non-zero so it 
wouldn't be valid to release the resources yet.



--
To view, visit http://gerrit.cloudera.org:8080/8303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341
Gerrit-Change-Number: 8303
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 21:59:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources

2017-10-26 Thread Tim Armstrong (Code Review)
Hello Sailesh Mukil, Joe McDonnell, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8303

to look at the new patch set (#11).

Change subject: IMPALA-1575: Part 1: eagerly release query exec resources
..

IMPALA-1575: Part 1: eagerly release query exec resources

Release of backend resources for query execution on the coordinator
daemon should occur eagerly as soon as query execution is finished
or cancelled. Before this patch some resources managed by QueryState,
like scratch files, were only released when the query was closed
and all of the query's control structures were torn down.

These resources are referred to as "ExecResources" in various places to
distinguish them from resources associated with the client request (like
the result cache) that are still required after the query finishes
executing.

This first change does not solve the admission control problem for two
reasons:
* We don't release the "admitted" memory on the coordinator until
  the query is unregistered.
* Admission control still considers the memory reserved until the
  query memtracker is unregistered, which happens only when the
  QueryState is destroyed: see MemTracker::GetPoolMemReserved().

The flow is mostly similar to initial_reservation_refcnt_, except the
coordinator also holds onto a reference count, which is released
when either the final row is returned or cancellation is initiated.
After the coordinator releases its refcount, the resources can be
freed as soon as local fragments also release their refcounts.

Also clean up Coordinator slightly by preventing runtime_state() from
leaking out to ClientRequestState - instead it's possible to log
the query MemTracker by following parent links in the MemTracker.

This patch is partially based on Joe McDonnell's IMPALA-1575 patch.

Testing:
Ran core tests.

Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341
---
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/test-env.cc
M be/src/service/client-request-state.cc
10 files changed, 129 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/8303/11
--
To view, visit http://gerrit.cloudera.org:8080/8303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341
Gerrit-Change-Number: 8303
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
..

IMPALA-5429: Multi threaded block metadata loading

Implements multi threaded block metadata loading on the Catalog
server where we fetch block metadata for multiple partitions of a
single table in parallel. Number of threads to load the metadata is
controlled by the following two parameters (set on the Catalog server
startup and applies for each table load)

-max_hdfs_partitions_parallel_load(default=5)
-max_nonhdfs_partitions_parallel_load(default=20)

We use different thread pool sizes for HDFS and non-HDFS tables since
non-HDFS supports much higher throughput of RPC calls for listStatus
/listFiles. Based on our experiments, S3 showed a linear speed up
(up to ~113x) with increasing number of loading threads where as the
HDFS throughput was limited to ~5x in un-secure clusters and up to
~3.7x in secure clusters. We narrowed it down to scalability
bottlenecks in HDFS RPC implementation (HADOOP-14558) on both the
server and the client side.

One thing to note here is that the thread pool based metadata fetching
is implemented only for loading HDFS block metadata and not for loading
HMS partition information. Our experiments showed that while loading
large partitioned tables, ~90% of the time is spent in connecting to NN
and loading the HDFS block information and optimizing the rest ~10% makes
the code unnecessarily complex without much gain.

Additional notes:

- The multithreading approach is implemented for
  * INVALIDATE (loading from scratch),
  * REFRESH (reusing existing md) code paths,
  * ALTER TABLE ADD/RECOVER PARTITIONS.

- This patch makes the implementation of ListMap thread-safe since
we use that data structure as a shared state between multiple partition
metadata loding threads.

Testing and Results:

- This patch doesn't add any new tests since there is enough test
coverage already. Passed core/exhaustive runs with HDFS/S3.

- We noticed up to ~113x speedup on S3 tables(thread_pool_size=160)
and up to ~5x speed up in un-secure HDFS clusters and ~3.7x in secure
HDFS clusters.

- Synthesized the following two large tables on HDFS and S3 and noticed
significant reduction in my test DDL queries.

  (1) 100K partitions + 1 million files
  (2) 80 partitions + 250K files

 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION I -16.4%
 100K-PARTITIONS-1M-FILES-CUSTOM-08-ADD-PARTITION I -17.25%
 80-PARTITIONS-250K-FILES-11-REFRESH-PARTITIONI -23.57%
 80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION I -23.87%
 80-PARTITIONS-250K-FILES-09-INVALIDATE   I -24.88%
 80-PARTITIONS-250K-FILES-03-RECOVER  I -35.90%
 80-PARTITIONS-250K-FILES-07-REFRESH  I -43.03%
 100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS  I -43.93%
 100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV   I -46.59%
 80-PARTITIONS-250K-FILES-10-REFRESH-AFTER-ADD-PARTITION  I -48.71%
 100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH   I -49.02%
 80-PARTITIONS-250K-FILES-05-QUERY-AFTER-INV  I -49.05%
 100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTI -51.87%
 80-PARTITIONS-250K-FILES-S3-03-RECOVER   I -67.17%
 80-PARTITIONS-250K-FILES-S3-05-QUERY-AFTER-INV   I -76.45%
 80-PARTITIONS-250K-FILES-S3-07-REFRESH   I -87.04%
 80-PARTITIONS-250K-FILES-S3-10-REFRESH-AFTER-ADD-PARTI -88.57%

Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Reviewed-on: http://gerrit.cloudera.org:8080/8235
Reviewed-by: Bharath Vissapragada 
Tested-by: Impala Public Jenkins
---
M be/src/catalog/catalog.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M 
fe/src/main/java/org/apache/impala/catalog/HdfsPartitionLocationCompressor.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/ListMap.java
9 files changed, 462 insertions(+), 242 deletions(-)

Approvals:
  Bharath Vissapragada: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 12
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 

[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
..


Patch Set 11: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 11
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 26 Oct 2017 21:56:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5473: [DOCS] Document TLS min version & cipher options

2017-10-26 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8401 )

Change subject: IMPALA-5473: [DOCS] Document TLS min version & cipher options
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8401/1/docs/topics/impala_ssl.xml
File docs/topics/impala_ssl.xml:

http://gerrit.cloudera.org:8080/#/c/8401/1/docs/topics/impala_ssl.xml@145
PS1, Line 145: impalad, impalad,
 : and impalad
I assume  you meant catalogd and statestored in 2 of these?



--
To view, visit http://gerrit.cloudera.org:8080/8401
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1705262f8c01e38c616541d1c48f5d0cad5498e
Gerrit-Change-Number: 8401
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 21:41:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1291: Parquet read fails if io buffer size is less than the footer size

2017-10-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8366 )

Change subject: IMPALA-1291: Parquet read fails if io buffer size is less than 
the footer size
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8366
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib7364147e7daf5380f11368871f8479646b448da
Gerrit-Change-Number: 8366
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 21:32:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1291: Parquet read fails if io buffer size is less than the footer size

2017-10-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8366 )

Change subject: IMPALA-1291: Parquet read fails if io buffer size is less than 
the footer size
..

IMPALA-1291: Parquet read fails if io buffer size is less than the footer size

This commit introduces the compile-time constant READ_SIZE_MIN_VALUE
in the newly created common/global-flags.h

A static_assert checks if READ_SIZE_MIN_VALUE is greater than or equal to
HdfsParquetScanner::FOOTER_SIZE.

Also, moved the definition of read_size flag to global-flags.cc,
and declared it in global-flags.h.

Runtime checks test if read_size is greater than or eaqual to
READ_SIZE_MIN_VALUE. If not, the process is terminated.

Change-Id: Ib7364147e7daf5380f11368871f8479646b448da
Reviewed-on: http://gerrit.cloudera.org:8080/8366
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/common/global-flags.cc
A be/src/common/global-flags.h
M be/src/common/init.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/runtime/disk-io-mgr.cc
M be/src/util/backend-gflag-util.cc
6 files changed, 57 insertions(+), 9 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8366
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib7364147e7daf5380f11368871f8479646b448da
Gerrit-Change-Number: 8366
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5473: [DOCS] Document TLS min version & cipher options

2017-10-26 Thread John Russell (Code Review)
John Russell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8401 )

Change subject: IMPALA-5473: [DOCS] Document TLS min version & cipher options
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8401/1/docs/topics/impala_ssl.xml
File docs/topics/impala_ssl.xml:

http://gerrit.cloudera.org:8080/#/c/8401/1/docs/topics/impala_ssl.xml@177
PS1, Line 177: using the --ssl_cipher_list configuration 
setting
Elaborate on the default state a little bit: "Default is empty and then Impala 
uses the default cipher list for that platform."



--
To view, visit http://gerrit.cloudera.org:8080/8401
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1705262f8c01e38c616541d1c48f5d0cad5498e
Gerrit-Change-Number: 8401
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 21:16:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-10-26 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 8:

> > Patch Set 7:
 > >
 > > > > Patch Set 7:
 > >  > >
 > >  > > Perf results:
 > >  > > ...
 > >  >
 > >  > I'm surprised that only a few queries saw significant
 > speedups. Is
 > >  > this in line with what you saw with Parquet runtime filters on
 > >  > TPC-H? Or are we losing a lot by using min/max instead of
 > bloom or
 > >  > in-list style filters?
 > >
 > > Not sure about bloom filters perf, though I can run those numbers
 > for comparison.
 >
 > I haven't looked at this patch, but had a question about the
 > design:
 >
 > Are we still pushing blooms across a join to prevent shuffling of
 > data? Or are we now pushing _only_ min/max?
 >
 > It seems there is value in pushing both: the bloom for evaluation
 > on the other side of the join to prevent shuffling, and the min/max
 > to push all the way to the scanner to reduce I/O.
 >
 > Not sure if the patch is already doing this.

Impala only evaluates runtime filters in the scan. Even prior to this patch, 
the Kudu scanner was not evaluating bloom filters (and hash joins with Kudu 
scan targets don't build bloom filters).

It certainly could be useful to evaluate bloom filters on the Impala side of a 
Kudu scan, but I believe our thinking was that it wasn't worth it to implement 
that - better to just wait until bloom filters can be pushed all the way down 
into Kudu. If bloom filters in Kudu are a long way off, though, we should maybe 
reevaluate that.


--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 8
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Anonymous Coward #345
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 26 Oct 2017 20:54:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5473: [DOCS] Document TLS min version & cipher options

2017-10-26 Thread John Russell (Code Review)
John Russell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8401 )

Change subject: IMPALA-5473: [DOCS] Document TLS min version & cipher options
..


Patch Set 1:

I added a couple of questions under IMPALA-6065 for things I wasn't sure about.


--
To view, visit http://gerrit.cloudera.org:8080/8401
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1705262f8c01e38c616541d1c48f5d0cad5498e
Gerrit-Change-Number: 8401
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 20:53:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-10-26 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 8:

(10 comments)

> I noticed that - BloomFilterBytes is always 0 in the query
 > profiles.
 > Can you please veirfy?

Yes. This line is always printed to the profile, even if there are no bloom 
filters being built. We don't expose the mem used by min-max filters in the 
profile as its negligible.

http://gerrit.cloudera.org:8080/#/c/7793/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7793/7//COMMIT_MSG@26
PS7, Line 26:
> Might be worth mentioning why the runtime filters were renumbered in all th
Done


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/exec/filter-context.h@106
PS7, Line 106:   Status CloneFrom(const FilterContext& from, ObjectPool* pool, 
RuntimeState* state,
> Long lines
Done


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.h@140
PS7, Line 140:   virtual void* GetMax() override { return _; }
> Nit: not sure why this is using underscores while AlwaysTrue() is using cam
Done


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.h@176
PS7, Line 176:   StringValue min_;
> Can you briefly mention what it means when these are empty - that the value
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@291
PS6, Line 291: out->__isset.min = true;
> Oh ok, thanks for clarifying. It looked superficially like a last line copy
Done


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@137
PS7, Line 137: DCHECK(thrift.__isset.max);
> Do we have end-to-end tests for these code paths? I think we could generall
I've now added e2e and unit tests for all of the truncation scenarios.


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@143
PS7, Line 143: CopyToBuffer(_buffer_, _, max_.len);
> This has a fairly large number of edge cases - it would be good to unit tes
Done


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@143
PS7, Line 143: uff
> static_cast instead of int()?
Done


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@150
PS7, Line 150:
> I would have expected that we would modify the trailing values that overflo
Done


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@207
PS7, Line 207: out->min.__set_string_val(in.min.string_val);
> It seems tricky to test these various error paths in end-to-end tests. Coul
As you say, its difficult to test the oom scenario in an e2e test, since the 
max amount of mem being used here is so small, but its covered with a unit test 
now.



--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 8
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Anonymous Coward #345
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 26 Oct 2017 20:52:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5473: [DOCS] Document TLS min version & cipher options

2017-10-26 Thread John Russell (Code Review)
John Russell has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8401


Change subject: IMPALA-5473: [DOCS] Document TLS min version & cipher options
..

IMPALA-5473: [DOCS] Document TLS min version & cipher options

Under the doc JIRA IMPALA-6065.

Change-Id: Ia1705262f8c01e38c616541d1c48f5d0cad5498e
---
M docs/topics/impala_ssl.xml
1 file changed, 52 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/8401/1
--
To view, visit http://gerrit.cloudera.org:8080/8401
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia1705262f8c01e38c616541d1c48f5d0cad5498e
Gerrit-Change-Number: 8401
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell 


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-10-26 Thread Thomas Tauber-Marshall (Code Review)
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim 
Armstrong, Todd Lipcon, Mostafa Mokhtar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7793

to look at the new patch set (#8).

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..

IMPALA-4252: Min-max runtime filters for Kudu

This patch implements min-max filters for runtime filters. Each
runtime filter generates a bloom filter or a min-max filter,
depending on if it has HDFS or Kudu targets, respectively.

In RuntimeFilterGenerator in the planner, each partitioned hash join
generates a bloom and min-max filter, but only those filters that end
up being assigned to a target make it into the final plan.

Min-max filters are only assigned to Kudu scans if the target expr is
a column, as Kudu doesn't support bounds on general exprs, and only if
the join op is '=' and not 'is distinct from', as Kudu doesn't support
returning NULLs if a bound is set.

Min-max filters are inserted into by the PartitionedHashJoinBuilder.
Codegen is used to eliminate branching on the type of filter. String
min-max filters truncate their bounds at 1024 chars, so that the max
amount of memory used by min-max filters is negligible.

For now, min-max filters are only applied at the KuduScanner, which
passes them into the Kudu client.

Future work will address applying min-max filters at HDFS scan nodes
and applying bloom filters at Kudu scan nodes.

Functional Testing:
- Added new planner tests and updated the old ones. (in old tests, a
  lot of runtime filters are renumbered as we always generate min-max
  filters even if they don't end up getting assigned and they take up
  some of the RF ids).
- Updated existing runtime filter tests to work with Kudu.
- Added e2e tests for min-max filter specific functionality.

Perf Testing:
- All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu,
  timings are averages of 3 runs.
- Ran a contrived query with a filter that does eliminate any rows
  (full self join of lineitem). The difference in running time was
  negligible - 24.46s with filters on, 24.15s with filters off for
  a ~1% slowdown.
- Ran a contrived query with a filter that elimiates all rows (self
  join on lineitem with a join condition that never matches). The
  filters resulted in a significant speedup - 0.26s with filters on,
  1.46s with filters off for a ~5.6x speedup. This query is added to
  targeted-perf.

Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node-mt.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/runtime/timestamp-value.h
M be/src/service/impala-internal-service.cc
M be/src/util/CMakeLists.txt
A be/src/util/min-max-filter-ir.cc
A be/src/util/min-max-filter-test.cc
A be/src/util/min-max-filter.cc
A be/src/util/min-max-filter.h
M common/thrift/Data.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 

[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-10-26 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 7:

I know that He Lifu at NetEase also prototyped some support for pushing blooms 
down all the way into Kudu (not just the Impala scanner). On his TPCH 
benchmarks he got a ~50% speedup on Q17 and Q18, Q9, Q8, which I don't see in 
the results here.

I wonder if this is due to a different partitioning scheme or whether the big 
gain actually comes from pushing all the way down into the Kudu scanner. Any 
idea, given the plans for those queries?


--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Anonymous Coward #345
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 26 Oct 2017 20:35:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork

2017-10-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7938 )

Change subject: IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
..


Patch Set 14:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1398/


--
To view, visit http://gerrit.cloudera.org:8080/7938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7
Gerrit-Change-Number: 7938
Gerrit-PatchSet: 14
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 26 Oct 2017 20:29:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork

2017-10-26 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7938 )

Change subject: IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
..


Patch Set 14: Code-Review+2

Fix minor clang-tidy issues.

Carry +2.


--
To view, visit http://gerrit.cloudera.org:8080/7938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7
Gerrit-Change-Number: 7938
Gerrit-PatchSet: 14
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 26 Oct 2017 20:29:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork

2017-10-26 Thread Sailesh Mukil (Code Review)
Hello Michael Ho, Michael Brown, Jim Apple, Bikramjeet Vig, Dan Hecht, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7938

to look at the new patch set (#14).

Change subject: IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
..

IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork

Impala currently kinits by forking off a child process. This
has proved to be expensive in many cases since the subprocess
tries to reserve as much memory as Impala is currently using
which can be quite a lot.

This patch adds a flag called 'use_kudu_kinit' that defaults to
true. When it's true, it uses the Kudu security library's kinit code
that programatically uses the krb5 library to kinit.
When it's false, we run our current path which kicks off the
kinit-thread and forks off a kinit process periodically to reacquire
tickets based on FLAGS_kerberos_reinit_interval.

Converted existing tests in thrift-server-test to run with and
without kerberos. We now run this BE test with kerberos by using
Kudu's MiniKdc utility. This introduces a new dependency on some
kerberos binaries that are checked through FindKerberosPrograms.cmake.
Note that this is only a test dependency and not a dependency for
the impalad binaries and friends. Compilation will still succeed if
the kerberos binaries for the MiniKdc are not found, however, the
thrift-server-test will fail. We run with and without the
'use_kudu_kinit' flag.

TODO: Since the setting up and tearing down of our security code
isn't idempotent, we can run only any one test in a process with
Kerberos now (IMPALA-6085).

Updated bin/bootstrap_system.sh to install new sasl-gssapi
modules and the kerberos binaries required for the MiniKdc.
Also fixed a bug that didn't transfer the environment into 'sudo'
in bin/bootstrap_system.sh.

Testing: Verified with thrift-server-test and also manually on a
live kerberized cluster.

Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7
---
M CMakeLists.txt
M be/src/exec/kudu-util.h
M be/src/kudu/security/CMakeLists.txt
M be/src/kudu/security/test/mini_kdc.cc
M be/src/rpc/CMakeLists.txt
M be/src/rpc/auth-provider.h
M be/src/rpc/authentication.cc
M be/src/rpc/thrift-server-test.cc
M bin/bootstrap_system.sh
A cmake_modules/FindKerberosPrograms.cmake
10 files changed, 234 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7938/14
--
To view, visit http://gerrit.cloudera.org:8080/7938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7
Gerrit-Change-Number: 7938
Gerrit-PatchSet: 14
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork

2017-10-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7938 )

Change subject: IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
..


Patch Set 13: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1394/


--
To view, visit http://gerrit.cloudera.org:8080/7938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7
Gerrit-Change-Number: 7938
Gerrit-PatchSet: 13
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 26 Oct 2017 19:34:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources

2017-10-26 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8303 )

Change subject: IMPALA-1575: Part 1: eagerly release query exec resources
..


Patch Set 10:

(4 comments)

Just have a few minor comments.

http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/coordinator.cc@1095
PS10, Line 1095: query_state_ != nullptr)
How would query_state_ be NULL if the coordinator hasn't release its reference 
to it yet?

We always create a QueryState for a Coordinator and there's no point of failure 
between when a Coordinator object is created and a QueryState for that 
Coordinator object is created.


http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/mem-tracker.cc
File be/src/runtime/mem-tracker.cc:

http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/mem-tracker.cc@322
PS10, Line 322: while (tracker != nullptr && !tracker->is_query_mem_tracker_) {
  : tracker = tracker->parent_;
  :   }
What's the thread safety story here? What if a any one of the trackers touched 
here are unregistered from their parent in the middle of this loop?


http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/query-state.cc@96
PS10, Line 96:   if (query_mem_tracker_ != nullptr) {
 : // After this point nothing should be touching this query's 
MemTracker and all
 : // tracked memory associated with the query must be 
released. The whole query
 : // subtree of MemTrackers can be removed from the global 
tree and destroyed.
 : query_mem_tracker_->CloseAndUnregisterFromParent();
 :   }
I'm still not clear why this moved from ReleaseExecResources() to 
~QueryState(). Doesn't this mean we're releasing the tracked memory later?


http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/query-state.cc@135
PS10, Line 135: AcquireExecResourceRefcount(); // Decremented in 
QueryExecMgr::StartQueryHelper().
Can we move this to the start of Init() and make L95 a DCHECK?



--
To view, visit http://gerrit.cloudera.org:8080/8303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341
Gerrit-Change-Number: 8303
Gerrit-PatchSet: 10
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 18:37:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
..


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1397/


--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 11
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 26 Oct 2017 18:08:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-26 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
..


Patch Set 11: Code-Review+2

Rebased, carrying +2. Thanks Dimitris.


--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 11
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 26 Oct 2017 18:07:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2235: Fix current db when shell auto-reconnects

2017-10-26 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8368 )

Change subject: IMPALA-2235: Fix current db when shell auto-reconnects
..


Patch Set 2:

(7 comments)

Thanks for tackling this! As a user, I hate that this was broken.

I think your changes to impala_shell.py look good. I'm a bit more wary of 
having another superclass for tests with helper functions.

http://gerrit.cloudera.org:8080/#/c/8368/2/shell/impala_shell.py
File shell/impala_shell.py:

http://gerrit.cloudera.org:8080/#/c/8368/2/shell/impala_shell.py@729
PS2, Line 729:   def _validate_database(self, immediately=False):
Is there a case where you ever want immediately=True? i.e., could we get rid of 
the two paths here and converge to just one?


http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py
File tests/common/cluster_controller.py:

http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py@33
PS2, Line 33: class ClusterController(CustomClusterTestSuite):
I'm generally not fond of the inheritance and multiple-inheritance in these 
tests. Is this actually distinct from CustomClusterTestSuite?


http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py@64
PS2, Line 64:   def start_cluster_with_args(self, **kwargs):
This could go straight into the super class. It's tightly coupled with 
_start_impala_cluster.


http://gerrit.cloudera.org:8080/#/c/8368/2/tests/custom_cluster/test_shell_interactive_reconnect.py
File tests/custom_cluster/test_shell_interactive_reconnect.py:

http://gerrit.cloudera.org:8080/#/c/8368/2/tests/custom_cluster/test_shell_interactive_reconnect.py@25
PS2, Line 25: TMP_HISTORY_FILE = os.path.expanduser("~/.impalahistorytmp")
Perhaps use an actual tempfile created with tempfile.[something]


http://gerrit.cloudera.org:8080/#/c/8368/2/tests/custom_cluster/test_shell_interactive_reconnect.py@27
PS2, Line 27: class TestShellInteractiveReconnect(ClusterController):
Any reason not to add this simply to tests/shell/test_shell_interactive.py?

I think because you need CustomCluster to do the restart? If so, please add a 
comment about this.


http://gerrit.cloudera.org:8080/#/c/8368/2/tests/custom_cluster/test_shell_interactive_reconnect.py@40
PS2, Line 40:   @pytest.mark.execute_serially
I think all CustomCluster tests run serially? This one seems like it could be 
parallel.


http://gerrit.cloudera.org:8080/#/c/8368/2/tests/shell/util.py
File tests/shell/util.py:

http://gerrit.cloudera.org:8080/#/c/8368/2/tests/shell/util.py@116
PS2, Line 116:   """ Moves history file to given filepath """
The underlying bug here is that the shell doesn't have a historypath option, 
but I think this is fine...



--
To view, visit http://gerrit.cloudera.org:8080/8368
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e
Gerrit-Change-Number: 8368
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 17:58:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2235: Fix current db when shell auto-reconnects

2017-10-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8368 )

Change subject: IMPALA-2235: Fix current db when shell auto-reconnects
..


Patch Set 2:

(5 comments)

Thanks for doing this refactoring, I had some questions about whether we could 
improve it further, but I think the direction looks good.

http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py
File tests/common/cluster_controller.py:

http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py@33
PS2, Line 33: class ClusterController(CustomClusterTestSuite):
Woh, nice.


http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py@34
PS2, Line 34: breakpad
Comment needs update. I guess now it's used for all custom cluster tests that 
kill processes and restart the cluster?


http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py@42
PS2, Line 42: self.tmp_dir = tempfile.mkdtemp()
I'm thinking about whether this is factored in the best way and whether the 
additional layer of inheritance is needed. I think a lot of the below utility 
functions for starting the cluster and killing processes could be moved into 
CustomClusterTestSuite.

Other aspects seem specific to the breakpad tests - disabling core dumps, the 
temporary directory manipulation. Maybe it makes sense to have those in a 
Breakpad test base class.

What do you think?


http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py@85
PS2, Line 85:   def kill_processes(self, processes, signal):
kill_processes() and wait_for_all_processes_dead() I think could just be 
standalone utility functions - they aren't specific to cluster tests.


http://gerrit.cloudera.org:8080/#/c/8368/2/tests/shell/util.py
File tests/shell/util.py:

http://gerrit.cloudera.org:8080/#/c/8368/2/tests/shell/util.py@115
PS2, Line 115: move_history
maybe move_shell_history()/restore_shell_history()? It's more verbose but I 
think is easier to understand at the callsites.



--
To view, visit http://gerrit.cloudera.org:8080/8368
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e
Gerrit-Change-Number: 8368
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 17:58:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2758: Remove BufferedTupleStream::GetRows

2017-10-26 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8226 )

Change subject: IMPALA-2758: Remove BufferedTupleStream::GetRows
..


Patch Set 4:

Tim, can you do the +2 review for this one?


--
To view, visit http://gerrit.cloudera.org:8080/8226
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414
Gerrit-Change-Number: 8226
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 17:49:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2758: Remove BufferedTupleStream::GetRows

2017-10-26 Thread Tianyi Wang (Code Review)
Tianyi Wang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8226 )

Change subject: IMPALA-2758: Remove BufferedTupleStream::GetRows
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8226/3/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

http://gerrit.cloudera.org:8080/#/c/8226/3/be/src/exec/partitioned-hash-join-node.cc@948
PS3, Line 948: DCHECK(got_reservation) << "Should have been pinned";
> Maybe add a DCHECK message like '<< "Should have been pinned"' to explain t
Done


http://gerrit.cloudera.org:8080/#/c/8226/3/be/src/exec/partitioned-hash-join-node.cc@1073
PS3, Line 1073:   DCHECK(got_reservation) << "Should have been pinned";
> Same here
Done



--
To view, visit http://gerrit.cloudera.org:8080/8226
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414
Gerrit-Change-Number: 8226
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 17:44:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2758: Remove BufferedTupleStream::GetRows

2017-10-26 Thread Tianyi Wang (Code Review)
Hello Thomas Tauber-Marshall, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8226

to look at the new patch set (#4).

Change subject: IMPALA-2758: Remove BufferedTupleStream::GetRows
..

IMPALA-2758: Remove BufferedTupleStream::GetRows

This patch removes BufferedTupleStream::GetRows. This function pins a
stream and reads all the rows into a single batch. It is not a good API
since it creates an arbitrarily large row batch. In this patch the call
sites pin the stream and then directly use GetNext to retrieve a single
batch at a time.

Testing: It passes existing tests. A test case for GetRows is removed.

Change-Id: I3831c38994da2b69775a9809ff01de5d23584414
---
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
5 files changed, 64 insertions(+), 99 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/8226/4
--
To view, visit http://gerrit.cloudera.org:8080/8226
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414
Gerrit-Change-Number: 8226
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-1291: Parquet read fails if io buffer size is less than the footer size

2017-10-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8366 )

Change subject: IMPALA-1291: Parquet read fails if io buffer size is less than 
the footer size
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1395/


--
To view, visit http://gerrit.cloudera.org:8080/8366
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib7364147e7daf5380f11368871f8479646b448da
Gerrit-Change-Number: 8366
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 17:40:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1291: Parquet read fails if io buffer size is less than the footer size

2017-10-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8366 )

Change subject: IMPALA-1291: Parquet read fails if io buffer size is less than 
the footer size
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8366
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib7364147e7daf5380f11368871f8479646b448da
Gerrit-Change-Number: 8366
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 17:40:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1291: Parquet read fails if io buffer size is less than the footer size

2017-10-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8366 )

Change subject: IMPALA-1291: Parquet read fails if io buffer size is less than 
the footer size
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8366
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib7364147e7daf5380f11368871f8479646b448da
Gerrit-Change-Number: 8366
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 17:39:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2758: Remove BufferedTupleStream::GetRows

2017-10-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8226 )

Change subject: IMPALA-2758: Remove BufferedTupleStream::GetRows
..


Patch Set 3: Code-Review+1

(2 comments)

Change looks good to me - glad to see the code removed. Can the other reviewers 
take another look (or let me know if they don't intend to)?

http://gerrit.cloudera.org:8080/#/c/8226/3/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

http://gerrit.cloudera.org:8080/#/c/8226/3/be/src/exec/partitioned-hash-join-node.cc@948
PS3, Line 948: DCHECK(got_reservation);
Maybe add a DCHECK message like '<< "Should have been pinned"' to explain the 
DCHECK.


http://gerrit.cloudera.org:8080/#/c/8226/3/be/src/exec/partitioned-hash-join-node.cc@1073
PS3, Line 1073:   DCHECK(got_reservation);
Same here



--
To view, visit http://gerrit.cloudera.org:8080/8226
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414
Gerrit-Change-Number: 8226
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 26 Oct 2017 17:36:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-26 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 13: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Thu, 26 Oct 2017 17:06:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-26 Thread Alex Behm (Code Review)
Alex Behm has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..

IMPALA-3548: Prune runtime filters based on query options in the FE

Currently, the FE generates a number of runtime filters and assigns
them to the single node plan without taking the value of
RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options
into account.

The backend then removes filters from exec nodes, based on the
following rules:
1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from
   the exec nodes that are marked as targets not bound by partition
   columns.

2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from
   the exec nodes that are marked as remote targets.

This may cause some confusion to users because they may see runtime
filters in the output of explain that are not applied when the query
is executed.

This change moves the logic of runtime filter pruning to the planner
in the FE. The runtime filter assignment is done on the distributed
plan and the above constraints are enforced there directly.

Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Reviewed-on: http://gerrit.cloudera.org:8080/7564
Tested-by: Impala Public Jenkins
Reviewed-by: Alex Behm 
---
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator.cc
M be/src/service/fe-support.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
12 files changed, 775 insertions(+), 108 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 14
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork

2017-10-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7938 )

Change subject: IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
..


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1394/


--
To view, visit http://gerrit.cloudera.org:8080/7938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7
Gerrit-Change-Number: 7938
Gerrit-PatchSet: 13
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 26 Oct 2017 15:46:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork

2017-10-26 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7938 )

Change subject: IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
..


Patch Set 13: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/7938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7
Gerrit-Change-Number: 7938
Gerrit-PatchSet: 13
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 26 Oct 2017 13:25:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1291: Parquet read fails if io buffer size is less than the footer size

2017-10-26 Thread Zoltan Borok-Nagy (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8366

to look at the new patch set (#3).

Change subject: IMPALA-1291: Parquet read fails if io buffer size is less than 
the footer size
..

IMPALA-1291: Parquet read fails if io buffer size is less than the footer size

This commit introduces the compile-time constant READ_SIZE_MIN_VALUE
in the newly created common/global-flags.h

A static_assert checks if READ_SIZE_MIN_VALUE is greater than or equal to
HdfsParquetScanner::FOOTER_SIZE.

Also, moved the definition of read_size flag to global-flags.cc,
and declared it in global-flags.h.

Runtime checks test if read_size is greater than or eaqual to
READ_SIZE_MIN_VALUE. If not, the process is terminated.

Change-Id: Ib7364147e7daf5380f11368871f8479646b448da
---
M be/src/common/global-flags.cc
A be/src/common/global-flags.h
M be/src/common/init.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/runtime/disk-io-mgr.cc
M be/src/util/backend-gflag-util.cc
6 files changed, 57 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/8366/3
--
To view, visit http://gerrit.cloudera.org:8080/8366
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib7364147e7daf5380f11368871f8479646b448da
Gerrit-Change-Number: 8366
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2235: Fix current db when shell auto-reconnects

2017-10-26 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8368


Change subject: IMPALA-2235: Fix current db when shell auto-reconnects
..

IMPALA-2235: Fix current db when shell auto-reconnects

When precmd tested the connection it didn't validate that if we are
using the previously selected DB. The _validate_database method
is responsible for that, but it only appended the "use " command
to the cmdqueue (command queue of Cmd class). But, at this point we
might already have commands in the command queue that will run before
the "use " command.

Also, the command processed by precmd can entirely skip the cmdqueue,
therefore it is not enough to insert the "use " command to the front
of cmdqueue. We need to issue the USE command with the onecmd() method
to execute it immediately.

I extended the _validate_database method with an "immediately" flag.
If this is true, _validate_database will use the onecmd() method.
Otherwise, it will append the USE command to the cmdqueue to maintain
the previous behaviour.

I added a new automated test suite named test_shell_interactive_reconnect.py
to the "custom cluster" tests. It sets the default database, and after
reconnection it checks if the shell set it again automatically.

One test case checks if the shell set the default db after manually
reconnecting to the impala daemon by issuing the CONNECT command.
The other test case checks if the shell set the default db after
automatic reconnection due to cluster restart.

I needed to start/restart the cluster in these tests. That functionality
was already implemented in class TestBreakpadBase, but I didn't want the new
tests to depend on code from an other test suite, therefore I moved
TestBreakpadBase class to tests/common/cluster_controller.py and renamed
it to ClusterController.

I also needed to backup the impala shell history file because I didn't
want to pollute it by the test cases (just like the way it is done in
tests/shell/test_shell_interactive.py). I created utility functions for
this in tests/shell/util.py and now test_shell_interactive.py and
the newly created test suite are using these utility functions.

Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e
---
M shell/impala_shell.py
A tests/common/cluster_controller.py
M tests/custom_cluster/test_breakpad.py
A tests/custom_cluster/test_shell_interactive_reconnect.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
6 files changed, 243 insertions(+), 121 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/8368/2
--
To view, visit http://gerrit.cloudera.org:8080/8368
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e
Gerrit-Change-Number: 8368
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 13: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Thu, 26 Oct 2017 10:50:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1393/


--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Thu, 26 Oct 2017 07:01:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-26 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 13:

> Uploaded patch set 13: Patch Set 12 was rebased.

Thanks for the review.
I think it needed a rebase. Could you restart the job?


--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Thu, 26 Oct 2017 06:11:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork

2017-10-26 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7938 )

Change subject: IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7938/12/bin/bootstrap_system.sh
File bin/bootstrap_system.sh:

http://gerrit.cloudera.org:8080/#/c/7938/12/bin/bootstrap_system.sh@111
PS12, Line 111: if ! { service --status-all | grep -E '^ \[ \+ \]  ssh$'; }
> The apt-get on line 106 is calling the function on line 79. Maybe the probl
Aha, looks like that was the problem. When running sudo apt-get in L79, we 
didn't transfer the environment variables into the sudo scope. I fixed that by 
adding -E. I also tested that it works.



--
To view, visit http://gerrit.cloudera.org:8080/7938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7
Gerrit-Change-Number: 7938
Gerrit-PatchSet: 13
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 26 Oct 2017 06:10:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork

2017-10-26 Thread Sailesh Mukil (Code Review)
Hello Michael Ho, Michael Brown, Jim Apple, Bikramjeet Vig, Dan Hecht, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7938

to look at the new patch set (#13).

Change subject: IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
..

IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork

Impala currently kinits by forking off a child process. This
has proved to be expensive in many cases since the subprocess
tries to reserve as much memory as Impala is currently using
which can be quite a lot.

This patch adds a flag called 'use_kudu_kinit' that defaults to
true. When it's true, it uses the Kudu security library's kinit code
that programatically uses the krb5 library to kinit.
When it's false, we run our current path which kicks off the
kinit-thread and forks off a kinit process periodically to reacquire
tickets based on FLAGS_kerberos_reinit_interval.

Converted existing tests in thrift-server-test to run with and
without kerberos. We now run this BE test with kerberos by using
Kudu's MiniKdc utility. This introduces a new dependency on some
kerberos binaries that are checked through FindKerberosPrograms.cmake.
Note that this is only a test dependency and not a dependency for
the impalad binaries and friends. Compilation will still succeed if
the kerberos binaries for the MiniKdc are not found, however, the
thrift-server-test will fail. We run with and without the
'use_kudu_kinit' flag.

TODO: Since the setting up and tearing down of our security code
isn't idempotent, we can run only any one test in a process with
Kerberos now (IMPALA-6085).

Updated bin/bootstrap_system.sh to install new sasl-gssapi
modules and the kerberos binaries required for the MiniKdc.
Also fixed a bug that didn't transfer the environment into 'sudo'
in bin/bootstrap_system.sh.

Testing: Verified with thrift-server-test and also manually on a
live kerberized cluster.

Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7
---
M CMakeLists.txt
M be/src/exec/kudu-util.h
M be/src/kudu/security/CMakeLists.txt
M be/src/kudu/security/test/mini_kdc.cc
M be/src/rpc/CMakeLists.txt
M be/src/rpc/auth-provider.h
M be/src/rpc/authentication.cc
M be/src/rpc/thrift-server-test.cc
M bin/bootstrap_system.sh
A cmake_modules/FindKerberosPrograms.cmake
10 files changed, 234 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7938/13
--
To view, visit http://gerrit.cloudera.org:8080/7938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7
Gerrit-Change-Number: 7938
Gerrit-PatchSet: 13
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-26 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..

IMPALA-3548: Prune runtime filters based on query options in the FE

Currently, the FE generates a number of runtime filters and assigns
them to the single node plan without taking the value of
RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options
into account.

The backend then removes filters from exec nodes, based on the
following rules:
1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from
   the exec nodes that are marked as targets not bound by partition
   columns.

2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from
   the exec nodes that are marked as remote targets.

This may cause some confusion to users because they may see runtime
filters in the output of explain that are not applied when the query
is executed.

This change moves the logic of runtime filter pruning to the planner
in the FE. The runtime filter assignment is done on the distributed
plan and the above constraints are enforced there directly.

Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
---
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator.cc
M be/src/service/fe-support.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
12 files changed, 775 insertions(+), 108 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/13
--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall