[Impala-ASF-CR] WIP IMPALA-9434: Implement Robin Hood Hash Table.

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15511 )

Change subject: WIP IMPALA-9434: Implement Robin Hood Hash Table.
..


Patch Set 5:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/5632/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/15511
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28eeccd7f9ccae39e31972391f971901bcbfe986
Gerrit-Change-Number: 15511
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 28 Mar 2020 05:18:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-9434: Implement Robin Hood Hash Table.

2020-03-27 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15511 )

Change subject: WIP IMPALA-9434: Implement Robin Hood Hash Table.
..


Patch Set 5:

(1 comment)

Patch Set 5 submitted.

What is different from Patch Set 4 is that now we can do single pass insert 
both in HashTable::Insert and HashTable::Iterator::SetTuple.
Right now, this patch is hijacking the FLAGS_enable_quadratic_probing to enable 
robin hood mode rather than the quadratic probing mode.
I plan to create separate flag for robin hood mode in next iteration, so that 
we can do performance comparison between linear, quadratic, and robin hood 
linear probe.

http://gerrit.cloudera.org:8080/#/c/15511/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15511/3//COMMIT_MSG@14
PS3, Line 14: If a hash table is configured as a robin hood hash table, the new
> I'm still thinking how to do single pass insert. From what I understand, we
This is now addressed in Patch Set 5.



--
To view, visit http://gerrit.cloudera.org:8080/15511
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28eeccd7f9ccae39e31972391f971901bcbfe986
Gerrit-Change-Number: 15511
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 28 Mar 2020 04:48:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-9434: Implement Robin Hood Hash Table.

2020-03-27 Thread Riza Suminto (Code Review)
Hello David Rorke, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15511

to look at the new patch set (#5).

Change subject: WIP IMPALA-9434: Implement Robin Hood Hash Table.
..

WIP IMPALA-9434: Implement Robin Hood Hash Table.

Robin hood hashing reduces the variances of probe lengths by
continually rebalancing elements. This patch is the first step towards
full robin hood hash table implementation by doing bucket rebalancing
after every insert.

If a hash table is configured as a robin hood hash table, the new
element insertion will be buffered in a temporary bucket. This
temporary bucket will then matched against existing bucket elements,
swapped with a "rich" bucket, and continue doing so until it swap with
an empty bucket. The PSL (probe sequence length) invariant is
maintained in robin hood hash table. This allow us to add
short-circuit in the Probe function to immediately returns when it
finds out that the PSL of currently visited bucket is smaller/richer
than the key that is being looked up, indicating that the key does not
exist in the table. Instead of continue probing until next empty
bucket is found, Probe can immediately the return the index of
recently visited richer bucket to caller along with the not found
flag. In turn, the caller use this returned index to specify in which
index the new element should be inserted to.

Testing:
- Pass core tests

Change-Id: I28eeccd7f9ccae39e31972391f971901bcbfe986
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/hash-table-benchmark.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exprs/scalar-expr.h
9 files changed, 560 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/15511/5
--
To view, visit http://gerrit.cloudera.org:8080/15511
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28eeccd7f9ccae39e31972391f971901bcbfe986
Gerrit-Change-Number: 15511
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5562/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 28 Mar 2020 04:26:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5631/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 28 Mar 2020 04:25:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9401: primitive include-what-you-use script and mappings

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15552 )

Change subject: IMPALA-9401: primitive include-what-you-use script and mappings
..


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5561/


--
To view, visit http://gerrit.cloudera.org:8080/15552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaf5f9ba865313afb0c581e6482514ef7f1c65367
Gerrit-Change-Number: 15552
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 28 Mar 2020 04:03:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-27 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14197/13/testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
File testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl:

http://gerrit.cloudera.org:8080/#/c/14197/13/testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl@5
PS13, Line 5: # The flags below require unsafe flags to be unlocked.
I realize I accidentally pushed this file. I will remove it in my next 
iteration based on reviews.


http://gerrit.cloudera.org:8080/#/c/14197/13/testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
File testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl:

http://gerrit.cloudera.org:8080/#/c/14197/13/testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl@7
PS13, Line 7: # The flags below require unsafe flags to be unlocked.
I realize I accidentally pushed this file. I will remove it in my next 
iteration based on reviews.



--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 28 Mar 2020 03:54:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-27 Thread Grant Henke (Code Review)
Grant Henke has uploaded a new patch set (#13) to the change originally created 
by Attila Bukor. ( http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..

IMPALA-5092 Add support for VARCHAR in Kudu tables

KUDU-1938 added VARCHAR column type support to Kudu.
This commit adds support for Kudu's VARCHAR type to Impala.

The length of a Kudu varchar is applied as a character length as opposed
to a byte length like Impala currently uses.

When writing data to Kudu, the VARCHAR length is not an issue because
Impala only officially supports ASCII characters and those characters are
the same size in bytes and characters. Additionally, extra bytes would be
truncated by the Kudu client if somehow a value was too long.

When reading data from Kudu, it is possible that the value written by
some other application is wider in bytes than Impala expects and can
handle. This can happen due to multi-byte UTF-8 characters. In that
case, we adjust the length in Impala to truncate the extra bytes of the
value. This isn’t a great solution, but one other integrations have taken
as well given Impala doesn’t support UTF-8 values.

IMPALA-5675 tracks adding UTF-8 Character length support to VARCHAR
columns and marked the truncation code with a TODO that references
that Jira.

Testing:
* Performed manual testing of standard DDL and DML interaction
* Manually reproduced a check failure due to multi-byte characters
  and tested that length truncation resolve that issue.
* Added/adjusted the following automated tests:
** AnalyzeDDLTest: CTAS into Kudu with varchar type
** AnalyzeKuduDDLTest: CREATE TABLE in Kudu with VARCHAR type
** kudu_create.test: Create table with VARCHAR column and key
** kudu_describe.test: Describe table with VARCHAR column and key
** kudu_insert.test: Insert with VARCHAR columns including null and
   non-null defaults
** kudu_update.test: Updates with VARCHAR column
** kudu_upsert.test: Upserts with VARCHAR column
** kudu_delete.test Deletes with VARCHAR columns

Follow on work:
- Add min-max runtime filter tests/support
- Add predicate pushdown and partition tests

Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
---
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_update.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test
M tests/query_test/test_kudu.py
16 files changed, 776 insertions(+), 642 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/14197/13
--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3766: optionally compress spilled data

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15454 )

Change subject: IMPALA-3766: optionally compress spilled data
..


Patch Set 17: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5560/


--
To view, visit http://gerrit.cloudera.org:8080/15454
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659
Gerrit-Change-Number: 15454
Gerrit-PatchSet: 17
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 28 Mar 2020 02:47:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5630/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Sat, 28 Mar 2020 02:21:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5629/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Sat, 28 Mar 2020 02:16:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9561: Change ozone client dependency to hadoop-ozone-filesystem-lib-current

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15573 )

Change subject: IMPALA-9561: Change ozone client dependency to 
hadoop-ozone-filesystem-lib-current
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5628/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15573
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I313b84919883f4c6838748635d999a2bc12a59d3
Gerrit-Change-Number: 15573
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 28 Mar 2020 01:48:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-27 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15382/5/be/src/util/impalad-metrics.cc
File be/src/util/impalad-metrics.cc:

http://gerrit.cloudera.org:8080/#/c/15382/5/be/src/util/impalad-metrics.cc@360
PS5, Line 360: IO_MGR_REMOTE_DATA_CACHE_DROPPED_ENTRIES = 
IO_MGR_METRICS->AddCounter(
Missed this indentation.



--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Sat, 28 Mar 2020 01:40:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-27 Thread Joe McDonnell (Code Review)
Hello Thomas Tauber-Marshall, Sahil Takiar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15382

to look at the new patch set (#6).

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..

IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

This adds two sets of metrics. The first is per-partition metrics
to track the performance of the underlying filesystem for the
data cache. It keeps histograms of read, write, and eviction
latency for each data cache partition along with another metric
recording the path for the partition. These are exposed as the
following metrics:
impala-server.io-mgr.remote-data-cache-partition-$0.path
impala-server.io-mgr.remote-data-cache-partition-$0.read-latency
impala-server.io-mgr.remote-data-cache-partition-$0.write-latency
impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency

This also adds metrics to keep counts of hits, misses, and entries
in the data cache. Since reducing the latency of IO is an important
feature of the data cache, the absolute count of hits and misses
is as important as the hit bytes and miss bytes. This adds the
following metrics:
impala-server.io-mgr.remote-data-cache-hit-count
impala-server.io-mgr.remote-data-cache-miss-count
impala-server.io-mgr.remote-data-cache-num-entries

To track failed inserts, this also adds the following metrics:
impala-server.io-mgr.remote-data-cache-dropped-entries
impala-server.io-mgr.remote-data-cache-failed-inserts

Testing:
 - Hand testing to verify the per-partition latency histograms
 - Modified custom_cluster/test_data_cache.py to also test
   the counts.

Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
---
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M common/thrift/metrics.json
M tests/custom_cluster/test_data_cache.py
7 files changed, 257 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/15382/6
--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-8690: Add LIRS cache eviction algorithm

2020-03-27 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15306 )

Change subject: IMPALA-8690: Add LIRS cache eviction algorithm
..


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc
File be/src/util/cache/lirs-cache.cc:

http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@456
PS21, Line 456:   mem_tracker_->Consume(deferred_consumption_);
> ahh because deferred_consumption_ can be negative, right?
Exactly, it is deferred in both directions. It gets incorporated into the 
underlying memtracker when the absolute value is greater than the max deferred 
consumption.



--
To view, visit http://gerrit.cloudera.org:8080/15306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I670fa4b2b7c93998130dc4e8b2546bb93e9a84f8
Gerrit-Change-Number: 15306
Gerrit-PatchSet: 21
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Sat, 28 Mar 2020 01:37:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8690: Add LIRS cache eviction algorithm

2020-03-27 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15306 )

Change subject: IMPALA-8690: Add LIRS cache eviction algorithm
..


Patch Set 23: Code-Review+1

Rebased, carrying +1


--
To view, visit http://gerrit.cloudera.org:8080/15306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I670fa4b2b7c93998130dc4e8b2546bb93e9a84f8
Gerrit-Change-Number: 15306
Gerrit-PatchSet: 23
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Sat, 28 Mar 2020 01:35:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-27 Thread Joe McDonnell (Code Review)
Hello Thomas Tauber-Marshall, Sahil Takiar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15382

to look at the new patch set (#5).

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..

IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

This adds two sets of metrics. The first is per-partition metrics
to track the performance of the underlying filesystem for the
data cache. It keeps histograms of read, write, and eviction
latency for each data cache partition along with another metric
recording the path for the partition. These are exposed as the
following metrics:
impala-server.io-mgr.remote-data-cache-partition-$0.path
impala-server.io-mgr.remote-data-cache-partition-$0.read-latency
impala-server.io-mgr.remote-data-cache-partition-$0.write-latency
impala-server.io-mgr.remote-data-cache-partition-$0.eviction-latency

This also adds metrics to keep counts of hits, misses, and entries
in the data cache. Since reducing the latency of IO is an important
feature of the data cache, the absolute count of hits and misses
is as important as the hit bytes and miss bytes. This adds the
following metrics:
impala-server.io-mgr.remote-data-cache-hit-count
impala-server.io-mgr.remote-data-cache-miss-count
impala-server.io-mgr.remote-data-cache-num-entries

To track failed inserts, this also adds the following metrics:
impala-server.io-mgr.remote-data-cache-dropped-entries
impala-server.io-mgr.remote-data-cache-failed-inserts

Testing:
 - Hand testing to verify the per-partition latency histograms
 - Modified custom_cluster/test_data_cache.py to also test
   the counts.

Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
---
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/util/impalad-metrics.cc
M be/src/util/impalad-metrics.h
M common/thrift/metrics.json
M tests/custom_cluster/test_data_cache.py
7 files changed, 257 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/15382/5
--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-27 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG@25
PS3, Line 25: impala-server.io-mgr.remote-data-cache-hit-count
: impala-server.io-mgr.remote-data-cache-miss-count
: impala-server.io-mgr.remote-data-cache-num-entries
> how about hit and miss rates? if we just emit the counts, I guess something
Yes, I'm thinking that there will be a separate mechanism to consume the 
counters and be able to perform rate analysis. Prometheus seems to have this 
capability:
https://www.innoq.com/en/blog/prometheus-counters/

The histograms for writes, reads, and evictions keep counts of the number of 
events. So, we have that information in the WebUI. I'm less certain about how 
this is accessible through prometheus. We embed the count in what we send to 
prometheus:
https://github.com/apache/impala/blob/master/be/src/util/histogram-metric.cc#L131
It sounds like this is exposed as its own metric by prometheus:
https://prometheus.io/docs/practices/histograms/

I think that should provide counts for writes per partition.

Since it is interesting to know when we don't do the inserts, I added two 
metrics in this upload for the two most interesting cases where we don't do the 
insert:

impala-server.io-mgr.remote-data-cache-dropped-entries - this is the counter 
version of impala-server.io-mgr.remote-data-cache-dropped-bytes

impala-server.io-mgr.remote-data-cache-failed-inserts - this tracks where the 
Cache::Insert() call fails.


http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.h
File be/src/runtime/io/data-cache.h:

http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.h@257
PS3, Line 257: int32_t index_;
> Maybe note this is used for differentiating metrics. I don't think that it
Good point, these are strictly for naming metrics (and will be used in future 
for the access trace). The index makes no functional difference. Added a 
comment here.


http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc@549
PS3, Line 549:   if (!TestInfo::is_test()
> I find the control flow in this whole function kind of confusing. Maybe ins
Good point, reworked the control flow.


http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc@876
PS3, Line 876: partition_idx++
> nit: ++partition_idx
Done



--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Sat, 28 Mar 2020 01:35:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2563: Support LDAP search bind operations

2020-03-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15570 )

Change subject: IMPALA-2563: Support LDAP search bind operations
..


Patch Set 3:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/rpc/authentication.h
File be/src/rpc/authentication.h:

http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/rpc/authentication.h@30
PS3, Line 30: #include "util/ldap-util.h"
nit: we probably only need a forward declaration, so long as the constructor 
and destructor are declared in the .cc file.


http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/util/ldap-util.h
File be/src/util/ldap-util.h:

http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/util/ldap-util.h@57
PS3, Line 57: set
unordered_set? I didn't see a reason why ordering matters.


http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/util/ldap-util.cc
File be/src/util/ldap-util.cc:

http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/util/ldap-util.cc@121
PS3, Line 121: user_filter_ = Split(user_filter, ",");
I guess this wouldn't allow us to specify a user or group with  a comma in it, 
since no escaping, but it seems like the hive syntax is the same: 
https://github.com/apache/hive/blob/037eacea46371015a7f9894c5a9ccfb9708d5c56/common/src/java/org/apache/hive/common/util/HiveStringUtils.java


http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/util/ldap-util.cc@133
PS3, Line 133: // DN patterns and replacing any '%s' with each group name.
It looks like we're replacing only the first %s, is that intended?


http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/util/ldap-util.cc@241
PS3, Line 241: int rc = ldap_search_ext_s(ld, group_dn.c_str(), 
LDAP_SCOPE_SUBTREE, filter.c_str(),
Can you add a brief comment about what this invocation is doing?


http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/util/ldap-util.cc@264
PS3, Line 264: LOG(WARNING) << "Was not able to get DN from search 
result.";
Is there any more context that would be useful to debug this issue? E.g. the 
group_dn?


http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/util/ldap-util.cc@268
PS3, Line 268:   LOG(WARNING) << "Following of references is not 
currently supported, ignoring.";
Same comment


http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/util/ldap-util.cc@281
PS3, Line 281:   vector attributes = Split(rdn, delimiter::Limit(",", 
1));
Do we need to do any unescaping here? It looks like the DN syntax uses \ for 
escaping - 
https://docs.microsoft.com/en-us/previous-versions/windows/desktop/ldap/distinguished-names

FWIW it looks like other pre-existing string manipulation in this file doesn't 
do escaping, so it may be a general limitation, but maybe if there's an ldap 
library function to do the parsing that would be more robust


http://gerrit.cloudera.org:8080/#/c/15570/3/fe/src/test/java/org/apache/impala/customcluster/LdapImpalaShellTest.java
File fe/src/test/java/org/apache/impala/customcluster/LdapImpalaShellTest.java:

http://gerrit.cloudera.org:8080/#/c/15570/3/fe/src/test/java/org/apache/impala/customcluster/LdapImpalaShellTest.java@205
PS3, Line 205: // Run with user that passes the group filter but not the 
user filter, should fail.
Also check if it fails both filters?



--
To view, visit http://gerrit.cloudera.org:8080/15570
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c
Gerrit-Change-Number: 15570
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 28 Mar 2020 01:11:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9561: Change ozone client dependency to hadoop-ozone-filesystem-lib-current

2020-03-27 Thread Sahil Takiar (Code Review)
Sahil Takiar has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15573


Change subject: IMPALA-9561: Change ozone client dependency to 
hadoop-ozone-filesystem-lib-current
..

IMPALA-9561: Change ozone client dependency to 
hadoop-ozone-filesystem-lib-current

The current dependency on Ozone pulls in a bunch of transitive
dependencies that can pollute the Impala classpath. Ozone publishes a
fat-jar that shades all client dependencies. Impala should use the
fat-jar instead of depending on the hadoop-ozone-filesystem jar.

Change-Id: I313b84919883f4c6838748635d999a2bc12a59d3
---
M fe/pom.xml
1 file changed, 1 insertion(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/15573/1
--
To view, visit http://gerrit.cloudera.org:8080/15573
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I313b84919883f4c6838748635d999a2bc12a59d3
Gerrit-Change-Number: 15573
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 


[Impala-ASF-CR] IMPALA-9466: impala-shell client retry for hs2-http protocol

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15378 )

Change subject: IMPALA-9466: impala-shell client retry for hs2-http protocol
..


Patch Set 17: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/15378
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5
Gerrit-Change-Number: 15378
Gerrit-PatchSet: 17
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Sat, 28 Mar 2020 00:32:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9466: impala-shell client retry for hs2-http protocol

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/15378 )

Change subject: IMPALA-9466: impala-shell client retry for hs2-http protocol
..

IMPALA-9466: impala-shell client retry for hs2-http protocol

Added retries for idempotent rpcs:
OpenSession, PingImpalaHS2Service, GetResultSetMetadata,
CloseImpalaOperation (non dmls), CancelOperation, GetOperationStatus,
GetRuntimeProfile, GetExecSummary, GetLog

Retries were also added to the 'set all' query execution and subsequent
result fetch in the ImpalaHS2Client._open_session()

The retries are only supported for hs2-http protocol and enabled by
default. At most there are 3 retries for a failed rpc. There is a sleep
duration of 'n' seconds after nth retry.

Only failed rpcs due to an error in the http transport are retried and
if an rpc failed because the server returned an error in the rpc
response then such scenarios are not retriable.

Improved error diagnostics by dumping stack trace when ImpalaShell.
_execute_stmt() gets an 'Unknown Exception'.

Testing:
- Added a custom_cluster test which injects fault into the http transport
and checks expected behavior from the various rpcs. Some of these tests
leave the session in an open state and so these tests are not suitable
for the e2e test framework which have metric verifiers expecting related
metrics to be 0 at the end of the test.
- Manually tested real world scenarios with impala-shell client
communicating with an impala coordinator via a fault injecting istio mesh.
- Manually tested dropping connections on an nginx ingress gateway by sending
SIGTERM to all worker processes.

Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5
Reviewed-on: http://gerrit.cloudera.org:8080/15378
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M shell/impala_client.py
M shell/impala_shell.py
M shell/shell_exceptions.py
A tests/custom_cluster/test_hs2_fault_injection.py
4 files changed, 505 insertions(+), 51 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/15378
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5
Gerrit-Change-Number: 15378
Gerrit-PatchSet: 18
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 


[Impala-ASF-CR] IMPALA-2563: Support LDAP search bind operations

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15570 )

Change subject: IMPALA-2563: Support LDAP search bind operations
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5627/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15570
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c
Gerrit-Change-Number: 15570
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 23:57:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2563: Support LDAP search bind operations

2020-03-27 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15570 )

Change subject: IMPALA-2563: Support LDAP search bind operations
..


Patch Set 3:

> (1 comment)
 >
 > QQ before I review in depths so I understand the context.
 >
 > This change applies to LDAP auth with any endpoint right? I.e. if a
 > group is excluded by the filter, they can't connect to HS2 or the
 > webserver or any endpoint where ldap is enabled.
 >
 > For the webserver auth, does this mean that we'll need an
 > additional flag to control which groups or users can connect to the
 > webserver?

Right so the intention here is that https://gerrit.cloudera.org/#/c/15538/ will 
add flags like --webserver_ldap_group_filter that will allow for doing things 
like restricting webserver authentication to an admin group.


--
To view, visit http://gerrit.cloudera.org:8080/15570
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c
Gerrit-Change-Number: 15570
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 23:35:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters

2020-03-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15568 )

Change subject: [test] Use `system_unsync` time for Kudu test clusters
..


Patch Set 2:

That is great news if we can remove the ntp requirement for our dev cluster.


--
To view, visit http://gerrit.cloudera.org:8080/15568
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99
Gerrit-Change-Number: 15568
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 23:35:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9401: primitive include-what-you-use script and mappings

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15552 )

Change subject: IMPALA-9401: primitive include-what-you-use script and mappings
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaf5f9ba865313afb0c581e6482514ef7f1c65367
Gerrit-Change-Number: 15552
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 23:34:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9401: primitive include-what-you-use script and mappings

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15552 )

Change subject: IMPALA-9401: primitive include-what-you-use script and mappings
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5561/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/15552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaf5f9ba865313afb0c581e6482514ef7f1c65367
Gerrit-Change-Number: 15552
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 23:34:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2563: Support LDAP search bind operations

2020-03-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15570 )

Change subject: IMPALA-2563: Support LDAP search bind operations
..


Patch Set 3:

(1 comment)

QQ before I review in depths so I understand the context.

This change applies to LDAP auth with any endpoint right? I.e. if a group is 
excluded by the filter, they can't connect to HS2 or the webserver or any 
endpoint where ldap is enabled.

For the webserver auth, does this mean that we'll need an additional flag to 
control which groups or users can connect to the webserver?

http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/rpc/authentication.cc
File be/src/rpc/authentication.cc:

http://gerrit.cloudera.org:8080/#/c/15570/3/be/src/rpc/authentication.cc@97
PS3, Line 97: athentication
typo



--
To view, visit http://gerrit.cloudera.org:8080/15570
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c
Gerrit-Change-Number: 15570
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 23:33:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9560: Fix TestStatsExtrapolation for release versions

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/15569 )

Change subject: IMPALA-9560: Fix TestStatsExtrapolation for release versions
..

IMPALA-9560: Fix TestStatsExtrapolation for release versions

When changing the Impala version from 3.4.0-SNAPSHOT to 3.4.0-RELEASE,
TestStatsExtrapolation::test_stats_extrapolation started failing due
to a difference in the expected cardinality (expected: 17.91K,
actual 17.90K). This is because the Impala version gets embedded into
parquet files, and this causes a slight difference in file size, which
translates into a slight difference in expected cardinality.

This modifies TestStatsExtrapolation::test_stats_extrapolation to
allow any 17.9*K cardinality.

Testing:
 - Tested on master and on branch-3.4.0

Change-Id: Iebe538936f23c095ef58c808e425cfb7b31edd94
Reviewed-on: http://gerrit.cloudera.org:8080/15569
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 
---
M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/15569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iebe538936f23c095ef58c808e425cfb7b31edd94
Gerrit-Change-Number: 15569
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9560: Fix TestStatsExtrapolation for release versions

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15569 )

Change subject: IMPALA-9560: Fix TestStatsExtrapolation for release versions
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/15569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iebe538936f23c095ef58c808e425cfb7b31edd94
Gerrit-Change-Number: 15569
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 23:27:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2563: Support LDAP search bind operations

2020-03-27 Thread Thomas Tauber-Marshall (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15570

to look at the new patch set (#3).

Change subject: IMPALA-2563: Support LDAP search bind operations
..

IMPALA-2563: Support LDAP search bind operations

This patch adds a number of new options for controlling LDAP
by restricting authentication to particular users and/or members of
particular groups:
--ldap_group_filter: comma separated list of authorized groups
--ldap_user_filter: comma separated list of authorized users

There are also options to control how LDAP is searched when applying
these filters:
--ldap_group_dn_pattern
--ldap_group_membership_key
--ldap_group_membership_class

These options were modelled on equivalent options in Hive, see:
https://cwiki.apache.org/confluence/display/Hive/User+and+Group+Filter+Support+with+LDAP+Atn+Provider+in+HiveServer2
https://github.com/apache/hive/tree/master/service/src/java/org/apache/hive/service/auth/ldap

This patch also refactors LDAP related functionality into a utility
class, both to make authentication.cc more manageable and to
facilitate follow up work that will add LDAP authentication options
for the webserver.

Testing:
- Added a FE custom cluster test that sets --ldap_group_filter and
  --ldap_user_filter and verifies expected behavior.

Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c
---
M be/src/common/global-flags.cc
M be/src/rpc/authentication.cc
M be/src/rpc/authentication.h
M be/src/util/CMakeLists.txt
A be/src/util/ldap-util.cc
A be/src/util/ldap-util.h
M fe/src/test/java/org/apache/impala/customcluster/LdapImpalaShellTest.java
M fe/src/test/resources/users.ldif
8 files changed, 446 insertions(+), 166 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/15570/3
--
To view, visit http://gerrit.cloudera.org:8080/15570
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c
Gerrit-Change-Number: 15570
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3766: optionally compress spilled data

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15454 )

Change subject: IMPALA-3766: optionally compress spilled data
..


Patch Set 16:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5626/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15454
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659
Gerrit-Change-Number: 15454
Gerrit-PatchSet: 16
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 22:59:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2563: Support LDAP search bind operations

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15570 )

Change subject: IMPALA-2563: Support LDAP search bind operations
..


Patch Set 2:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/5625/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/15570
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c
Gerrit-Change-Number: 15570
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 22:39:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3766: optionally compress spilled data

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15454 )

Change subject: IMPALA-3766: optionally compress spilled data
..


Patch Set 17:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5560/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/15454
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659
Gerrit-Change-Number: 15454
Gerrit-PatchSet: 17
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 22:18:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3766: optionally compress spilled data

2020-03-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15454 )

Change subject: IMPALA-3766: optionally compress spilled data
..


Patch Set 16: Code-Review+2

(1 comment)

carry

http://gerrit.cloudera.org:8080/#/c/15454/12/be/src/runtime/tmp-file-mgr-internal.h
File be/src/runtime/tmp-file-mgr-internal.h:

http://gerrit.cloudera.org:8080/#/c/15454/12/be/src/runtime/tmp-file-mgr-internal.h@94
PS12, Line 94:
> nit: extra space
Done



--
To view, visit http://gerrit.cloudera.org:8080/15454
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659
Gerrit-Change-Number: 15454
Gerrit-PatchSet: 16
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 22:17:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3766: optionally compress spilled data

2020-03-27 Thread Tim Armstrong (Code Review)
Hello Sahil Takiar, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15454

to look at the new patch set (#16).

Change subject: IMPALA-3766: optionally compress spilled data
..

IMPALA-3766: optionally compress spilled data

Enabled via --disk_spill_compression_codec, which uses
the same syntax as the compression_codec query option.
Recommended codecs are LZ4 and ZSTD. ZSTD supports
specifying a compression level.

The compression is done in TmpFileMgr using a temporary
buffer. Allocation of disk space is reworked slightly
so that the allocation can happen after compression.

The default power-of-two disk block sizes would lead
to a lot of internal fragmentation, so a new strategy
for free space management, similar to that used in
the data cache, can be used with
--disk_spill_punch_holes=true. TmpFileMgr will allocate
a range of the actual compressed size and punch holes
in the file for each range that is no longer needed.

UncompressedWriteIoBytes is added to the buffer pool
profiles, so that you can see what degree of compression
is achieved. Typically I saw ratios of 2-3x for LZ4 and
ZSTD (with LZ4 toward the lower end and ZSTD toward
the higher end).

Limitations:
The management of the compression buffer memory could
be improved. Ideally it would be integrated with the
buffer pool and use the buffer pool allocator instead
of being done "on the side". We would probably want to
do this before making this the default, for resource
management and performance reasons (doing a malloc()
directly does not use the caching supported by the
buffer pool).

Testing:
* Run buffer pool spilling tests with different combinations of
  the new options.
* Extend existing TmpFileMgr tests for file space allocation to
  run with hole punching enabled.
* Switch a couple of spilling tests to use the new option.
* Add a metrics test to check for scratch leaks.
* Enable the new options by default for end-to-end dockerized
  tests to get additional coverage.
* Add a unit test where allocating compression memory fails,
  both on the read and write path.
* Ran a single-node stress test on TPC-DS SF 1 and TPC-H SF 10
  The peak compression buffer usage was ~40MB.

Perf:
I ran this spilling query using an SSD as the scratch disk:

  set mem_limit=200m;
  select count(distinct l_partkey) from
  tpch30_parquet.lineitem;

The time taken for the second run of each query was:
No compression: 19.59s
LZ4: 18.56s
ZSTD: 20.59s

Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659
---
M be/src/runtime/bufferpool/buffer-pool-counters.h
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/test-env.cc
M be/src/runtime/test-env.h
M be/src/runtime/tmp-file-mgr-internal.h
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M be/src/service/query-options.cc
M be/src/util/parse-util.cc
M be/src/util/parse-util.h
M bin/jenkins/dockerized-impala-run-tests.sh
M tests/custom_cluster/test_scratch_disk.py
M tests/verifiers/metric_verifier.py
15 files changed, 849 insertions(+), 246 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/15454/16
--
To view, visit http://gerrit.cloudera.org:8080/15454
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659
Gerrit-Change-Number: 15454
Gerrit-PatchSet: 16
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3766: optionally compress spilled data

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15454 )

Change subject: IMPALA-3766: optionally compress spilled data
..


Patch Set 17: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15454
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659
Gerrit-Change-Number: 15454
Gerrit-PatchSet: 17
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 22:18:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2563: Support LDAP search bind operations

2020-03-27 Thread Thomas Tauber-Marshall (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15570

to look at the new patch set (#2).

Change subject: IMPALA-2563: Support LDAP search bind operations
..

IMPALA-2563: Support LDAP search bind operations

This patch adds a number of new options for controlling LDAP
by restricting authentication to particular users and/or members of
particular groups:
--ldap_group_filter: comma separated list of authorized groups
--ldap_user_filter: comma separated list of authorized users

There are also options to control how LDAP is searched when applying
these filters:
--ldap_group_dn_pattern
--ldap_group_membership_key
--ldap_group_membership_class

These options were modelled on equivalent options in Hive, see:
https://cwiki.apache.org/confluence/display/Hive/User+and+Group+Filter+Support+with+LDAP+Atn+Provider+in+HiveServer2
https://github.com/apache/hive/tree/master/service/src/java/org/apache/hive/service/auth/ldap

This patch also refactors LDAP related functionality into a utility
class, both to make authentication.cc more manageable and to
facilitate follow up work that will add LDAP authentication options
for the webserver.

Testing:
- Added a FE custom cluster test that sets --ldap_group_filter and
  --ldap_user_filter and verifies expected behavior.

Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c
---
M be/src/common/global-flags.cc
M be/src/rpc/authentication.cc
M be/src/rpc/authentication.h
M be/src/util/CMakeLists.txt
A be/src/util/ldap-util.cc
A be/src/util/ldap-util.h
M fe/src/test/java/org/apache/impala/customcluster/LdapImpalaShellTest.java
M fe/src/test/resources/users.ldif
8 files changed, 446 insertions(+), 166 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/15570/2
--
To view, visit http://gerrit.cloudera.org:8080/15570
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c
Gerrit-Change-Number: 15570
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9401: primitive include-what-you-use script and mappings

2020-03-27 Thread Laszlo Gaal (Code Review)
Laszlo Gaal has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15552 )

Change subject: IMPALA-9401: primitive include-what-you-use script and mappings
..


Patch Set 2: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15552/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15552/1//COMMIT_MSG@12
PS1, Line 12: built
> Done
Done


http://gerrit.cloudera.org:8080/#/c/15552/1/bin/iwyu/iwyu.sh
File bin/iwyu/iwyu.sh:

http://gerrit.cloudera.org:8080/#/c/15552/1/bin/iwyu/iwyu.sh@31
PS1, Line 31:   echo ""
> Done
Done



--
To view, visit http://gerrit.cloudera.org:8080/15552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaf5f9ba865313afb0c581e6482514ef7f1c65367
Gerrit-Change-Number: 15552
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 21:48:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters

2020-03-27 Thread Attila Bukor (Code Review)
Attila Bukor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15568 )

Change subject: [test] Use `system_unsync` time for Kudu test clusters
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/15568
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99
Gerrit-Change-Number: 15568
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 20:33:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3766: optionally compress spilled data

2020-03-27 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15454 )

Change subject: IMPALA-3766: optionally compress spilled data
..


Patch Set 15: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/bufferpool/buffer-pool.cc
File be/src/runtime/bufferpool/buffer-pool.cc:

http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/bufferpool/buffer-pool.cc@736
PS11, Line 736: [this, page](
> It wasn't an intentional change, but I think it is an edge case that doesn'
wfm



--
To view, visit http://gerrit.cloudera.org:8080/15454
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659
Gerrit-Change-Number: 15454
Gerrit-PatchSet: 15
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 21:34:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9466: impala-shell client retry for hs2-http protocol

2020-03-27 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15378 )

Change subject: IMPALA-9466: impala-shell client retry for hs2-http protocol
..


Patch Set 17:

One of the custom cluster tests failed because an impalad didn't startup in 
time. Seems unrelated to this patch. Re-running the pre-commit job to see if we 
can get a clean run to merge this. If it fails again, I'll just merge it myself.


--
To view, visit http://gerrit.cloudera.org:8080/15378
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5
Gerrit-Change-Number: 15378
Gerrit-PatchSet: 17
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 27 Mar 2020 20:27:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8690: Add LIRS cache eviction algorithm

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15306 )

Change subject: IMPALA-8690: Add LIRS cache eviction algorithm
..


Patch Set 22:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5624/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I670fa4b2b7c93998130dc4e8b2546bb93e9a84f8
Gerrit-Change-Number: 15306
Gerrit-PatchSet: 22
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 27 Mar 2020 21:14:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8690: Add LIRS cache eviction algorithm

2020-03-27 Thread Joe McDonnell (Code Review)
Hello Thomas Tauber-Marshall, Sahil Takiar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15306

to look at the new patch set (#22).

Change subject: IMPALA-8690: Add LIRS cache eviction algorithm
..

IMPALA-8690: Add LIRS cache eviction algorithm

One concern for the data cache is that the LRU eviction
algorithm is suceptible to being flushed by large
scans of low priority data. This implements the LIRS algorithm
described in "LIRS: An Efficient Low Inter-reference Recency
Set Replacement Policy to Improve Buffer Cache Performance"
by Song Jiang / Xiaodon Xhang 2002. LIRS is a scan-resistent
eviction algorithm with low performance penalty to LRU.

This introduces the startup flag data_cache_eviction_policy to
control which eviction policy to use. The only two options are
LRU and LIRS, with the default continuing to be LRU.

To accomodate the new algorithm and associated tests, some
code moved around:
1. The RLCacheShard implementation moved from util/cache/cache.cc
   to util/cache/rl-cache.cc.
2. The backend cache tests were split into multiple files.
   util/cache/cache-test.h contains shared cache testing code.
   util/cache/cache-test.cc contains generic tests that should
   work for any algorithm.
   util/cache/rl-cache-test.cc are RLCacheShard specific tests
   util/cache/lirs-cache-test.cc are LIRS specific tests
3. To make it easy for clients of the cache code to customize
   the cache eviction algorithm, the public interface changed
   from using a template to taking the policy as an argument.
4. Cache::MemoryType is removed.
5. Cache adds an Init() method to verify the validity of
   startup flags

Testing:
 - Added LIRS specific backend cache tests (lirs-cache-test)
 - Ran TPC-DS with a very small cache and concurrency to test
   corner cases with the LIRS eviction policy
 - Parameterized data-cache-test to run for both LRU and LIRS
 - Added LIRS equivalents for tests in custom_cluster/test_data_cache.py
 - Ran cache-bench with LRU and LIRS. The results are:
   Test case   | Algorithm | Lookups / sec | Hit rate
   ZIPFIAN ratio=1.00x | LRU   | 11.31M| 99.9%
   ZIPFIAN ratio=1.00x | LIRS  | 10.09M| 99.8%
   ZIPFIAN ratio=3.00x | LRU   | 11.36M| 95.9%
   ZIPFIAN ratio=3.00x | LIRS  |  9.27M| 96.4%
   UNIFORM ratio=1.00x | LRU   |  7.46M| 99.8%
   UNIFORM ratio=1.00x | LIRS  |  6.93M| 99.8%
   UNIFORM ratio=3.00x | LRU   |  5.63M| 33.3%
   UNIFORM ratio=3.00x | LIRS  |  3.24M| 33.3%
   The takeaway is that LIRS is a bit slower on lookups and
   quite a bit slower on inserts. However, they both are still
   doing millions of operations per second, so it should not
   be a bottleneck for the data cache.

Change-Id: I670fa4b2b7c93998130dc4e8b2546bb93e9a84f8
---
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/util/cache/CMakeLists.txt
M be/src/util/cache/cache-bench.cc
M be/src/util/cache/cache-internal.h
M be/src/util/cache/cache-test.cc
A be/src/util/cache/cache-test.h
M be/src/util/cache/cache.cc
M be/src/util/cache/cache.h
A be/src/util/cache/lirs-cache-test.cc
A be/src/util/cache/lirs-cache.cc
A be/src/util/cache/rl-cache-test.cc
A be/src/util/cache/rl-cache.cc
M bin/rat_exclude_files.txt
M tests/custom_cluster/test_data_cache.py
16 files changed, 2,678 insertions(+), 844 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/15306/22
--
To view, visit http://gerrit.cloudera.org:8080/15306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I670fa4b2b7c93998130dc4e8b2546bb93e9a84f8
Gerrit-Change-Number: 15306
Gerrit-PatchSet: 22
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-3766: optionally compress spilled data

2020-03-27 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15454 )

Change subject: IMPALA-3766: optionally compress spilled data
..


Patch Set 15: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15454/12/be/src/runtime/tmp-file-mgr-internal.h
File be/src/runtime/tmp-file-mgr-internal.h:

http://gerrit.cloudera.org:8080/#/c/15454/12/be/src/runtime/tmp-file-mgr-internal.h@94
PS12, Line 94:
nit: extra space


http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr.cc@835
PS11, Line 835: LOG(WARNING) << "Failed to compress, couldn't create 
compressor: "
> That's a good point. I just switched to calling .Release() to be symmetrica
It's probably fine not to test this since it is an edge case unlikely to happen.



--
To view, visit http://gerrit.cloudera.org:8080/15454
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659
Gerrit-Change-Number: 15454
Gerrit-PatchSet: 15
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 20:23:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9466: impala-shell client retry for hs2-http protocol

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15378 )

Change subject: IMPALA-9466: impala-shell client retry for hs2-http protocol
..


Patch Set 17:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5559/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/15378
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5
Gerrit-Change-Number: 15378
Gerrit-PatchSet: 17
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 27 Mar 2020 20:26:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9466: impala-shell client retry for hs2-http protocol

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15378 )

Change subject: IMPALA-9466: impala-shell client retry for hs2-http protocol
..


Patch Set 17: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15378
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5
Gerrit-Change-Number: 15378
Gerrit-PatchSet: 17
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 27 Mar 2020 20:26:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8690: Add LIRS cache eviction algorithm

2020-03-27 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15306 )

Change subject: IMPALA-8690: Add LIRS cache eviction algorithm
..


Patch Set 21: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc
File be/src/util/cache/lirs-cache.cc:

http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@456
PS21, Line 456:   mem_tracker_->Consume(deferred_consumption_);
> I will add a comment here.
ahh because deferred_consumption_ can be negative, right?



--
To view, visit http://gerrit.cloudera.org:8080/15306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I670fa4b2b7c93998130dc4e8b2546bb93e9a84f8
Gerrit-Change-Number: 15306
Gerrit-PatchSet: 21
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 27 Mar 2020 20:02:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9466: impala-shell client retry for hs2-http protocol

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15378 )

Change subject: IMPALA-9466: impala-shell client retry for hs2-http protocol
..


Patch Set 16: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5556/


--
To view, visit http://gerrit.cloudera.org:8080/15378
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5
Gerrit-Change-Number: 15378
Gerrit-PatchSet: 16
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 27 Mar 2020 20:07:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-27 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 3:

(1 comment)

Mostly a high level comment about what metrics to include, code itself looks 
good.

http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15382/3//COMMIT_MSG@25
PS3, Line 25: impala-server.io-mgr.remote-data-cache-hit-count
: impala-server.io-mgr.remote-data-cache-miss-count
: impala-server.io-mgr.remote-data-cache-num-entries
how about hit and miss rates? if we just emit the counts, I guess something 
like Grafana would be able to derive the hit / miss rates, right? is that the 
plan?

not sure what the LLAP-Grafana dashboard looks like, but might be worth taking 
a look at what they have?

for the miss-count, should there be an additional metric that helps 
differentiate between cache misses and cache writes? e.g. for LIRS you can have 
a cache miss, but the data won't necessarily be written to the cache. so maybe 
we can add a 'remote-data-cache-write-count' to track how many times a miss 
actually ends up writing data to the cache?

I stole some ideas from the Guava cache metrics FYI: 
https://guava.dev/releases/22.0/api/docs/com/google/common/cache/CacheStats.html



--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 27 Mar 2020 19:59:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9513: Fix TestKuduOperations.test column storage attributes

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/15567 )

Change subject: IMPALA-9513: Fix 
TestKuduOperations.test_column_storage_attributes
..

IMPALA-9513: Fix TestKuduOperations.test_column_storage_attributes

When introducing Kudu date support, test_column_storage_attributes
was modified to add the date datatype. The test expects the date
to be represented in python as datetime.date. Instead, the date
is a string in python, so the test consistently fails.

This changes the test so that it expects a python string, and the
test now passes.

Change-Id: Ic198b72041fbe8fe7376c45356e484b304c6f16c
Reviewed-on: http://gerrit.cloudera.org:8080/15567
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M tests/query_test/test_kudu.py
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/15567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic198b72041fbe8fe7376c45356e484b304c6f16c
Gerrit-Change-Number: 15567
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Volodymyr Verovkin 


[Impala-ASF-CR] IMPALA-9513: Fix TestKuduOperations.test column storage attributes

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15567 )

Change subject: IMPALA-9513: Fix 
TestKuduOperations.test_column_storage_attributes
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/15567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic198b72041fbe8fe7376c45356e484b304c6f16c
Gerrit-Change-Number: 15567
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Volodymyr Verovkin 
Gerrit-Comment-Date: Fri, 27 Mar 2020 20:00:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9472,IMPALA-9473: Add per-partition metrics for data cache

2020-03-27 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15382 )

Change subject: IMPALA-9472,IMPALA-9473: Add per-partition metrics for data 
cache
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.h
File be/src/runtime/io/data-cache.h:

http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.h@257
PS3, Line 257: int32_t index_;
Maybe note this is used for differentiating metrics. I don't think that it has 
any other relationship to a meaningful value or property of the cache?


http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc@549
PS3, Line 549:   if (!TestInfo::is_test()
I find the control flow in this whole function kind of confusing. Maybe instead 
do something like
if (is_test() && FindMetricForTesting(PATH_METRIC) != nullptr) {
  // Metrics already inited.
  return;
}
and then below you can just have the metrics created like normal without any 
more is_test or == nullptr checks.


http://gerrit.cloudera.org:8080/#/c/15382/3/be/src/runtime/io/data-cache.cc@876
PS3, Line 876: partition_idx++
nit: ++partition_idx



--
To view, visit http://gerrit.cloudera.org:8080/15382
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a57d75ff11f00ebc85b85bcaf104fb8108c478
Gerrit-Change-Number: 15382
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 27 Mar 2020 19:52:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8690: Add LIRS cache eviction algorithm

2020-03-27 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15306 )

Change subject: IMPALA-8690: Add LIRS cache eviction algorithm
..


Patch Set 21:

(5 comments)

Working on a new upload (which will add comments). Going ahead and posting 
replies.

http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc
File be/src/util/cache/lirs-cache.cc:

http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@131
PS21, Line 131: entires
> nit: typo
Will fix


http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@456
PS21, Line 456:   mem_tracker_->Consume(deferred_consumption_);
> shouldn't all the memory be release when the shard is destroyed?
I will add a comment here.

The eviction/free code in CleanupThreadState() is calling UpdateMemTracker() as 
it goes, which decrements the underlying memtracker each time it exceeds 
max_deferred_consumption. The most that can be left over at this point is the 
deferred_consumption.


http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@861
PS21, Line 861:   e->set_resident();
> are we suppose to set that data as resident even if 'success' is false?
The failure condition is like being successful and then evicting it 
immediately. So, we do all the same preparation up front.

The contract for a caller is that the data needs to resident before we call 
this. The caller ran Allocate() and got an UNINITIALIZED handle with the key 
filled in. Caller put the data somewhere (like the data cache file) and then 
filled in the value on the handle. Now it is calling Insert(), and the data is 
already resident.

If the Insert fails, it calls the eviction callback, which will make this not 
resident (and also decrement the memtracker below).

I will add a comment here to clarify the success/failure case and how it works.


http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@871
PS21, Line 871: // 3. There is a non-tombstone entry (rare)
> is this possible? i guess this could happen if two Inserts of the same data
Yes, this is possible in a race condition on insert.

It is also possible for the data cache when we are inserting a larger version 
of an existing entry (i.e. file=X offset=0 length=5 is getting replaced by 
file=X offset=0 length=10).


http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@876
PS21, Line 876: UpdateMemTracker(e->charge());
> not sure I actually understand what this is doing, so perhaps a misplaced c
See other comment on residency. This is the same between success/failure, 
because eviction will decrement this.



--
To view, visit http://gerrit.cloudera.org:8080/15306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I670fa4b2b7c93998130dc4e8b2546bb93e9a84f8
Gerrit-Change-Number: 15306
Gerrit-PatchSet: 21
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 27 Mar 2020 19:39:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters

2020-03-27 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15568 )

Change subject: [test] Use `system_unsync` time for Kudu test clusters
..


Patch Set 2:

(1 comment)

> I can make those other clean up changes. Mind if I do it in a
 > follow on patch?

Sure. Doesn't seem like a lot of work so putting it in this patch would be 
nice, but up to you.

http://gerrit.cloudera.org:8080/#/c/15568/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15568/2//COMMIT_MSG@7
PS2, Line 7: [test]
Please file an IMPALA jira for this. If you split it up into two patches, you 
can use the same one, just mark the second review as (part 2)



--
To view, visit http://gerrit.cloudera.org:8080/15568
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99
Gerrit-Change-Number: 15568
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 19:36:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15568 )

Change subject: [test] Use `system_unsync` time for Kudu test clusters
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/15568
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99
Gerrit-Change-Number: 15568
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 19:30:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9560: Fix TestStatsExtrapolation for release versions

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15569 )

Change subject: IMPALA-9560: Fix TestStatsExtrapolation for release versions
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5558/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/15569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iebe538936f23c095ef58c808e425cfb7b31edd94
Gerrit-Change-Number: 15569
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 18:58:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9560: Fix TestStatsExtrapolation for release versions

2020-03-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15569 )

Change subject: IMPALA-9560: Fix TestStatsExtrapolation for release versions
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iebe538936f23c095ef58c808e425cfb7b31edd94
Gerrit-Change-Number: 15569
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 18:53:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2563: Support LDAP search bind operations

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15570 )

Change subject: IMPALA-2563: Support LDAP search bind operations
..


Patch Set 1:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/5623/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/15570
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c
Gerrit-Change-Number: 15570
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 18:34:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters

2020-03-27 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15568 )

Change subject: [test] Use `system_unsync` time for Kudu test clusters
..


Patch Set 2:

I can make those other clean up changes. Mind if I do it in a follow on patch?


--
To view, visit http://gerrit.cloudera.org:8080/15568
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99
Gerrit-Change-Number: 15568
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 18:23:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters

2020-03-27 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15568 )

Change subject: [test] Use `system_unsync` time for Kudu test clusters
..


Patch Set 2:

yes, we should be able to remove all ntp related steps.


--
To view, visit http://gerrit.cloudera.org:8080/15568
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99
Gerrit-Change-Number: 15568
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 18:22:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters

2020-03-27 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15568 )

Change subject: [test] Use `system_unsync` time for Kudu test clusters
..


Patch Set 2:

Does this change mean that we would no longer need to call ntp-wait before 
starting the Kudu daemons? see Impala/testdata/cluster/admin

That's been an occasional source of difficulty/flakiness in different testing 
environments for us, so it would be nice to be able to eliminate it.


--
To view, visit http://gerrit.cloudera.org:8080/15568
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99
Gerrit-Change-Number: 15568
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 18:20:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9560: Fix TestStatsExtrapolation for release versions

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15569 )

Change subject: IMPALA-9560: Fix TestStatsExtrapolation for release versions
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5622/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iebe538936f23c095ef58c808e425cfb7b31edd94
Gerrit-Change-Number: 15569
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 27 Mar 2020 18:19:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2563: Support LDAP search bind operations

2020-03-27 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15570


Change subject: IMPALA-2563: Support LDAP search bind operations
..

IMPALA-2563: Support LDAP search bind operations

This patch adds a number of new options for controlling LDAP
by restricting authentication to particular users and/or members of
particular groups:
--ldap_group_filter: comma separated list of authorized groups
--ldap_user_filter: comma separated list of authorized users

There are also options to control how LDAP is searched when applying
these filters:
--ldap_group_dn_pattern
--ldap_group_membership_key
--ldap_group_membership_class

These options were modelled on equivalent options in Hive, see:
https://cwiki.apache.org/confluence/display/Hive/User+and+Group+Filter+Support+with+LDAP+Atn+Provider+in+HiveServer2
https://github.com/apache/hive/tree/master/service/src/java/org/apache/hive/service/auth/ldap

This patch also refactors LDAP related functionality into a utility
class, both to make authentication.cc more manageable and to
facilitate follow up work that will add LDAP authentication options
for the webserver.

Testing:
- Added a FE custom cluster test that sets --ldap_group_filter and
  --ldap_user_filter and verifies expected behavior.

Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c
---
M be/src/common/global-flags.cc
M be/src/rpc/authentication.cc
M be/src/rpc/authentication.h
M be/src/util/CMakeLists.txt
A be/src/util/ldap-util.cc
A be/src/util/ldap-util.h
M fe/src/test/java/org/apache/impala/customcluster/LdapImpalaShellTest.java
M fe/src/test/resources/users.ldif
8 files changed, 446 insertions(+), 166 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/15570/1
--
To view, visit http://gerrit.cloudera.org:8080/15570
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7502a96e9a3c16faa67c03ffac54df2bdebbca8c
Gerrit-Change-Number: 15570
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4080: Codegen once per fragment

2020-03-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15408 )

Change subject: IMPALA-4080: Codegen once per fragment
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15408/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15408/8//COMMIT_MSG@26
PS8, Line 26: TODO:
This can be updated right?



--
To view, visit http://gerrit.cloudera.org:8080/15408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3aef8bc621f96caafe9a1c378617a2987e4ad452
Gerrit-Change-Number: 15408
Gerrit-PatchSet: 8
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 17:46:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4080: Codegen once per fragment

2020-03-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15408 )

Change subject: IMPALA-4080: Codegen once per fragment
..


Patch Set 8:

(5 comments)

Looking good! I have only minor comments. It looks like there are quite a few 
TODOs for comments, so I'm assuming you're going to get back to those.

http://gerrit.cloudera.org:8080/#/c/15408/8/be/src/runtime/fragment-state.h
File be/src/runtime/fragment-state.h:

http://gerrit.cloudera.org:8080/#/c/15408/8/be/src/runtime/fragment-state.h@42
PS8, Line 42:   // TODO: add desc
Missed this TODO


http://gerrit.cloudera.org:8080/#/c/15408/8/be/src/runtime/fragment-state.h@168
PS8, Line 168:   /// TODO: add description
Same here


http://gerrit.cloudera.org:8080/#/c/15408/8/be/src/runtime/fragment-state.cc
File be/src/runtime/fragment-state.cc:

http://gerrit.cloudera.org:8080/#/c/15408/8/be/src/runtime/fragment-state.cc@84
PS8, Line 84:   query_state_->ErrorDuringFragmentCodegen(codegen_status_);
It would be good to do this outside of the critical section so that we don't 
need to worry about lock ordering between codegen_lock_ and 
QueryState::status_lock_


http://gerrit.cloudera.org:8080/#/c/15408/8/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/15408/8/be/src/runtime/query-state.cc@548
PS8, Line 548:   {
nit: extra braces for lock scope not needed


http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/udf/udf-internal.h
File be/src/udf/udf-internal.h:

http://gerrit.cloudera.org:8080/#/c/15408/7/be/src/udf/udf-internal.h@174
PS7, Line 174:   static int GetConstFnAttr(bool uses_decimal_v2,
> I decided to just pass this bool here (just renamed it to uses_decimal_v2)
ok sounds fine



--
To view, visit http://gerrit.cloudera.org:8080/15408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3aef8bc621f96caafe9a1c378617a2987e4ad452
Gerrit-Change-Number: 15408
Gerrit-PatchSet: 8
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 17:45:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9560: Fix TestStatsExtrapolation for release versions

2020-03-27 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15569


Change subject: IMPALA-9560: Fix TestStatsExtrapolation for release versions
..

IMPALA-9560: Fix TestStatsExtrapolation for release versions

When changing the Impala version from 3.4.0-SNAPSHOT to 3.4.0-RELEASE,
TestStatsExtrapolation::test_stats_extrapolation started failing due
to a difference in the expected cardinality (expected: 17.91K,
actual 17.90K). This is because the Impala version gets embedded into
parquet files, and this causes a slight difference in file size, which
translates into a slight difference in expected cardinality.

This modifies TestStatsExtrapolation::test_stats_extrapolation to
allow any 17.9*K cardinality.

Testing:
 - Tested on master and on branch-3.4.0

Change-Id: Iebe538936f23c095ef58c808e425cfb7b31edd94
---
M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/15569/1
--
To view, visit http://gerrit.cloudera.org:8080/15569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iebe538936f23c095ef58c808e425cfb7b31edd94
Gerrit-Change-Number: 15569
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 


[Impala-ASF-CR] IMPALA-8690: Add LIRS cache eviction algorithm

2020-03-27 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15306 )

Change subject: IMPALA-8690: Add LIRS cache eviction algorithm
..


Patch Set 21:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc
File be/src/util/cache/lirs-cache.cc:

http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@131
PS21, Line 131: entires
nit: typo


http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@456
PS21, Line 456:   mem_tracker_->Consume(deferred_consumption_);
shouldn't all the memory be release when the shard is destroyed?


http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@861
PS21, Line 861:   e->set_resident();
are we suppose to set that data as resident even if 'success' is false?


http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@871
PS21, Line 871: // 3. There is a non-tombstone entry (rare)
is this possible? i guess this could happen if two Inserts of the same data 
happen concurrently?


http://gerrit.cloudera.org:8080/#/c/15306/21/be/src/util/cache/lirs-cache.cc@876
PS21, Line 876: UpdateMemTracker(e->charge());
not sure I actually understand what this is doing, so perhaps a misplaced 
comment, but what happens if 'success' is false?



--
To view, visit http://gerrit.cloudera.org:8080/15306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I670fa4b2b7c93998130dc4e8b2546bb93e9a84f8
Gerrit-Change-Number: 15306
Gerrit-PatchSet: 21
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 27 Mar 2020 17:31:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15564 )

Change subject: IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/15564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I51dd933867ea7877235e7f6e1f2b56711dca107e
Gerrit-Change-Number: 15564
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 27 Mar 2020 17:15:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/15564 )

Change subject: IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589
..

IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589

With HIVE-22589 Hive3 switched back to using Julian Calendar for
historical dates by default which caused an Impala test failure
around Avro DATE values.

Change-Id: I51dd933867ea7877235e7f6e1f2b56711dca107e
Reviewed-on: http://gerrit.cloudera.org:8080/15564
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M testdata/workloads/functional-query/queries/QueryTest/avro_date.test
M tests/query_test/test_date_queries.py
2 files changed, 33 insertions(+), 21 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/15564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I51dd933867ea7877235e7f6e1f2b56711dca107e
Gerrit-Change-Number: 15564
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5621/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 27 Mar 2020 17:08:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/orc-metadata-utils.cc@87
PS8, Line 87:   DCHECK(ValidateFullAcidFileSchema().ok()); // Should have 
already been validated.
It would be nice if this printed the validation error. Maybe we need a 
DCHECK_OK macro, like EXPECT_OK?


http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/workloads/functional-query/queries/QueryTest/describe-path.test
File testdata/workloads/functional-query/queries/QueryTest/describe-path.test:

http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/workloads/functional-query/queries/QueryTest/describe-path.test@143
PS5, Line 143: describe functional_orc_def.alltypes
> Yeah, I wasn't sure about this, but it was handy during development.
That's fine, I think there's some similar behaviour with nested types if you 
describe nested fields that are not explicitly listed, e.g. you can refer to a 
nested item with the .item member or implicitly by omitting it.



--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 27 Mar 2020 17:05:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 8: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 27 Mar 2020 17:05:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5620/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 27 Mar 2020 16:57:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8005: Randomize partitioning exchanges.

2020-03-27 Thread Joe McDonnell (Code Review)
Joe McDonnell has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/15497 )

Change subject: IMPALA-8005: Randomize partitioning exchanges.
..

IMPALA-8005: Randomize partitioning exchanges.

Currently, we use the same hash seed for partitioning exchanges at
the sender. For a table with skew in distribution in the shuffling
keys, multiple queries using the same shuffling keys for exchanges
will end up hashing to the same destination fragments running on
a particular host and potentially overloading that host.

This patch seeds the hash with query id. This will ensure that
the partitioning exchanges do not always hash to the
same destination with same shuffling keys.

Testing:
Added a test to data-stream-test to verify the data values at
destination are different for different queries.

Change-Id: I1936e6cc3e8d66420a5a9301f49221ca38f3e468
Reviewed-on: http://gerrit.cloudera.org:8080/15497
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins 
---
M be/src/runtime/data-stream-test.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
3 files changed, 93 insertions(+), 10 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/15497
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I1936e6cc3e8d66420a5a9301f49221ca38f3e468
Gerrit-Change-Number: 15497
Gerrit-PatchSet: 6
Gerrit-Owner: Anurag Mantripragada 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-27 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 8:

(7 comments)

PS8 is a rebase.

http://gerrit.cloudera.org:8080/#/c/15395/6/be/src/exec/orc-metadata-utils.h
File be/src/exec/orc-metadata-utils.h:

http://gerrit.cloudera.org:8080/#/c/15395/6/be/src/exec/orc-metadata-utils.h@51
PS6, Line 51:  private:
> These could be const I think, since they're only set in the constructor
Done


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@231
PS3, Line 231:   return Status(Substitute(
> I think it would be good to validate that the schema is as expected, just s
Turned it into a validator function that returns a Status, calling it from 
HdfsOrcScanner::Open().


http://gerrit.cloudera.org:8080/#/c/15395/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

http://gerrit.cloudera.org:8080/#/c/15395/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@323
PS6, Line 323: table.getMetaStoreTable().getParameters());
> nit: indentation is a bit off
Done


http://gerrit.cloudera.org:8080/#/c/15395/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@329
PS6, Line 329: 
Preconditions.checkState(col.getName().equals("row__id"));
> nit: multi-line if should use parentheses. Maybe this would be better to re
Done, also converted the 'equals("row__id")' check to a Precondition.


http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/workloads/functional-query/queries/QueryTest/describe-path.test
File testdata/workloads/functional-query/queries/QueryTest/describe-path.test:

http://gerrit.cloudera.org:8080/#/c/15395/5/testdata/workloads/functional-query/queries/QueryTest/describe-path.test@143
PS5, Line 143: describe functional_orc_def.alltypes
> I'm not sure if making this visible is a good idea. What does hive do?
Yeah, I wasn't sure about this, but it was handy during development.

Hive doesn't reveal the hidden fields in DESCRIBE, so I switched to hiding them 
too.

However, when explicitly asked for it, currently I still allow describe on the 
row__id field, i.e.:

  DESCRIBE .row__id;
  +-++-+
  | name| type   | comment |
  +-++-+
  | operation   | int| |
  | originaltransaction | bigint | |
  | bucket  | int| |
  | rowid   | bigint | |
  | currenttransaction  | bigint | |
  +-++-+


I think it's not wrong since we also allow the user to explicitly query the 
fields of the row__id column.

Hive only allows DESCRIBE on tables.


http://gerrit.cloudera.org:8080/#/c/15395/5/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/15395/5/tests/query_test/test_scanners_fuzz.py@197
PS5, Line 197:
> flake8: E501 line too long (91 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/15395/5/tests/query_test/test_scanners_fuzz.py@283
PS5, Line 283:
> flake8: E261 at least two spaces before inline comment
Done



--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 27 Mar 2020 16:29:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-27 Thread Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Norbert Luksa, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15395

to look at the new patch set (#8).

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..

IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

Full ACID row format looks like this:

{
  "operation": 0,
  "originalTransaction": 1,
  "bucket": 536870912,
  "rowId": 0,
  "currentTransaction": 1,
  "row": {"i": 1}
}

User columns are nested under "row". In the frontend we need to create
slot descriptors that correspond to the file schema. In the catalog we
could mimic the file schema but that would introduce several
complexities and corner cases in column resolution. Also in query
results the heading of the above user column would be "row.i". Star
expansion should also be modified, etc.

Because of that in the Catalog I create the exact opposite of the above
schema:

{
  "row__id":
  {
"operation": 0,
"originalTransaction": 1,
"bucket": 536870912,
"rowId": 0,
"currentTransaction": 1
  }
  "i": 1
}

This way very little modification is needed in the frontend. And the
hidden columns can be easily retrieved via 'SELECT row__id.*' when we
need those for debugging/testing.

We only need to change Path.getAbsolutePath() to return a schema path
that corresponds to the file schema. Also in the backend we need some
extra juggling in OrcSchemaResolver::ResolveColumn() to retrieve the
table schema path from the file schema path.

Testing:
I changed data loading to load ORC files in full ACID format by default.
With this change we should be able to scan full ACID tables that are
not minor-compacted, don't have deleted rows, and don't have original
files.

Newly added Tests:
 * specific queries about hidden columns (full-acid-rowid.test)
 * SHOW CREATE TABLE (show-create-table-full-acid.test)
 * DESCRIBE [FORMATTED] TABLE (describe-path.test)
 * INSERT should be forbidden (acid-negative.test)
 * added tests for column masking (
   ranger_column_masking_complex_types.test)

Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/analysis/TruncateStmt.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/bin/generate-schema-statements.py
M testdata/datasets/README
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
M testdata/workloads/functional-query/queries/QueryTest/acid.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test
M testdata/workloads/functional-query/queries/QueryTest/describe-path.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test
M 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
A 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test
A 
testdata/workloads/functional-query/queries/QueryTest/show-create-table-full-acid.test
M test

[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15395/8/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/15395/8/testdata/bin/generate-schema-statements.py@321
PS8, Line 321: '
flake8: E129 visually indented line with same indent as next logical line


http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py@306
PS8, Line 306: n
flake8: E129 visually indented line with same indent as next logical line



--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 27 Mar 2020 16:27:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-27 Thread Zoltan Borok-Nagy (Code Review)
Hello Quanlong Huang, Norbert Luksa, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15395

to look at the new patch set (#7).

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..

IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

Full ACID row format looks like this:

{
  "operation": 0,
  "originalTransaction": 1,
  "bucket": 536870912,
  "rowId": 0,
  "currentTransaction": 1,
  "row": {"i": 1}
}

User columns are nested under "row". In the frontend we need to create
slot descriptors that correspond to the file schema. In the catalog we
could mimic the file schema but that would introduce several
complexities and corner cases in column resolution. Also in query
results the heading of the above user column would be "row.i". Star
expansion should also be modified, etc.

Because of that in the Catalog I create the exact opposite of the above
schema:

{
  "row__id":
  {
"operation": 0,
"originalTransaction": 1,
"bucket": 536870912,
"rowId": 0,
"currentTransaction": 1
  }
  "i": 1
}

This way very little modification is needed in the frontend. And the
hidden columns can be easily retrieved via 'SELECT row__id.*' when we
need those for debugging/testing.

We only need to change Path.getAbsolutePath() to return a schema path
that corresponds to the file schema. Also in the backend we need some
extra juggling in OrcSchemaResolver::ResolveColumn() to retrieve the
table schema path from the file schema path.

Testing:
I changed data loading to load ORC files in full ACID format by default.
With this change we should be able to scan full ACID tables that are
not minor-compacted, don't have deleted rows, and don't have original
files.

Newly added Tests:
 * specific queries about hidden columns (full-acid-rowid.test)
 * SHOW CREATE TABLE (show-create-table-full-acid.test)
 * DESCRIBE [FORMATTED] TABLE (describe-path.test)
 * INSERT should be forbidden (acid-negative.test)
 * added tests for column masking (
   ranger_column_masking_complex_types.test)

Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/analysis/TruncateStmt.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/bin/generate-schema-statements.py
M testdata/datasets/README
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
M testdata/workloads/functional-query/queries/QueryTest/acid.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file-orc.test
M testdata/workloads/functional-query/queries/QueryTest/describe-path.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test
M 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
A 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test
A 
testdata/workloads/functional-query/queries/QueryTest/show-create-table-full-acid.test
M test

[Impala-ASF-CR] IMPALA-9042: Milestone 1: properly scan files that has full ACID schema

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15395/7/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/15395/7/testdata/bin/generate-schema-statements.py@321
PS7, Line 321: '
flake8: E129 visually indented line with same indent as next logical line


http://gerrit.cloudera.org:8080/#/c/15395/7/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/15395/7/tests/query_test/test_scanners_fuzz.py@306
PS7, Line 306: n
flake8: E129 visually indented line with same indent as next logical line



--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 27 Mar 2020 16:15:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9513: Fix TestKuduOperations.test column storage attributes

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15567 )

Change subject: IMPALA-9513: Fix 
TestKuduOperations.test_column_storage_attributes
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic198b72041fbe8fe7376c45356e484b304c6f16c
Gerrit-Change-Number: 15567
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Volodymyr Verovkin 
Gerrit-Comment-Date: Fri, 27 Mar 2020 15:29:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9513: Fix TestKuduOperations.test column storage attributes

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15567 )

Change subject: IMPALA-9513: Fix 
TestKuduOperations.test_column_storage_attributes
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5557/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/15567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic198b72041fbe8fe7376c45356e484b304c6f16c
Gerrit-Change-Number: 15567
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Volodymyr Verovkin 
Gerrit-Comment-Date: Fri, 27 Mar 2020 15:29:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9466: impala-shell client retry for hs2-http protocol

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15378 )

Change subject: IMPALA-9466: impala-shell client retry for hs2-http protocol
..


Patch Set 16: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15378
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5
Gerrit-Change-Number: 15378
Gerrit-PatchSet: 16
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 27 Mar 2020 15:19:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9466: impala-shell client retry for hs2-http protocol

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15378 )

Change subject: IMPALA-9466: impala-shell client retry for hs2-http protocol
..


Patch Set 16:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5556/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/15378
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0da9e9e8d34a340eaf763397cc095ff6260d65d5
Gerrit-Change-Number: 15378
Gerrit-PatchSet: 16
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 27 Mar 2020 15:19:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15568 )

Change subject: [test] Use `system_unsync` time for Kudu test clusters
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun// 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/15568
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99
Gerrit-Change-Number: 15568
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 14:56:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15568 )

Change subject: [test] Use `system_unsync` time for Kudu test clusters
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5619/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15568
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99
Gerrit-Change-Number: 15568
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 14:55:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters

2020-03-27 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15568 )

Change subject: [test] Use `system_unsync` time for Kudu test clusters
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/15568
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99
Gerrit-Change-Number: 15568
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Mar 2020 14:43:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters

2020-03-27 Thread Grant Henke (Code Review)
Hello Thomas Tauber-Marshall, Alexey Serbin, Attila Bukor, Tim Armstrong, Csaba 
Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15568

to look at the new patch set (#2).

Change subject: [test] Use `system_unsync` time for Kudu test clusters
..

[test] Use `system_unsync` time for Kudu test clusters

Recently Kudu made enhancements to time source configuration and
adjusted the time source for it’s local clusters/tests to `system_unsync`.

This patch mirrors that behavior in Impala test clusters given there is no
need to require NTP-synchronized clock for a test where all the
participating Kudu masters and tablet servers are run at the same node
using the same local wallclock.

See the Kudu commit here for details:
https://github.com/apache/kudu/commit/eb2b70d4b96be2fc2fdd6b3625acc284ac5774be

Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99
---
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
2 files changed, 16 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/15568/2
--
To view, visit http://gerrit.cloudera.org:8080/15568
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99
Gerrit-Change-Number: 15568
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] [test] Use `system unsync` time for Kudu test clusters

2020-03-27 Thread Grant Henke (Code Review)
Grant Henke has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15568


Change subject: [test] Use `system_unsync` time for Kudu test clusters
..

[test] Use `system_unsync` time for Kudu test clusters

Recently Kudu made enhancements to time source configuration and
adjusted the time source for it’s local clusters/tests to `system_unsync`.

This patch mirrors that behavior in Impala test clusters given there is no need 
to require NTP-synchronized clock for a test where all the
participating Kudu masters and tablet servers are run at the same node
using the same local wallclock.

See the Kudu commit here for details:
https://github.com/apache/kudu/commit/eb2b70d4b96be2fc2fdd6b3625acc284ac5774be

Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99
---
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
2 files changed, 16 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/15568/1
--
To view, visit http://gerrit.cloudera.org:8080/15568
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99
Gerrit-Change-Number: 15568
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke 


[Impala-ASF-CR] IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15564 )

Change subject: IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5554/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/15564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I51dd933867ea7877235e7f6e1f2b56711dca107e
Gerrit-Change-Number: 15564
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 27 Mar 2020 12:41:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15564 )

Change subject: IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I51dd933867ea7877235e7f6e1f2b56711dca107e
Gerrit-Change-Number: 15564
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 27 Mar 2020 12:41:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589

2020-03-27 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15564 )

Change subject: IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589
..


Patch Set 1:

> >. the test is skipped for ORC (not sure if this is on purpose or
 > by accident).
 > My guess is that updating this test was forgotten in the quite
 > recent https://gerrit.cloudera.org/#/c/14982/
 >
 > I think that in the ideal case we should test both: Julian to test
 > that invalid dates are handled properly (this probably has to be
 > file format specific, as error messages are different) and
 > Gregorian to have a more extended suite of tests that can run on
 > more file formats.
 >
 > The change itself looks good to me, but I am worried about the back
 > and forth changes in Hive.

Thanks for the review. Let's merge this in now to unblock the core test suite.

I agree that DATE testing across different fileformats and Hive versions is 
pretty messy now and we don't cover all the different scenarios. There's a lot 
of room for improvement, but we should address that in a separate patch-set.


--
To view, visit http://gerrit.cloudera.org:8080/15564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I51dd933867ea7877235e7f6e1f2b56711dca107e
Gerrit-Change-Number: 15564
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 27 Mar 2020 12:39:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8870: Bump up Guava to 28.1-jre and set DISABLE SENTRY to true

2020-03-27 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15214 )

Change subject: IMPALA-8870: Bump up Guava to 28.1-jre and set DISABLE_SENTRY 
to true
..


Patch Set 14: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15214
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9690a926953a8d3c3872277680b4be0551546c68
Gerrit-Change-Number: 15214
Gerrit-PatchSet: 14
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Fri, 27 Mar 2020 12:22:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8870: Bump up Guava to 28.1-jre and set DISABLE SENTRY to true

2020-03-27 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/15214 )

Change subject: IMPALA-8870: Bump up Guava to 28.1-jre and set DISABLE_SENTRY 
to true
..

IMPALA-8870: Bump up Guava to 28.1-jre and set DISABLE_SENTRY to true

This patch bumps up the version of Guava libraries from 14.0.1 to
28.1-jre. Due to some changes in Guava's API's, we modify the call
sites accordingly.

Moreover, in order to instruct the Java classes under the directory of
$IMPALA_HOME/common/yarn-extras to use the new Guava libraries, we
explicitly added a dependency in the corresponding pom.xml file.

On the other hand, we set DISABLE_SENTRY to true regardless of
$USE_CDP_HIVE since Sentry's Guava version has not been bumped up yet
and thus run-sentry-service.sh cannot be successfully executed. Recall
that by setting DISABLE_SENTRY to true we also disable every
Sentry-related test, which is fine from now on since Impala 3.4 was
recently branched. The plan is to drop support for Sentry in the Impala
4 line.

Change-Id: I9690a926953a8d3c3872277680b4be0551546c68
Reviewed-on: http://gerrit.cloudera.org:8080/15214
Tested-by: Impala Public Jenkins 
Reviewed-by: Csaba Ringhofer 
---
M bin/impala-config.sh
M common/yarn-extras/pom.xml
M 
common/yarn-extras/src/main/java/org/apache/impala/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/AggregateInfoBase.java
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/AnalyticInfo.java
M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/NullLiteral.java
M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/java/org/apache/impala/catalog/DataSource.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java
M fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java
M impala-parent/pom.xml
48 files changed, 103 insertions(+), 94 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Csaba Ringhofer: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/15214
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9690a926953a8d3c3872277680b4be0551546c68
Gerrit-Change-Number: 15214
Gerrit-PatchSet: 15
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: V

[Impala-ASF-CR] IMPALA-9513: Fix TestKuduOperations.test column storage attributes

2020-03-27 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15567 )

Change subject: IMPALA-9513: Fix 
TestKuduOperations.test_column_storage_attributes
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic198b72041fbe8fe7376c45356e484b304c6f16c
Gerrit-Change-Number: 15567
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Volodymyr Verovkin 
Gerrit-Comment-Date: Fri, 27 Mar 2020 11:33:49 +
Gerrit-HasComments: No