[Impala-ASF-CR] IMPALA-6948,6962: add end-to-end tests

2018-05-02 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10291


Change subject: IMPALA-6948,6962: add end-to-end tests
..

IMPALA-6948,6962: add end-to-end tests

Adds end-to-end tests to validate that following
various metadata operations, the catalog state
in catalogd and impalads is the same.

For IMPALA-6962, catalogd process restart for tests
is fixed.

Change-Id: Ic6c5b39e29b2885cd30fede18833cbf23fb755f5
---
M tests/common/impala_cluster.py
M tests/common/impala_service.py
A tests/custom_cluster/test_metadata_replicas.py
3 files changed, 172 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/10291/1
--
To view, visit http://gerrit.cloudera.org:8080/10291
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic6c5b39e29b2885cd30fede18833cbf23fb755f5
Gerrit-Change-Number: 10291
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-6948: Delete catalog update topic entries upon catalog restart

2018-05-02 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10289


Change subject: IMPALA-6948: Delete catalog update topic entries upon catalog 
restart
..

IMPALA-6948: Delete catalog update topic entries upon catalog restart

This commit fixes an issue where the statestore may end up with stale
entries in the catalog update topic that do not correspond to the
catalog objects stored in the catalog. This may occur if the catalog
server restarts and some catalog object (e.g. table) that was known to
the catalog before the restart no longer exists in the Hive Metastore
after the restart.

Fix:
The first update for the catalog update topic that is sent by the catalog
instructs the statestore to clear any entries it may have on this topic
before applying the first update. In that way, we guarantee that the
statestore entries are consistent with the catalog objects stored in the
catalog server. Any coordinator that detects the catalog restart will
receive from the statestore a full topic update that reflects the state
of the catalog server.

Testing:
Added statestore test.

Change-Id: I907509bf92da631ece5efd23c275a613ead00e91

Tmp

Change-Id: I74a8ade8e498ac35cb56d3775d2c67a86988d9b6
---
M be/src/catalog/catalog-server.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M common/thrift/StatestoreService.thrift
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M tests/statestore/test_statestore.py
6 files changed, 84 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/10289/1
--
To view, visit http://gerrit.cloudera.org:8080/10289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I74a8ade8e498ac35cb56d3775d2c67a86988d9b6
Gerrit-Change-Number: 10289
Gerrit-PatchSet: 1
Gerrit-Owner: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-6337: Fix infinite loop in Impala shell

2018-05-02 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9195 )

Change subject: IMPALA-6337: Fix infinite loop in Impala shell
..


Patch Set 11:

> Patch Set 11:
>
> > Patch Set 11:
> >
> > > Do you know if this bug exists in 0.1.19? We've actually switched to > 
> > > that version elsewhere in the Impala python infra.
> > > https://github.com/apache/impala/blob/master/infra/python/deps/requirements.txt#L65
> >
> > Yes, the bug still exists in 0.1.19. However, depending how we use it, 
> > there may not be a real issue in Python infra. The infinite loop in Impala 
> > shell is a specific case because of the way we use sqlparse.split.
>
> Here's my concern. Work has already started on turning the impala-shell into 
> an actual python package for distributing through PyPI. See:
>
> https://gerrit.cloudera.org/c/9771/
> https://issues.apache.org/jira/browse/IMPALA-1071
> https://issues.apache.org/jira/browse/IMPALA-6808
> 
> With that change, we probably won't be relying on an internally-bundled 
> version of sqlparse anymore, especially if we're uploading impala-shell to 
> PyPI.

Since we've hit several issues with sqlparse and upgrading may not be an easy 
option, do you think it's better to bundle our own sqlparse even with 
distributable impala-shell similar to how we deal with native-toolchain?


--
To view, visit http://gerrit.cloudera.org:8080/9195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9142f21a888189d351f00ce09baeba123bc0959b
Gerrit-Change-Number: 9195
Gerrit-PatchSet: 11
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 03 May 2018 01:27:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5706: Parallelise read I/O in sorter

2018-05-02 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9943 )

Change subject: IMPALA-5706: Parallelise read I/O in sorter
..


Patch Set 7:

(16 comments)

Looking good

http://gerrit.cloudera.org:8080/#/c/9943/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9943/7//COMMIT_MSG@39
PS7, Line 39: double-buffering the number of runs in a single merge decreases 
and I
I think we still need data to decide whether the double-buffering is worth it. 
Even without that the patch is a big improvement to the logic.


http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@a1052
PS7, Line 1052:
Can we preserve this DCHECK but make it about the next page being pinned, or is 
it too awkward? We could move it into the if (is_pinned_) and make it something 
like:

  DCHECK(page_index + 1 == pages->size() || pages[page_index + 1]->is_pinned());


http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@880
PS7, Line 880:   // Attempt to pin the first fixed and var-length pages.
Comment needs updating.


http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@881
PS7, Line 881:   if (fixed_len_pages_.size() > 0) {
Might be cleaner to rewrite these as loops, e.g.

  for (int i = 0; i < max(2, fixed_len_pages_.size()), i++)


http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@1049
PS7, Line 1049: page_index
nit: idx/index inconsistency. I'm fine with either but the inconsistency in the 
function is a little distracting.


http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@1052
PS7, Line 1052: DCHECK(next_unpinned_page != nullptr);
This DCHECK doesn't seem useful - if we messed up the vector addressing 
arithmetic we would almost certainly get an invalid but non-NULL pointer.


http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@1551
PS7, Line 1551: int64_t Sorter::ComputeMinReservation() {
Not a big deal but we could reduce the vertical whitespace in this function. I 
don't mind it personally but some people prefer denser code.


http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@1682
PS7, Line 1682:   for (int i = 0; i < sorted_runs_.size(); ++i) {
Maybe use range-based for loop? I find it more readable personally.


http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@1706
PS7, Line 1706:   int pages_needed_for_full_merge = sorted_runs_.size() * 
PinnedPagesPerRunForMerge();
Couldn't we compute this more accurately by doing a pass over sorted_runs_? E.g.

  pages_needed = 0
  for (run : sorted_runs_) {
pages_needed += min(2, run->num_fixed_len_pages());
pages_needed += min(2, run->num_var_len_pages());
  }


http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@1721
PS7, Line 1721:   if (max_runs_per_intermediate_merge + 1 >= 
sorted_runs_.size()) {
I think my above comment about accurately computing the requirement for the 
final merge applies here too. I guess the maximum runs per intermediate merge 
and the requirement for the final merge are different since the first is an 
upper bound only.


http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@1727
PS7, Line 1727:   if (max_runs_per_intermediate_merge > sorted_runs_.size() / 
2) {
I had a bit of trouble reasoning through whether this calculation was correct 
for edge cases.

E.g. if sorted_runs_.size() == 17 and max_runs_per_intermediate_merge == 8, 
this is false, which seems correct. But if sorted_runs_.size() == 16 and 
max_runs_per_intermediate_merge == 8, then is false, but it seems like it 
should be true, because we can:

1. Merge 8 runs into 1.
2. Merge 9 runs into the final output.

For me it might be a bit easier to understand if it was instead:

  max_runs_per_intermediate_merge * 2 >= sorted_runs_.size()

Since that avoids the rounding.


http://gerrit.cloudera.org:8080/#/c/9943/7/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test:

PS7:
IMPALA-4835 probably caused a bunch of conflicts in the planner tests. 
Hopefully it isn't too much of a pain to regenerate the output - you can copy 
over the actual output from logs/fe_Tests/PlannerTests (maybe you're already 
doing that).


http://gerrit.cloudera.org:8080/#/c/9943/7/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/9943/7/tests/query_test/test_sort.py@45
PS7, Line 45:   def test_multiple_mem_limits(self, vector):
Are these tests now tuned for a 3-node minicluster? Maybe we should skip 
validating TotalMergesPerformed if we're not running against such a minicluster 
- I'm not sure if it will still pass on a one-node local build.



[Impala-ASF-CR] IMPALA-4123 (prep): Parquet column reader cleanup

2018-05-02 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9799 )

Change subject: IMPALA-4123 (prep): Parquet column reader cleanup
..


Patch Set 6: Code-Review+1

I'm confident that this is ready to be submitted. Dan, do you want to have 
another look?


--
To view, visit http://gerrit.cloudera.org:8080/9799
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc00352df3a0b2d605f872ae7e43db2dc90faab1
Gerrit-Change-Number: 9799
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 03 May 2018 01:03:06 +
Gerrit-HasComments: No


[Impala-ASF-CR](2.x) IMPALA-6679,IMPALA-6678: reduce scan reservation

2018-05-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10273 )

Change subject: IMPALA-6679,IMPALA-6678: reduce scan reservation
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/10273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842
Gerrit-Change-Number: 10273
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 03 May 2018 01:01:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6337: Fix infinite loop in Impala shell

2018-05-02 Thread David Knupp (Code Review)
David Knupp has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9195 )

Change subject: IMPALA-6337: Fix infinite loop in Impala shell
..


Patch Set 11:

> Patch Set 11:
>
> > Do you know if this bug exists in 0.1.19? We've actually switched to > that 
> > version elsewhere in the Impala python infra.
> > https://github.com/apache/impala/blob/master/infra/python/deps/requirements.txt#L65
>
> Yes, the bug still exists in 0.1.19. However, depending how we use it, there 
> may not be a real issue in Python infra. The infinite loop in Impala shell is 
> a specific case because of the way we use sqlparse.split.

Here's my concern. Work has already started on turning the impala-shell into an 
actual python package for distributing through PyPI. See:

https://gerrit.cloudera.org/c/9771/
https://issues.apache.org/jira/browse/IMPALA-1071
https://issues.apache.org/jira/browse/IMPALA-6808

With that change, we probably won't be relying on an internally-bundled version 
of sqlparse anymore, especially if we're uploading impala-shell to PyPI.


--
To view, visit http://gerrit.cloudera.org:8080/9195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9142f21a888189d351f00ce09baeba123bc0959b
Gerrit-Change-Number: 9195
Gerrit-PatchSet: 11
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 03 May 2018 00:41:53 +
Gerrit-HasComments: No


[Impala-ASF-CR](2.x) IMPALA-6908: IsConnResetTException() should include ECONNRESET

2018-05-02 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10265 )

Change subject: IMPALA-6908: IsConnResetTException() should include ECONNRESET
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10265/1/be/src/rpc/thrift-util.cc
File be/src/rpc/thrift-util.cc:

http://gerrit.cloudera.org:8080/#/c/10265/1/be/src/rpc/thrift-util.cc@209
PS1, Line 209: strncmp(PACKAGE_VERSION, "0.9.0", 5));
> As Sailesh asked earlier, do we even need this? Or, couldn't we make it a c
Oh missed that earlier comment. We had some static_assert() from line 63-68 so 
this may not be necessary after all. May be it helps to move the 
static_assert() closer to this function ?



--
To view, visit http://gerrit.cloudera.org:8080/10265
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I1bb997a833917e5166c9ca192da219f50f4439e2
Gerrit-Change-Number: 10265
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 02 May 2018 23:59:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6961: [DOCS] Doc --enable minidump flag to disable minidumps

2018-05-02 Thread Alex Rodoni (Code Review)
Alex Rodoni has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10285 )

Change subject: IMPALA-6961: [DOCS] Doc --enable_minidump flag to disable 
minidumps
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10285/2/docs/topics/impala_breakpad.xml
File docs/topics/impala_breakpad.xml:

http://gerrit.cloudera.org:8080/#/c/10285/2/docs/topics/impala_breakpad.xml@79
PS2, Line 79: Cloudera Manager
> This should not be here, since we are reviewing the documentation for Apach
Done



--
To view, visit http://gerrit.cloudera.org:8080/10285
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3412e36272cda0c1502d4643afcdbad01e9548a5
Gerrit-Change-Number: 10285
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Wed, 02 May 2018 23:20:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6961: [DOCS] Doc --enable minidump flag to disable minidumps

2018-05-02 Thread Alex Rodoni (Code Review)
Hello Lars Volker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10285

to look at the new patch set (#3).

Change subject: IMPALA-6961: [DOCS] Doc --enable_minidump flag to disable 
minidumps
..

IMPALA-6961: [DOCS] Doc --enable_minidump flag to disable minidumps

Change-Id: I3412e36272cda0c1502d4643afcdbad01e9548a5
---
M docs/topics/impala_breakpad.xml
1 file changed, 20 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/10285/3
--
To view, visit http://gerrit.cloudera.org:8080/10285
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3412e36272cda0c1502d4643afcdbad01e9548a5
Gerrit-Change-Number: 10285
Gerrit-PatchSet: 3
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-6961: [DOCS] Doc --enable minidump flag to disable minidumps

2018-05-02 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10285 )

Change subject: IMPALA-6961: [DOCS] Doc --enable_minidump flag to disable 
minidumps
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10285/2/docs/topics/impala_breakpad.xml
File docs/topics/impala_breakpad.xml:

http://gerrit.cloudera.org:8080/#/c/10285/2/docs/topics/impala_breakpad.xml@79
PS2, Line 79: Cloudera Manager
This should not be here, since we are reviewing the documentation for Apache 
Impala. The description should be similar to the one above, and it's up to the 
users how they set a parameter.



--
To view, visit http://gerrit.cloudera.org:8080/10285
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3412e36272cda0c1502d4643afcdbad01e9548a5
Gerrit-Change-Number: 10285
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Wed, 02 May 2018 23:07:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6941: load more text scanner compression plugins

2018-05-02 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10165 )

Change subject: IMPALA-6941: load more text scanner compression plugins
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10165/6/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
File fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java:

http://gerrit.cloudera.org:8080/#/c/10165/6/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java@41
PS6, Line 41:  LZO,
:   LZO_INDEX, //Lzo index file.
:   LZ4,
:   ZSTD;
> This and related enums (THdfsCompression, the flatbuffer version) are used
Filed IMPALA-6963



--
To view, visit http://gerrit.cloudera.org:8080/10165
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6
Gerrit-Change-Number: 10165
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 02 May 2018 23:00:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6941: load more text scanner compression plugins

2018-05-02 Thread Tim Armstrong (Code Review)
Hello Zoltan Borok-Nagy, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10165

to look at the new patch set (#7).

Change subject: IMPALA-6941: load more text scanner compression plugins
..

IMPALA-6941: load more text scanner compression plugins

Add extensions for LZ4 and ZSTD (which are supported by Hadoop).
Even without a plugin this results in better behaviour because
we don't try to treat the files with unknown extensions as
uncompressed text.

Also allow loading tables containing files with unsupported
compression types. There was weird behaviour before we knew
of the file extension but didn't support querying the table -
the catalog would load the table but the impalad would fail
processing the catalog update. The simplest way to fix it
is to just allow loading the tables.

Switch to always checking plugin version - running mismatched plugin
is inherently unsafe.

Testing:
Positive case where LZO is loaded is exercised. Added
coverage for negative case where LZO is disabled.

Fixed test gaps:
* Querying LZO table with LZO plugin not available.
* Interacting with tables with known but unsupported text
  compressions.
* Querying files with unknown compression suffixes (which are
  treated as uncompressed text).

Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6
---
M be/src/common/global-flags.cc
M be/src/exec/CMakeLists.txt
D be/src/exec/hdfs-lzo-text-scanner.cc
D be/src/exec/hdfs-lzo-text-scanner.h
A be/src/exec/hdfs-plugin-text-scanner.cc
A be/src/exec/hdfs-plugin-text-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/util/backend-gflag-util.cc
M common/fbs/CatalogObjects.fbs
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A testdata/workloads/functional-query/queries/QueryTest/disable-lzo-plugin.test
A 
testdata/workloads/functional-query/queries/QueryTest/unsupported-compression-partitions.test
A tests/custom_cluster/test_scanner_plugin.py
M tests/metadata/test_partition_metadata.py
20 files changed, 493 insertions(+), 239 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/10165/7
--
To view, visit http://gerrit.cloudera.org:8080/10165
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6
Gerrit-Change-Number: 10165
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6941: load more text scanner compression plugins

2018-05-02 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10165 )

Change subject: IMPALA-6941: load more text scanner compression plugins
..


Patch Set 6:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.h
File be/src/exec/hdfs-plugin-text-scanner.h:

http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.h@63
PS6, Line 63: HdfsScanner* (*create_scanner_fn)
: (HdfsScanNodeBase* scan_node, RuntimeState* state)
> Maybe you could create a typedef for the function pointers, e.g.:
Done


http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.h@67
PS6, Line 67: Status (*issue_initial_ranges_fn)(
: HdfsScanNodeBase* scan_node, const 
std::vector& files)
> same as above
Done


http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.h@73
PS6, Line 73: SpinLock
> Wouldn't be better to use some read/write lock like boost::shared_mutex?
Yeah I think that makes sense. I was thinking that the critical section is 
probably short enough that it doesn't matter in practice but we don't have to 
reason about that at all with the shared_mutex.


http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc
File be/src/exec/hdfs-plugin-text-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@42
PS6, Line 42: "(Advanced) whitelist of HDFS "
: "text scanner plugins that Impala will try to dynamically 
load."
> I think you should mention that it is comma-separated, or give an example.
I briefly summarized what the plugin does (versus the builtin text parsing). I 
think that addresses the question but lmk if it doesn't.


http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@56
PS6, Line 56: HdfsScanner* (*create_scanner_fn) (HdfsScanNodeBase* scan_node, 
RuntimeState* state)
> You could use the typedef here.
Done


http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@71
PS6, Line 71: Status (*issue_initial_ranges_fn)(
:   HdfsScanNodeBase* scan_node, const 
std::vector& files)
> and here
Done


http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@123
PS6, Line 123:   "GetImpalaBuildVersion", 
reinterpret_cast(_plugin_impala_build_version)));
> nit: too long line
Done


http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@126
PS6, Line 126: stringstream ss;
 : ss << "Scanner plugin " << plugin_name << " was built 
against Impala version "
 :<< get_plugin_impala_build_version() << ", but the 
running Impala version is "
 :<< GetDaemonBuildVersion();
> nit: I think it's a little inconsistent that we use stringstream here and S
Done. Yeah I got lazy about converting this :)


http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-scan-node-base.cc@643
PS6, Line 643: =_
> nit: missing space
Done


http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-text-scanner.cc@129
PS6, Line 129: =_
> nit: missing space
Done


http://gerrit.cloudera.org:8080/#/c/10165/6/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
File fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java:

http://gerrit.cloudera.org:8080/#/c/10165/6/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java@41
PS6, Line 41:  LZO,
:   LZO_INDEX, //Lzo index file.
:   LZ4,
:   ZSTD;
> should we make these (and e.g. file extension) be runtime parameters as wel
This and related enums (THdfsCompression, the flatbuffer version) are used in a 
lot of places, so it would probably require a lot of changes. I don't think 
it's inherently hard but it would take a while to verify that the fix in each 
place is correct.

At least now we have all the built-in Hadoop decompression codecs: 
https://github.com/apache/hadoop/tree/a0a276162147e843a5a4e028abdca5b66f5118da/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress
 so we could just extend this list when a codec is added to Hadoop (which seems 
to happen at most once a year).


http://gerrit.cloudera.org:8080/#/c/10165/6/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

http://gerrit.cloudera.org:8080/#/c/10165/6/tests/metadata/test_partition_metadata.py@168
PS6, Line 168: #unique_database = 'test_db'
> nit: remove this line
Done



--
To view, visit 

[Impala-ASF-CR] IMPALA-6931: reduces races in query expiration tests

2018-05-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10279 )

Change subject: IMPALA-6931: reduces races in query expiration tests
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/10279
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93f4ec450fc7e5a685c135b444e90d37e632831d
Gerrit-Change-Number: 10279
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 02 May 2018 22:14:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6931: reduces races in query expiration tests

2018-05-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/10279 )

Change subject: IMPALA-6931: reduces races in query expiration tests
..

IMPALA-6931: reduces races in query expiration tests

Recent tests ran into flakiness when testing query expiration.
This change makes two changes:
1) query state is retrieved earlier; a flaky test skipped
the expected state.
2) bump the timing; a flaky test had queries expire before
it could check them

Change-Id: I93f4ec450fc7e5a685c135b444e90d37e632831d
Reviewed-on: http://gerrit.cloudera.org:8080/10279
Reviewed-by: Dan Hecht 
Tested-by: Impala Public Jenkins 
---
M tests/custom_cluster/test_query_expiration.py
1 file changed, 5 insertions(+), 7 deletions(-)

Approvals:
  Dan Hecht: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/10279
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I93f4ec450fc7e5a685c135b444e90d37e632831d
Gerrit-Change-Number: 10279
Gerrit-PatchSet: 2
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-6961: [DOCS] Doc --enable minidump flag to disable minidumps

2018-05-02 Thread Alex Rodoni (Code Review)
Hello Lars Volker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10285

to look at the new patch set (#2).

Change subject: IMPALA-6961: [DOCS] Doc --enable_minidump flag to disable 
minidumps
..

IMPALA-6961: [DOCS] Doc --enable_minidump flag to disable minidumps

Change-Id: I3412e36272cda0c1502d4643afcdbad01e9548a5
---
M docs/topics/impala_breakpad.xml
1 file changed, 20 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/10285/2
--
To view, visit http://gerrit.cloudera.org:8080/10285
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3412e36272cda0c1502d4643afcdbad01e9548a5
Gerrit-Change-Number: 10285
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-6961: [DOCS] Doc --enable minidump flag to disable minidumps

2018-05-02 Thread Alex Rodoni (Code Review)
Alex Rodoni has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10285


Change subject: IMPALA-6961: [DOCS] Doc --enable_minidump flag to disable 
minidumps
..

IMPALA-6961: [DOCS] Doc --enable_minidump flag to disable minidumps

Change-Id: I3412e36272cda0c1502d4643afcdbad01e9548a5
---
M docs/topics/impala_breakpad.xml
1 file changed, 19 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/10285/1
--
To view, visit http://gerrit.cloudera.org:8080/10285
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3412e36272cda0c1502d4643afcdbad01e9548a5
Gerrit-Change-Number: 10285
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 


[Impala-ASF-CR] IMPALA-6949: Add the option to start the minicluster with EC enabled

2018-05-02 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/10275 )

Change subject: IMPALA-6949: Add the option to start the minicluster with EC 
enabled
..

IMPALA-6949: Add the option to start the minicluster with EC enabled

In this patch we add the "ERASURE_CODING" enviornment variable. If we
enable it, a cluster with 5 data nodes will be created during data
loading and HDFS will be started with erasure coding enabled.

Testing:
I ran the core build, and verified that erasure coding gets enabled in
HDFS. Many of our EE tests failed however.

Change-Id: I397aed491354be21b0a8441ca671232dca25146c
---
M bin/impala-config.sh
M bin/run-all-tests.sh
M testdata/bin/setup-hdfs-env.sh
M testdata/cluster/admin
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
5 files changed, 28 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/10275/3
--
To view, visit http://gerrit.cloudera.org:8080/10275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I397aed491354be21b0a8441ca671232dca25146c
Gerrit-Change-Number: 10275
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tianyi Wang 


[Impala-ASF-CR] IMPALA-6949: Add the option to start the minicluster with EC enabled

2018-05-02 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10275 )

Change subject: IMPALA-6949: Add the option to start the minicluster with EC 
enabled
..


Patch Set 2:

(5 comments)

Yes, nested data loading succeeds every time I tried it.

http://gerrit.cloudera.org:8080/#/c/10275/2/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/10275/2/bin/impala-config.sh@449
PS2, Line 449: elif [ "${TARGET_FILESYSTEM}" = "hdfs-ec" ]; then
> By making this its own TARGET_FILESYSTEM, you end up disabling a bunch of t
I don't think it makes sense to make erasure coding its own file system. I made 
a separate flag for it.


http://gerrit.cloudera.org:8080/#/c/10275/2/bin/impala-config.sh@454
PS2, Line 454:   echo "Valid values are: hdfs, isilon, s3, local"
> Please update?
No need to update any more.


http://gerrit.cloudera.org:8080/#/c/10275/2/bin/run-all-tests.sh
File bin/run-all-tests.sh:

http://gerrit.cloudera.org:8080/#/c/10275/2/bin/run-all-tests.sh@71
PS2, Line 71:   FE_TEST=false
> Add a comment about why?
Done


http://gerrit.cloudera.org:8080/#/c/10275/2/testdata/bin/setup-hdfs-env.sh
File testdata/bin/setup-hdfs-env.sh:

http://gerrit.cloudera.org:8080/#/c/10275/2/testdata/bin/setup-hdfs-env.sh@80
PS2, Line 80:   hdfs ec -unsetPolicy -path "${HDFS_EC_PATH:=/test-warehouse/}"
> Do you know what this does underneath the covers? Do we need to block while
This does not convert the existing files in the directory. It only affects the 
files that will be placed in the directory in the future. This is why it is 
possible to have some files erasure coded and some not in the same directory.


http://gerrit.cloudera.org:8080/#/c/10275/2/testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
File testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl:

http://gerrit.cloudera.org:8080/#/c/10275/2/testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl@26
PS2, Line 26: cloudera.erasure_coding.enabled
> What's this for?
This is needed because running the following command

hdfs ec -enablePolicy -policy RS-3-2-1024k

Causes this error:

RemoteException: enableErasureCodingPolicy is not allowed because 
cloudera.erasure_coding.enabled=false



--
To view, visit http://gerrit.cloudera.org:8080/10275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I397aed491354be21b0a8441ca671232dca25146c
Gerrit-Change-Number: 10275
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Comment-Date: Wed, 02 May 2018 21:23:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR](2.x) IMPALA-6679,IMPALA-6678: reduce scan reservation

2018-05-02 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10273 )

Change subject: IMPALA-6679,IMPALA-6678: reduce scan reservation
..


Patch Set 3:

Hit IMPALA-6960


--
To view, visit http://gerrit.cloudera.org:8080/10273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842
Gerrit-Change-Number: 10273
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 02 May 2018 21:18:52 +
Gerrit-HasComments: No


[Impala-ASF-CR](2.x) IMPALA-6679,IMPALA-6678: reduce scan reservation

2018-05-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10273 )

Change subject: IMPALA-6679,IMPALA-6678: reduce scan reservation
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2402/


--
To view, visit http://gerrit.cloudera.org:8080/10273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842
Gerrit-Change-Number: 10273
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 02 May 2018 21:10:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6959: [DOCS] Update to HAProxy configuration sample code

2018-05-02 Thread Alex Rodoni (Code Review)
Alex Rodoni has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10284


Change subject: IMPALA-6959: [DOCS] Update to HAProxy configuration sample code
..

IMPALA-6959: [DOCS] Update to HAProxy configuration sample code

- Changed to deprecated timeouts: contimeout, clitimeout, srvtimeout
- Changed the sample timeout values to more realistic values
- Added a note that actual timeout values should depend on
  the user cluster

Change-Id: Idff3aa9bbb58c1953cb7e9394ade01c7833c3a34
---
M docs/topics/impala_proxy.xml
1 file changed, 6 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/10284/1
--
To view, visit http://gerrit.cloudera.org:8080/10284
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idff3aa9bbb58c1953cb7e9394ade01c7833c3a34
Gerrit-Change-Number: 10284
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 


[Impala-ASF-CR] IMPALA-6916: Implement COMMENT ON DATABASE

2018-05-02 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/10171 )

Change subject: IMPALA-6916: Implement COMMENT ON DATABASE
..

IMPALA-6916: Implement COMMENT ON DATABASE

This patch implements updating comment on a database.

Syntax:
COMMENT ON DATABASE db IS 'comment'

Testing:
- Added new front-end tests
- Ran all front-end tests
- Added new end-to-end tests
- Ran end-to-end DDL tests

Change-Id: Ifcf909c18f97073346f6f603538bf921e69fbb00
---
M common/thrift/CatalogService.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
A fe/src/main/java/org/apache/impala/analysis/CommentOnDbStmt.java
A fe/src/main/java/org/apache/impala/analysis/CommentOnStmt.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M tests/metadata/test_ddl.py
M tests/metadata/test_ddl_base.py
15 files changed, 275 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/10171/8
--
To view, visit http://gerrit.cloudera.org:8080/10171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifcf909c18f97073346f6f603538bf921e69fbb00
Gerrit-Change-Number: 10171
Gerrit-PatchSet: 8
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Adam Holley 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Fredy Wijaya 


[Impala-ASF-CR] IMPALA-6916: Implement COMMENT ON DATABASE

2018-05-02 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10171 )

Change subject: IMPALA-6916: Implement COMMENT ON DATABASE
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10171/6/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

http://gerrit.cloudera.org:8080/#/c/10171/6/common/thrift/JniCatalog.thrift@635
PS6, Line 635:   // Name of comment to alter. When this field is not set, the 
comment will be removed.
> Comment what happens if the comment is not set
Done


http://gerrit.cloudera.org:8080/#/c/10171/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10171/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3534
PS6, Line 3534:   }
> There's a subtle race here with the statestore topic update thread that cal
Yes, you're right. The new patch set uses the database lock.



--
To view, visit http://gerrit.cloudera.org:8080/10171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifcf909c18f97073346f6f603538bf921e69fbb00
Gerrit-Change-Number: 10171
Gerrit-PatchSet: 8
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Adam Holley 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Comment-Date: Wed, 02 May 2018 21:01:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR](2.x) IMPALA-6679,IMPALA-6678: reduce scan reservation

2018-05-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10273 )

Change subject: IMPALA-6679,IMPALA-6678: reduce scan reservation
..


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2400/


--
To view, visit http://gerrit.cloudera.org:8080/10273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842
Gerrit-Change-Number: 10273
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 02 May 2018 20:51:09 +
Gerrit-HasComments: No


[Impala-ASF-CR](2.x) IMPALA-6908: IsConnResetTException() should include ECONNRESET

2018-05-02 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10265 )

Change subject: IMPALA-6908: IsConnResetTException() should include ECONNRESET
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10265/1/be/src/rpc/thrift-util.cc
File be/src/rpc/thrift-util.cc:

http://gerrit.cloudera.org:8080/#/c/10265/1/be/src/rpc/thrift-util.cc@209
PS1, Line 209: strncmp(PACKAGE_VERSION, "0.9.0", 5));
> How did the test pass to begin with ? Also, not sure if it warrants a strnc
As Sailesh asked earlier, do we even need this? Or, couldn't we make it a 
compile time check?



--
To view, visit http://gerrit.cloudera.org:8080/10265
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I1bb997a833917e5166c9ca192da219f50f4439e2
Gerrit-Change-Number: 10265
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 02 May 2018 19:36:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6946: handle negative counts in RLE decoder

2018-05-02 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10233 )

Change subject: IMPALA-6946: handle negative counts in RLE decoder
..


Patch Set 3: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/10233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If75ef3fb12494209918c100f26407cd93b17addb
Gerrit-Change-Number: 10233
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 02 May 2018 18:59:08 +
Gerrit-HasComments: No


[Impala-ASF-CR](2.x) IMPALA-6908: IsConnResetTException() should include ECONNRESET

2018-05-02 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10265 )

Change subject: IMPALA-6908: IsConnResetTException() should include ECONNRESET
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10265/1/be/src/rpc/thrift-util.cc
File be/src/rpc/thrift-util.cc:

http://gerrit.cloudera.org:8080/#/c/10265/1/be/src/rpc/thrift-util.cc@209
PS1, Line 209: strncmp(PACKAGE_VERSION, "0.9.0", 5) =
> Oops, yes. Changed and re-ran the test again to make sure it works.
How did the test pass to begin with ? Also, not sure if it warrants a strncmp 
every time we call this function. Can we initialize a boolean once during 
initialization ?



--
To view, visit http://gerrit.cloudera.org:8080/10265
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I1bb997a833917e5166c9ca192da219f50f4439e2
Gerrit-Change-Number: 10265
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 02 May 2018 18:53:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6931: reduces races in query expiration tests

2018-05-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10279 )

Change subject: IMPALA-6931: reduces races in query expiration tests
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2401/


--
To view, visit http://gerrit.cloudera.org:8080/10279
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93f4ec450fc7e5a685c135b444e90d37e632831d
Gerrit-Change-Number: 10279
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 02 May 2018 18:32:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6931: reduces races in query expiration tests

2018-05-02 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10279 )

Change subject: IMPALA-6931: reduces races in query expiration tests
..


Patch Set 1:

agreed. an immediate next step is to break this test into smaller/simpler ones 
to avoid noise from interactions. would still be useful to test interactions, 
but for that, we'll want to make this easier to test.


--
To view, visit http://gerrit.cloudera.org:8080/10279
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93f4ec450fc7e5a685c135b444e90d37e632831d
Gerrit-Change-Number: 10279
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 02 May 2018 18:30:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6931: reduces races in query expiration tests

2018-05-02 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10279 )

Change subject: IMPALA-6931: reduces races in query expiration tests
..


Patch Set 1: Code-Review+2

If this doesn't do the trick we may want to rethink how we can test this 
feature in a non racy way.


--
To view, visit http://gerrit.cloudera.org:8080/10279
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93f4ec450fc7e5a685c135b444e90d37e632831d
Gerrit-Change-Number: 10279
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Comment-Date: Wed, 02 May 2018 18:20:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6941: load more text scanner compression plugins

2018-05-02 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10165 )

Change subject: IMPALA-6941: load more text scanner compression plugins
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10165/6/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
File fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java:

http://gerrit.cloudera.org:8080/#/c/10165/6/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java@41
PS6, Line 41:  LZO,
:   LZO_INDEX, //Lzo index file.
:   LZ4,
:   ZSTD;
should we make these (and e.g. file extension) be runtime parameters as well so 
that you can add a plugin without having to recompile? Okay with me if that's 
future work, but wondering if there's anything that makes that overly difficult.



--
To view, visit http://gerrit.cloudera.org:8080/10165
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6
Gerrit-Change-Number: 10165
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 02 May 2018 18:17:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-05-02 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10116 )

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py
File tests/metadata/test_last_ddl_time_update.py:

http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@52
PS7, Line 52:   self.substitutions = {'TBL': "%s.%s" % (db_name, tbl_name), 
'WAREHOUSE': WAREHOUSE}
Mention the supported substitutions in the methods below (and add comments 
there).


http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@56
PS7, Line 56: False, False
Is there a reason we don't test changes to both? Do they never change at the 
same time?


http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@76
PS7, Line 76: TestLastDdlTimeUpdate
Do you need to specify the class to access TestHelper?


http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@179
PS7, Line 179:   def run_test(self, query, db_name, table_name,
Can you move this method into the Helper class now?



--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 02 May 2018 18:14:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6946: handle negative counts in RLE decoder

2018-05-02 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10233 )

Change subject: IMPALA-6946: handle negative counts in RLE decoder
..


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/10233/3/be/src/util/dict-encoding.h
File be/src/util/dict-encoding.h:

http://gerrit.cloudera.org:8080/#/c/10233/3/be/src/util/dict-encoding.h@238
PS3, Line 238: static constexpr int32_t DICT_DECODER_BUFFER_SIZE = 128;
I added some DCHECKS in this file and ran into a linkage issue. The fix was to 
move it outside of the class (I don't think there's a way to get static linkage 
for C++ class static members).


http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h
File be/src/util/rle-encoding.h:

http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@326
PS2, Line 326: LIKELY
> LIKELY seems to be strangely placed now - both conditions should be include
Yeah I agree both conditions should be included -


http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@679
PS2, Line 679: int32_t num_repeats = NextNumRepeats();
> DictDecoder::DecodeNextValue also calls NextNumLiterals() and NextNumRep
Good catch - fixed - I went through the uint32_t references there and fixed 
those.


http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@681
PS2, Line 681:int32_t num_repeats_to_set =
 :std::min(num_repeats, num_values_to_consume - 
num_consumed);
> We could change this to signed too and remove the template parameter from s
Done


http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@694
PS2, Line 694: int32_t num_literals_to_set =
 : std::min(num_literals, num_values_to_consume - 
num_consumed);
> Same as in line 681.
Done



--
To view, visit http://gerrit.cloudera.org:8080/10233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If75ef3fb12494209918c100f26407cd93b17addb
Gerrit-Change-Number: 10233
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 02 May 2018 17:54:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6946: handle negative counts in RLE decoder

2018-05-02 Thread Tim Armstrong (Code Review)
Hello Csaba Ringhofer,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10233

to look at the new patch set (#3).

Change subject: IMPALA-6946: handle negative counts in RLE decoder
..

IMPALA-6946: handle negative counts in RLE decoder

This improves the handling of out-of-range values to avoid hitting various
DCHECKs, including the one in the JIRA. repeat_count_ and literal_count_
are int32_ts. Avoid setting them to a negative value directly or by
integer overflow.

Switch to using uint32_t for "VLQ" encoding, which should be ULEB-128
encoding according to the Parquet standard. This fixes an infinite loop
in PutVlqInt() for negative values - the bug was that shifting right
sign-extended the signed value.

Testing:
Added backend test to exercise handling of large literal and repeat
counts that don't fit in an int32_t. Before these fixes it could trigger
several DCHECKs.

Change-Id: If75ef3fb12494209918c100f26407cd93b17addb
---
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
M be/src/util/dict-encoding.h
M be/src/util/rle-encoding.h
M be/src/util/rle-test.cc
5 files changed, 84 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/10233/3
--
To view, visit http://gerrit.cloudera.org:8080/10233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If75ef3fb12494209918c100f26407cd93b17addb
Gerrit-Change-Number: 10233
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Csaba Ringhofer 


[Impala-ASF-CR](2.x) IMPALA-6679,IMPALA-6678: reduce scan reservation

2018-05-02 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10273 )

Change subject: IMPALA-6679,IMPALA-6678: reduce scan reservation
..


Patch Set 3: Code-Review+2

Had to update some planner tests because of different file sizes on 2.x


--
To view, visit http://gerrit.cloudera.org:8080/10273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842
Gerrit-Change-Number: 10273
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 02 May 2018 16:58:01 +
Gerrit-HasComments: No


[Impala-ASF-CR](2.x) IMPALA-6679,IMPALA-6678: reduce scan reservation

2018-05-02 Thread Tim Armstrong (Code Review)
Hello Impala Public Jenkins,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/10273

to review the following change.


Change subject: IMPALA-6679,IMPALA-6678: reduce scan reservation
..

IMPALA-6679,IMPALA-6678: reduce scan reservation

This has two related changes.

IMPALA-6679: defer scanner reservation increases

When starting each scan range, check to see how big the initial scan
range is (the full thing for row-based formats, the footer for
Parquet) and determine whether more reservation would be useful.

For Parquet, base the ideal reservation on the actual column layout
of each file. This avoids reserving memory that we won't use for
the actual files that we're scanning. This also avoid the need to
estimate ideal reservation in the planner.

We also release scanner thread reservations above the minimum as
soon as threads complete, so that resources can be released slightly
earlier.

IMPALA-6678: estimate Parquet column size for reservation
-
This change also reduces reservation computed by the planner in certain
cases by estimating the on-disk size of column data based on stats. It
also reduces the default per-column reservation to 4MB since it appears
that < 8MB columns are generally common in practice and the method for
estimating column size is biased towards over-estimating. There are two
main cases to consider for the performance implications:
* Memory is available to improve query perf - if we underestimate, we
  can increase the reservation so we can do "efficient" 8MB I/Os for
  large columns.
* The ideal reservation is not available - query performance is affected
  because we can't overlap I/O and compute as much and may do smaller
  (probably 4MB I/Os). However, we should avoid pathological behaviour
  like tiny I/Os.

When stats are not available, we just default to reserving 4MB per
column, which typically is more memory than required. When stats are
available, the memory required can be reduced below when some heuristic
tell us with high confidence that the column data for most or all files
is smaller than 4MB.

The stats-based heuristic could reduce scan performance if both the
conservative heuristics significantly underestimate the column size
and memory is constrained such that we can't increase the scan
reservation at runtime (in which case the memory might be used by
a different operator or scanner thread).

Observability:
Added counters to track when threads were not spawned due to reservation
and to track when reservation increases are requested and denied. These
allow determining if performance may have been affected by memory
availability.

Testing:
Updated test_mem_usage_scaling.py memory requirements and added steps
to regenerate the requirements. Loops test for a while to flush out
flakiness.

Added targeted planner and query tests for reservation calculations and
increases.

Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842
Reviewed-on: http://gerrit.cloudera.org:8080/9757
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-parquet-scanner-test.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/bin/compute-table-stats.sh
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/partition-pruning.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
M 

[Impala-ASF-CR] IMPALA-6941: load more text scanner compression plugins

2018-05-02 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10165 )

Change subject: IMPALA-6941: load more text scanner compression plugins
..


Patch Set 6:

(11 comments)

LGTM overall, had some minor comments.

http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.h
File be/src/exec/hdfs-plugin-text-scanner.h:

http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.h@63
PS6, Line 63: HdfsScanner* (*create_scanner_fn)
: (HdfsScanNodeBase* scan_node, RuntimeState* state)
Maybe you could create a typedef for the function pointers, e.g.:

 typedef HdfsScanner* (*CreateScannerFn)
 (HdfsScanNodeBase* scan_node, RuntimeState* state);
 CreateScannerFn create_scanner_fn;

I think it would be more readable, and also you wouldn't need to copy the whole 
function prototype to hdfs-plugin-text-scanner.cc


http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.h@67
PS6, Line 67: Status (*issue_initial_ranges_fn)(
: HdfsScanNodeBase* scan_node, const 
std::vector& files)
same as above


http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.h@73
PS6, Line 73: SpinLock
Wouldn't be better to use some read/write lock like boost::shared_mutex?


http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc
File be/src/exec/hdfs-plugin-text-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@42
PS6, Line 42: "(Advanced) whitelist of HDFS "
: "text scanner plugins that Impala will try to dynamically 
load."
I think you should mention that it is comma-separated, or give an example.


http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@56
PS6, Line 56: HdfsScanner* (*create_scanner_fn) (HdfsScanNodeBase* scan_node, 
RuntimeState* state)
You could use the typedef here.


http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@71
PS6, Line 71: Status (*issue_initial_ranges_fn)(
:   HdfsScanNodeBase* scan_node, const 
std::vector& files)
and here


http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@123
PS6, Line 123:   "GetImpalaBuildVersion", 
reinterpret_cast(_plugin_impala_build_version)));
nit: too long line


http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@126
PS6, Line 126: stringstream ss;
 : ss << "Scanner plugin " << plugin_name << " was built 
against Impala version "
 :<< get_plugin_impala_build_version() << ", but the 
running Impala version is "
 :<< GetDaemonBuildVersion();
nit: I think it's a little inconsistent that we use stringstream here and 
Substitute at other places.


http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-scan-node-base.cc@643
PS6, Line 643: =_
nit: missing space


http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-text-scanner.cc@129
PS6, Line 129: =_
nit: missing space


http://gerrit.cloudera.org:8080/#/c/10165/6/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

http://gerrit.cloudera.org:8080/#/c/10165/6/tests/metadata/test_partition_metadata.py@168
PS6, Line 168: #unique_database = 'test_db'
nit: remove this line



--
To view, visit http://gerrit.cloudera.org:8080/10165
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6
Gerrit-Change-Number: 10165
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 02 May 2018 16:24:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6946: handle negative counts in RLE decoder

2018-05-02 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10233 )

Change subject: IMPALA-6946: handle negative counts in RLE decoder
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h
File be/src/util/rle-encoding.h:

http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@326
PS2, Line 326: LIKELY
LIKELY seems to be strangely placed now - both conditions should be included in 
the likely branch, or maybe only the new condition, as current_value_ == value 
can be often false, for example in dict encoded pages.


http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@679
PS2, Line 679: int32_t num_repeats = NextNumRepeats();
DictDecoder::DecodeNextValue also calls NextNumLiterals() and 
NextNumRepeats(), and expects uint32_t - can you do some cleanup there?


http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@681
PS2, Line 681:uint32_t num_repeats_to_set =
 :std::min(num_repeats, num_values_to_consume 
- num_consumed);
We could change this to signed too and remove the template parameter from 
std::min, as both arguments and the result would be signed.


http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@694
PS2, Line 694: uint32_t num_literals_to_set =
 : std::min(num_literals, num_values_to_consume - 
num_consumed);
Same as in line 681.



--
To view, visit http://gerrit.cloudera.org:8080/10233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If75ef3fb12494209918c100f26407cd93b17addb
Gerrit-Change-Number: 10233
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Comment-Date: Wed, 02 May 2018 14:54:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-05-02 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Zoltan Borok-Nagy,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10116

to look at the new patch set (#7).

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..

IMPALA-6131: Track time of last statistics update in metadata

The timestamp of the last COMPUTE STATS operation is saved to
table property "impala.lastComputeStatsTime". The format is
the same as in "transient_lastDdlTime", so the two can be
compared to check if the schema has changed since computing
statistics.

Other changes:
- Handling of "transient_lastDdlTime" is simplified - the old
  logic set it to current time + 1, if the old version was
  >= current time, to ensure that it is always increased by
  DDL operations. This was useful in the past, as IMPALA-387
  used lastDdlTime to check if partition data needs to be
  reloaded, but since IMPALA-1480, Impala does not rely on
  lastDdlTime at all.

- Computing / setting stats on HDFS tables no longer increases
  "transient_lastDdlTime".

- When Kudu tables are (re)loaded, it is checked if their
  HMS representation is up to date, and if it is, then
  IMetaStoreClient.alter_table() is not called. The old
  logic always called alter_table() after loading metadata
  from Kudu. This change was needed to ensure that
  "transient_lastDdlTime" works similarly in HDFS and Kudu
  tables, and should also make (re)loading Kudu tables faster.

Notes:
- Kudu will be able to sync its tables to HMS in the near
  future (see KUDU-2191), so the Kudu metadata handling in
  Impala may need to be redesigned.

Testing:
tests/metadata/test_last_ddl_time_update.py is extended by
- also checking "impala.lastComputeStatsTime"
- testing more SQL statements
- tests for Kudu tables

Note that test_last_ddl_time_update.py is ran only in
exhaustive testing.

Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/metadata/test_last_ddl_time_update.py
6 files changed, 182 insertions(+), 152 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/7
--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-05-02 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Zoltan Borok-Nagy,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10116

to look at the new patch set (#6).

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..

IMPALA-6131: Track time of last statistics update in metadata

The timestamp of the last COMPUTE STATS operation is saved to
table property "impala.lastComputeStatsTime". The format is
the same as in "transient_lastDdlTime", so the two can be
compared to check if the schema has changed since computing
statistics.

Other changes:
- Handling of "transient_lastDdlTime" is simplified - the old
  logic set it to current time + 1, if the old version was
  >= current time, to ensure that it is always increased by
  DDL operations. This was useful in the past, as IMPALA-387
  used lastDdlTime to check if partition data needs to be
  reloaded, but since IMPALA-1480, Impala does not rely on
  lastDdlTime at all.

- Computing / setting stats on HDFS tables no longer increases
  "transient_lastDdlTime".

- When Kudu tables are (re)loaded, it is checked if their
  HMS representation is up to date, and if it is, then
  IMetaStoreClient.alter_table() is not called. The old
  logic always called alter_table() after loading metadata
  from Kudu. This change was needed to ensure that
  "transient_lastDdlTime" works similarly in HDFS and Kudu
  tables, and should also make (re)loading Kudu tables faster.

Notes:
- Kudu will be able to sync its tables to HMS in the near
  future (see KUDU-2191), so the Kudu metadata handling in
  Impala may need to be redesigned.

Testing:
tests/metadata/test_last_ddl_time_update.py is extended by
- also checking "impala.lastComputeStatsTime"
- testing more SQL statements
- tests for Kudu tables

Note that test_last_ddl_time_update.py is ran only in
exhaustive testing.

Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/metadata/test_last_ddl_time_update.py
6 files changed, 182 insertions(+), 152 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/6
--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata

2018-05-02 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Zoltan Borok-Nagy,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10116

to look at the new patch set (#5).

Change subject: IMPALA-6131: Track time of last statistics update in metadata
..

IMPALA-6131: Track time of last statistics update in metadata

The timestamp of the last COMPUTE STATS operation is saved to
table property "impala.lastComputeStatsTime". The format is
the same as in "transient_lastDdlTime", so the two can be
compared to check if the schema has changed since computing
statistics.

Other changes:
- Handling of "transient_lastDdlTime" is simplified - the old
  logic set it to current time + 1, if the old version was
  >= current time, to ensure that it is always increased by
  DDL operations. This was useful in the past, as IMPALA-387
  used lastDdlTime to check if partition data needs to be
  reloaded, but since IMPALA-1480, Impala does not rely on
  lastDdlTime at all.

- Computing / setting stats on HDFS tables no longer increases
  "transient_lastDdlTime".

- When Kudu tables are (re)loaded, it is checked if their
  HMS representation is up to date, and if it is, then
  IMetaStoreClient.alter_table() is not called. The old
  logic always called alter_table() after loading metadata
  from Kudu. This change was needed to ensure that
  "transient_lastDdlTime" works similarly in HDFS and Kudu
  tables, and should also make (re)loading Kudu tables faster.

Notes:
- Kudu will be able to sync its tables to HMS in the near
  future (see KUDU-2191), so the Kudu metadata handling in
  Impala may need to be redesigned.

Testing:
tests/metadata/test_last_ddl_time_update.py is extended by
- also checking "impala.lastComputeStatsTime"
- testing more SQL statements
- tests for Kudu tables

Note that test_last_ddl_time_update.py is ran only in
exhaustive testing.

Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/metadata/test_last_ddl_time_update.py
6 files changed, 182 insertions(+), 152 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/5
--
To view, visit http://gerrit.cloudera.org:8080/10116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Gerrit-Change-Number: 10116
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6931: reduces races in query expiration tests

2018-05-02 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10279


Change subject: IMPALA-6931: reduces races in query expiration tests
..

IMPALA-6931: reduces races in query expiration tests

Recent tests ran into flakiness when testing query expiration.
This change makes two changes:
1) query state is retrieved earlier; a flaky test skipped
the expected state.
2) bump the timing; a flaky test had queries expire before
it could check them

Change-Id: I93f4ec450fc7e5a685c135b444e90d37e632831d
---
M tests/custom_cluster/test_query_expiration.py
1 file changed, 5 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/10279/1
--
To view, visit http://gerrit.cloudera.org:8080/10279
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I93f4ec450fc7e5a685c135b444e90d37e632831d
Gerrit-Change-Number: 10279
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac