[Impala-ASF-CR] IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04

2017-05-24 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04
..


Patch Set 1:

Oh sorry, just saw it was -1

-- 
To view, visit http://gerrit.cloudera.org:8080/6989
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I41501d273967dc1d57d366a145766a67b2dbbaa0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04

2017-05-24 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04
..


Patch Set 1: -Code-Review

David, what made you +1? You seemed concerned.

-- 
To view, visit http://gerrit.cloudera.org:8080/6989
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I41501d273967dc1d57d366a145766a67b2dbbaa0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04

2017-05-24 Thread David Knupp (Code Review)
David Knupp has posted comments on this change.

Change subject: IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04
..


Patch Set 1: Code-Review-1

-- 
To view, visit http://gerrit.cloudera.org:8080/6989
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I41501d273967dc1d57d366a145766a67b2dbbaa0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04

2017-05-24 Thread David Knupp (Code Review)
David Knupp has posted comments on this change.

Change subject: IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04
..


Patch Set 1:

s/for that file/for that module/

-- 
To view, visit http://gerrit.cloudera.org:8080/6989
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I41501d273967dc1d57d366a145766a67b2dbbaa0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04

2017-05-24 Thread David Knupp (Code Review)
David Knupp has posted comments on this change.

Change subject: IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04
..


Patch Set 1:

So, I'm a little concerned about this. The datetime being installed here isn't 
the stdlib datetime. It's a thirdparty module from Zope. When you install it, 
you also get other dependencies...

   Installing collected packages: zope.interface, pytz, datetime
   Successfully installed datetime-4.2 pytz-2017.2 zope.interface-4.4.1

And if you look at the comments in the setup.py for that file, this is included:

   description="""\
   This package provides a DateTime data type, as known from Zope 2.
   Unless you need to communicate with Zope 2 APIs, you're probably
   better off using Python's built-in datetime module.""".replace('\n', ' ')

https://github.com/zopefoundation/DateTime/blob/master/setup.py

-- 
To view, visit http://gerrit.cloudera.org:8080/6989
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I41501d273967dc1d57d366a145766a67b2dbbaa0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd

2017-05-24 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on 
Catalogd
..


Patch Set 5: Code-Review+2

Looks like the GVO hit IMPALA-5358, which is non-deterministic. The fix for it 
is in GVO. While it is in progress, I'll give it another shot.

-- 
To view, visit http://gerrit.cloudera.org:8080/6970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on 
Catalogd
..


Patch Set 5:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/640/

-- 
To view, visit http://gerrit.cloudera.org:8080/6970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04

2017-05-24 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04
..


Patch Set 1: Code-Review+2

Thanks for being brave enough to switch to 16.04 and fix issues with it!

-- 
To view, visit http://gerrit.cloudera.org:8080/6989
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I41501d273967dc1d57d366a145766a67b2dbbaa0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5333: Add support for Impala to work with ADLS
..


Patch Set 9:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/639/

-- 
To view, visit http://gerrit.cloudera.org:8080/6910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5358: Fix repeatable table sample.

2017-05-24 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5358: Fix repeatable table sample.
..


Patch Set 1:

Correct.

-- 
To view, visit http://gerrit.cloudera.org:8080/6985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9110751b075430b068b52d7441e5845f86d1b6af
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5358: Fix repeatable table sample.

2017-05-24 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5358: Fix repeatable table sample.
..


Patch Set 1:

Ok, thanks - I think you mean that inputParts was arbitrarily ordered, so the 
bug was non-deterministic (but most of the time, gave the correct results in 
this case). That sounds fine, agree that we would expect current tests to find 
this.

-- 
To view, visit http://gerrit.cloudera.org:8080/6985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9110751b075430b068b52d7441e5845f86d1b6af
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5358: Fix repeatable table sample.

2017-05-24 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5358: Fix repeatable table sample.
..


Patch Set 1:

Thanks, Henry. Can you clarify what you mean by REPEAT() clause? Did you mean 
REPEATABLE? That clause is mandatory for most tests and the PlannerTest that 
failed did have a test. The underlying issue is that we store the partitions as 
a Map so iterating over the map can give us the partitions 
in an arbitrary order. I'll keep thinking whether we can have a more direct 
test, but luckiy our tests did ultimately catch this issue :)

-- 
To view, visit http://gerrit.cloudera.org:8080/6985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9110751b075430b068b52d7441e5845f86d1b6af
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5358: Fix repeatable table sample.

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5358: Fix repeatable table sample.
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/638/

-- 
To view, visit http://gerrit.cloudera.org:8080/6985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9110751b075430b068b52d7441e5845f86d1b6af
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5030: Adds support for NVL2() function

2017-05-24 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5030: Adds support for NVL2() function
..


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6962
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Vincent Tran 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations

2017-05-24 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5347: Parquet scanner microoptimizations
..


Patch Set 17: Code-Review+2

Carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/6950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5347: Parquet scanner microoptimizations
..


Patch Set 17:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/636/

-- 
To view, visit http://gerrit.cloudera.org:8080/6950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations

2017-05-24 Thread Tim Armstrong (Code Review)
Hello Marcel Kornacker, Impala Public Jenkins, Jim Apple, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6950

to look at the new patch set (#17).

Change subject: IMPALA-5347: Parquet scanner microoptimizations
..

IMPALA-5347: Parquet scanner microoptimizations

A mix of microoptimizations that profiling the parquet scanner revealed.
All lead to some measurable improvement and added up to significant
speedups for various scans.

* Add ALWAYS_INLINE to hot functions that GCC was mistakenly not inlining
  in all cases.
* Apply __restrict__ in a few places so the compiler knows that it is
  safe to cache values accessed via those pointers
* memset() the whole batch instead of the null indicators is cases where
  it is almost certainly cheaper.
* Avoid updating two correlated loop variables in MaterializeValueBatch().
* Avoid unnecessary initialization of often-unused 'val' in ReadSlot().
* Shave a few instructions off the (still very expensive) bit unpacking
  and dict decoding logic.

Performance:

Some local TPC-H and targeted-perf benchmarks showed average speedups of
~5%.

I did some benchmarks targeted at measuring column materialisation
performance using a version of lineitem with duplicated data to make
it bigger. These queries all got significantly faster.

Dict-encoded DECIMAL: 2.23 -> 1.23s

  SELECT count(*) FROM biglineitem WHERE l_quantity > 49

Plain-encoded BIGINT: 2.33s -> 1.62s

  SELECT count(*) FROM biglineitem WHERE l_orderkey != 10

Dict-encoded STRING: 2.73s -> 1.72s

  SELECT count(*) FROM biglineitem WHERE l_returnflag = 'A'

Plain-encoded STRING: 7.07s -> 6.08s (most time spent in Snappy)
  SELECT count(*) FROM biglineitem WHERE length(l_comment) > 37

Multiple columns: 5.15s -> 3.74s

  SELECT count(*) FROM biglineitem
  WHERE l_quantity > 49 and l_partkey != 199 and l_tax < 100

Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
---
M be/src/common/compiler-util.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/runtime/tuple.h
M be/src/util/bit-stream-utils.inline.h
M be/src/util/bit-util.h
M be/src/util/dict-encoding.h
M be/src/util/rle-encoding.h
9 files changed, 117 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/6950/17
-- 
To view, visit http://gerrit.cloudera.org:8080/6950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5333: Add support for Impala to work with ADLS
..


Patch Set 9:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/635/

-- 
To view, visit http://gerrit.cloudera.org:8080/6910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04

2017-05-24 Thread Jim Apple (Code Review)
Jim Apple has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6989

Change subject: IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04
..

IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04

Depending explicitly on datetime in requirements.txt fixes the issue,
even though datetime is nominally included in the standard library.

Change-Id: I41501d273967dc1d57d366a145766a67b2dbbaa0
---
M infra/python/deps/requirements.txt
1 file changed, 1 insertion(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6989/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6989
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I41501d273967dc1d57d366a145766a67b2dbbaa0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 


[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5259: Add REFRESH FUNCTIONS  statement
..


IMPALA-5259: Add REFRESH FUNCTIONS  statement

Before this patch, Impala relied on INVALIDATE METADATA to load
externally added UDFs from HMS. The problem with this approach is that
INVALIDATE METADATA affects all databases and tables in the entire
cluster.

In this patch, we add a REFRESH FUNCTIONS  statement that reloads
the functions of a database from HMS. We return a list of updated and
removed db functions to the issuing Impalad in order to update its
local catalog cache.

Testing:
- Ran a private build which passed.

Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
Reviewed-on: http://gerrit.cloudera.org:8080/6878
Reviewed-by: Taras Bobrovytsky 
Tested-by: Impala Public Jenkins
---
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/Function.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M tests/custom_cluster/test_permanent_udfs.py
12 files changed, 323 insertions(+), 63 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Taras Bobrovytsky: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6878
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5259: Add REFRESH FUNCTIONS  statement
..


Patch Set 11: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6878
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4923: reduce memory transfer for selective scans
..


Patch Set 8: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4923: reduce memory transfer for selective scans
..


IMPALA-4923: reduce memory transfer for selective scans

Most of the code changes are to restructure things so that the
scratch batch's tuple buffer is stored in a separate MemPool
from auxiliary memory such as decompression buffers. This part
of the change does not change the behaviour of the scanner in
itself, but allows us to recycle the tuple buffer without holding
onto unused auxiliary memory.

The optimisation is implemented in TryCompact(): if enough rows
were filtered out during the copy from the scratch batch to the
output batch, the fixed-length portions of the surviving rows
(if any) are copied to a new, smaller, buffer, and the original,
larger, buffer is reused for the next scratch batch.

Previously the large buffer was always attached to the output batch,
so a large buffer was transferred between threads for every scratch
batch processed. In combination with the decompression buffer change
in IMPALA-5304, this means that in many cases selective scans don't
produce nearly as many empty or near-empty batches and do not attach
nearly as much memory to each batch.

Performance:
Even on an 8 core machine I see some speedup on selective scans.
Profiling with "perf top" also showed that time in TCMalloc
was reduced - it went from several % of CPU time to a minimal
amount.

Running TPC-H on the same machine showed a ~5% overall improvement
and no regressions. E.g. Q6 got 20-25% faster.

I hope to do some additional cluster benchmarking on systems
with more cores to verify that the severe performance problems
there are fixed, but in the meantime it seems like we have enough
evidence that it will at least improve things.

Testing:
Add a couple of selective scans that exercise the new code paths.

Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Reviewed-on: http://gerrit.cloudera.org:8080/6949
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-scratch-tuple-batch.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M testdata/workloads/functional-query/queries/QueryTest/parquet.test
6 files changed, 212 insertions(+), 60 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5347: Parquet scanner microoptimizations
..


Patch Set 16: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/632/

-- 
To view, visit http://gerrit.cloudera.org:8080/6950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5358: Fix repeatable table sample.

2017-05-24 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5358: Fix repeatable table sample.
..


Patch Set 1: Code-Review+2

Was there a REPEAT(..) clause that could have picked this up that we could put 
in a test? Feel free to commit w/o to unblock the tests.

-- 
To view, visit http://gerrit.cloudera.org:8080/6985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9110751b075430b068b52d7441e5845f86d1b6af
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5030: Adds support for NVL2() function

2017-05-24 Thread Vincent Tran (Code Review)
Vincent Tran has uploaded a new patch set (#4).

Change subject: IMPALA-5030: Adds support for NVL2() function
..

IMPALA-5030: Adds support for NVL2() function

This change adds support to the function NVL2(expr1, expr2, expr3).
This function returns expr2 if expr1 is not null, else it returns
expr3. NVL2() is converted to IF() prior to analysis.

Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644
---
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
2 files changed, 29 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/6962/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6962
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Vincent Tran 


[Impala-ASF-CR] IMPALA-5030: Adds support for NVL2() function

2017-05-24 Thread Vincent Tran (Code Review)
Vincent Tran has posted comments on this change.

Change subject: IMPALA-5030: Adds support for NVL2() function
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6962/3/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

PS3, Line 100: params.exprs().get(0)
> Will this work if the user simply calls nvl2() ? Should this be part of the
Done


Line 100:   plist.add(new IsNullPredicate(params.exprs().get(0), true));
> What if NVL2() is called without params? How about doing:
Done


PS3, Line 106:}
> nit: Wrong indentation.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6962
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Vincent Tran 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5333: Add support for Impala to work with ADLS
..


Patch Set 8:

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/634/

-- 
To view, visit http://gerrit.cloudera.org:8080/6910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5324: Fix version check in EvalDictionaryFilters

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5324: Fix version check in EvalDictionaryFilters
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icc216332171038f74ff1d2ce3066da8167095361
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5324: Fix version check in EvalDictionaryFilters

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5324: Fix version check in EvalDictionaryFilters
..


IMPALA-5324: Fix version check in EvalDictionaryFilters

Due to a bootstrapping issue with the dictionary
filtering code change, the parquet version check used in
EvalDictionaryFilters was checking for < 2.10. However,
the impala 2.9 parquet contains the appropriate encoding,
so this changes the version check to be < 2.9.

Change-Id: Icc216332171038f74ff1d2ce3066da8167095361
Reviewed-on: http://gerrit.cloudera.org:8080/6969
Reviewed-by: Alex Behm 
Tested-by: Impala Public Jenkins
---
M be/src/exec/hdfs-parquet-scanner.cc
1 file changed, 3 insertions(+), 4 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Icc216332171038f74ff1d2ce3066da8167095361
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins


[Impala-ASF-CR] IMPALA-5030: Adds support for NVL2() function

2017-05-24 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5030: Adds support for NVL2() function
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6962/3/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 100:   plist.add(new IsNullPredicate(params.exprs().get(0), true));
What if NVL2() is called without params? How about doing:

List plist = Lists.newArrayList(params.exprs());
if (!plist.isEmpty()) {
  plist.set(0, new IsNullPredicate(plist.get(0), true));
}
return new FunctionCallExpr("if", plist);


-- 
To view, visit http://gerrit.cloudera.org:8080/6962
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Vincent Tran 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5030: Adds support for NVL2() function

2017-05-24 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5030: Adds support for NVL2() function
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6962/3/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

PS3, Line 100: params.exprs().get(0)
Will this work if the user simply calls nvl2() ? Should this be part of the 
analysis ?


PS3, Line 106:}
nit: Wrong indentation.


-- 
To view, visit http://gerrit.cloudera.org:8080/6962
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Vincent Tran 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5030: Adds support for NVL2() function

2017-05-24 Thread Vincent Tran (Code Review)
Vincent Tran has uploaded a new patch set (#3).

Change subject: IMPALA-5030: Adds support for NVL2() function
..

IMPALA-5030: Adds support for NVL2() function

This change adds support to the function NVL2(expr1, expr2, expr3).
This function returns expr2 if expr1 is not null, else it returns
expr3. NVL2() is converted to IF() prior to analysis.

Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644
---
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
2 files changed, 30 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/6962/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6962
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Vincent Tran 


[Impala-ASF-CR] IMPALA-5358: Fix repeatable table sample.

2017-05-24 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6985

Change subject: IMPALA-5358: Fix repeatable table sample.
..

IMPALA-5358: Fix repeatable table sample.

The bug was a simple oversight where we iterated over
an unordered list of partitions.

Change-Id: I9110751b075430b068b52d7441e5845f86d1b6af
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/6985/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I9110751b075430b068b52d7441e5845f86d1b6af
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 


[Impala-ASF-CR] IMPALA-5030: Adds support for NVL2() function

2017-05-24 Thread Vincent Tran (Code Review)
Vincent Tran has posted comments on this change.

Change subject: IMPALA-5030: Adds support for NVL2() function
..


Patch Set 2:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/6962/1//COMMIT_MSG
Commit Message:

Line 17
> Clean up commit msg
Done


http://gerrit.cloudera.org:8080/#/c/6962/2//COMMIT_MSG
Commit Message:

Line 11: expr3.
> Briefly mention that NVL2() is converted to IF before analysis.
Done


http://gerrit.cloudera.org:8080/#/c/6962/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 329:   // Binary predicates must be rewritten to a canonical form for 
both Kudu predicate
> Please look the places that reference BetweenToCompoundRule.INSTANCE. There
Abandoned this approach in favor of IF FunctionCallExpr


http://gerrit.cloudera.org:8080/#/c/6962/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 85:   public static Expr createExpr(FunctionName fnName, FunctionParams 
params) {
> The current approach looks too complicated. I think we should directly tran
Done


http://gerrit.cloudera.org:8080/#/c/6962/2/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 94: if (fnName.getFnNamePath().get(0).equalsIgnoreCase("nvl2")) {
> Please follow the same checks as above for "decode". You could have:
Done


Line 95:   Preconditions.checkArgument(params.size()==3, "NVL2() "
> Preconditions checks are like asserts. They are used to enforce code invari
Done


Line 97:   Preconditions.checkArgument(
> Same here, inappropriate use of Preconditions check. In addition, the match
Done


http://gerrit.cloudera.org:8080/#/c/6962/1/fe/src/main/java/org/apache/impala/rewrite/Nvl2ToCaseRule.java
File fe/src/main/java/org/apache/impala/rewrite/Nvl2ToCaseRule.java:

Line 33
> Provide class comment with examples like in the other ExprRewriteRules
Done


Line 39
> negate this condition and inline the Nvl2RewriteFunction for brevity
Done


Line 53
> Remove tabs.
Done


Line 55
> Why not IF? That seems shorter and more appropriate. Take a look at TupleIs
Done


http://gerrit.cloudera.org:8080/#/c/6962/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java:

Line 1605: 
> remove whitespace
Done


Line 1606: // NVL2() 
> mention that NVL2() is converted to IF before analysis
Done


Line 1607: AnalyzesOk("select nvl2(1, 'not null', 'null')");
> Add negative cases to show what error message will be shown, for example:
Done


http://gerrit.cloudera.org:8080/#/c/6962/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

Line 81: 
> Remove tabs
Done


Line 85: RewritesOk("int_col not between float_col and double_col", rule,
> Add additional tests:
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6962
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Vincent Tran 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on 
Catalogd
..


Patch Set 4: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/626/

-- 
To view, visit http://gerrit.cloudera.org:8080/6970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5338: Fix Kudu timestamp column default values

2017-05-24 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new patch set (#2).

Change subject: IMPALA-5338: Fix Kudu timestamp column default values
..

IMPALA-5338: Fix Kudu timestamp column default values

While support for TIMESTAMP columns in Kudu tables has been
committed (IMPALA-5137), it does not support TIMESTAMP
column default values.

This supports CREATE TABLE syntax to specify the default
values, but more importantly this fixes the loading of Kudu
tables that may have had default values set on
UNIXTIME_MICROS columns, e.g. if the table was created via
the python client. This involves fixing KuduColumn to hide
the LiteralExpr representing the default value because it
will be a BIGINT if the column type is TIMESTAMP. It is only
needed to call toSql() and toStringValue(), so helper
functions are added to KuduColumn to encapsulate special
logic for TIMESTAMP.

TODO: Add support and tests for ALTER setting the default
value (when IMPALA-4622 is committed).

Change-Id: I655910fb4805bb204a999627fa9f68e43ea8aaf2
---
M be/src/exec/kudu-scanner.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/KuduColumn.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
M tests/query_test/test_kudu.py
14 files changed, 245 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/6936/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I655910fb4805bb204a999627fa9f68e43ea8aaf2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5338: Fix Kudu timestamp column default values

2017-05-24 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5338: Fix Kudu timestamp column default values
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6936/1/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
File fe/src/main/java/org/apache/impala/analysis/ColumnDef.java:

Line 87:   // post-analysis. For TIMESTAMPs those are Kudu UNIXTIME_MICROS, 
i.e. int64s.
> i don't like that this changes its meaning as a result of analysis, because
Done


http://gerrit.cloudera.org:8080/#/c/6936/1/fe/src/main/java/org/apache/impala/util/KuduUtil.java
File fe/src/main/java/org/apache/impala/util/KuduUtil.java:

Line 167:   public static Object getKuduDefaultValue(TExpr defaultValue,
> supply function comment
Done


http://gerrit.cloudera.org:8080/#/c/6936/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 2403: "ts timestamp not null default cast('2009-01-01 00:00:00' as 
timestamp)) " +
> should we also allow a string constant here (that we then try to convert to
Sure I can do that


-- 
To view, visit http://gerrit.cloudera.org:8080/6936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I655910fb4805bb204a999627fa9f68e43ea8aaf2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5232: Parquet reader error message prints memory address instead of value

2017-05-24 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5232: Parquet reader error message prints memory address 
instead of value
..


Patch Set 3: Code-Review+2

Great job on getting your first patch accepted! This looks good to me and I 
will rebase and merge it shortly.

-- 
To view, visit http://gerrit.cloudera.org:8080/6982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5232: Parquet reader error message prints memory address instead of value

2017-05-24 Thread Pooja Nilangekar (Code Review)
Pooja Nilangekar has posted comments on this change.

Change subject: IMPALA-5232: Parquet reader error message prints memory address 
instead of value
..


Patch Set 3:

> > Patch Set 3: Commit message was updated.

-- 
To view, visit http://gerrit.cloudera.org:8080/6982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5232: Parquet reader error message prints memory address instead of value

2017-05-24 Thread Pooja Nilangekar (Code Review)
Pooja Nilangekar has posted comments on this change.

Change subject: IMPALA-5232: Parquet reader error message prints memory address 
instead of value
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6982/1//COMMIT_MSG
Commit Message:

Line 9: Changed the argument of the message to print the size instead of
> We usually try to keep the commit message line lengths to 70 characters per
Done


Line 10: the memory location.
> Can you mention how you tested it over here?
Done


http://gerrit.cloudera.org:8080/#/c/6982/2//COMMIT_MSG
Commit Message:

PS2, Line 8: 
> Nit: The title can exceed 70 chars. We want the title to be on one line so 
Done


Line 16: value of num_bytes and performing a select on a table in parquet 
format.
> It'll also be good to mention why you're not adding automated tests.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5232: Parquet reader error message prints memory address instead of value

2017-05-24 Thread Pooja Nilangekar (Code Review)
Pooja Nilangekar has posted comments on this change.

Change subject: IMPALA-5232: Parquet reader error message prints memory address 
instead of value
..


Patch Set 3:

> Patch Set 3: Commit message was updated.

Done.

-- 
To view, visit http://gerrit.cloudera.org:8080/6982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5232: Parquet reader error message prints memory address instead of value

2017-05-24 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5232: Parquet reader error message prints memory address 
instead of value
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6982/2//COMMIT_MSG
Commit Message:

PS2, Line 8: instead of value
Nit: The title can exceed 70 chars. We want the title to be on one line so we 
can see it in the single line commit list on github. I should have mentioned 
that earlier.


Line 16:- num_bytes = *data_size + 1
It'll also be good to mention why you're not adding automated tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/6982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-05-24 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#9).

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..

IMPALA-4864 Speed up single slot predicates with dictionaries

When dictionaries are present we can pre-evaluate conjuncts
against the dictionary values and simply look up the result.

Status: ready for review.  Seems to be passing all tests, runs with
ASAN, and gives expected results.  Need to come up with some specific
test cases that exercise this functionality and measure performance.

Basic idea: since we codegen so early, before we know enough details
about the columns to know if they are dict filterable, if we do have
dictionary filtering predicates, we codegen a guard around each
dictionary filterable predicate evaluation.  This guard skips
evaluation of the predicate if it has already been evaluated by the
dictionary.  In this way, we can skip evaluation dynamically for
each row group as we learn which columns are dictionary filterable,
and then push predicate evaluation into the column reader.  Since the
branches will remain 100% predictable over the row group, this should
give us the fastest way to skip over predicate evaluation without
compromising the general case where we may be unable to evaluate
against the dictionary.  We can even do this with codegen turned
off, as a side effect of the way we generate the codegen'd function
when dictionary evaluation is enabled.

If dictionaries aren't available for some predicates, we automatically
fall back to evaluating those predicates in the original order,
preserving selectivity.  The overhead in this case is a perfectly
predictable extra conditional per dictionary predicate.

Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/parquet-scratch-tuple-batch.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/query-state.h
M be/src/runtime/row-batch.h
M be/src/util/bitmap-test.cc
M be/src/util/bitmap.h
M be/src/util/dict-encoding.h
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
20 files changed, 564 insertions(+), 180 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6726/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS

2017-05-24 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5333: Add support for Impala to work with ADLS
..


Patch Set 8:

> Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/630/

This patch depends on a recent commit to the impala-setup repo:
https://github.com/awleblang/impala-setup/commit/5d1a054ac0307953cbdf495fe929ee1b84bc5335

The jenkins workers seem to run it once and not run it again for the lifetime 
of that worker. So the dependency will only get updated once my GVO gets fresh 
executors. So, I will try GVO-ing this multiple times in hopes that I will get 
fresh executors with the new dependency installed.

-- 
To view, visit http://gerrit.cloudera.org:8080/6910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2373: Extrapolate row counts for HDFS tables.

2017-05-24 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables.
..


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6840/6//COMMIT_MSG
Commit Message:

PS6, Line 30:  
> run
Done


http://gerrit.cloudera.org:8080/#/c/6840/6/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS6, Line 304: else {
 :   tableStats_ = new TTableStats(-1);
 :   tableStats_.setTotal_file_bytes(-1);
 : }
> Do you need this?
Removed.


http://gerrit.cloudera.org:8080/#/c/6840/6/fe/src/main/java/org/apache/impala/catalog/View.java
File fe/src/main/java/org/apache/impala/catalog/View.java:

PS6, Line 118: tableStats_ = new TTableStats(-1);
 :   tableStats_.setTotal_file_bytes(-1);
> Didn't you remove this? Maybe I am confused.
My bad. really removed now.


http://gerrit.cloudera.org:8080/#/c/6840/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

PS8, Line 724: post-sampling bytes
> What is the "post-sampling bytes"?
Rephrased.


-- 
To view, visit http://gerrit.cloudera.org:8080/6840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5232: Parquet reader error message prints memory address instead of value

2017-05-24 Thread Pooja Nilangekar (Code Review)
Pooja Nilangekar has uploaded a new patch set (#2).

Change subject: IMPALA-5232: Parquet reader error message prints memory address 
instead of value
..

IMPALA-5232: Parquet reader error message prints memory address
instead of value

Changed the argument of the message to print the size instead of
the memory location.
Testing: Manually tested the error message by modifying the value
of num_bytes and performing a select on a table in parquet format.
Tested both condition with the following values:
- num_bytes = -1
- num_bytes = *data_size + 1

Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70
---
M be/src/exec/parquet-column-readers.cc
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/6982/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5333: Add support for Impala to work with ADLS
..


Patch Set 8: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/630/

-- 
To view, visit http://gerrit.cloudera.org:8080/6910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2373: Extrapolate row counts for HDFS tables.

2017-05-24 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables.
..


Patch Set 8: Code-Review+2

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6840/6//COMMIT_MSG
Commit Message:

PS6, Line 30:  
run


http://gerrit.cloudera.org:8080/#/c/6840/6/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS6, Line 304: else {
 :   tableStats_ = new TTableStats(-1);
 :   tableStats_.setTotal_file_bytes(-1);
 : }
Do you need this?


http://gerrit.cloudera.org:8080/#/c/6840/6/fe/src/main/java/org/apache/impala/catalog/View.java
File fe/src/main/java/org/apache/impala/catalog/View.java:

PS6, Line 118: tableStats_ = new TTableStats(-1);
 :   tableStats_.setTotal_file_bytes(-1);
Didn't you remove this? Maybe I am confused.


http://gerrit.cloudera.org:8080/#/c/6840/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

PS8, Line 724: post-sampling bytes
What is the "post-sampling bytes"?


-- 
To view, visit http://gerrit.cloudera.org:8080/6840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement

2017-05-24 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5259: Add REFRESH FUNCTIONS  statement
..


Patch Set 11: Code-Review+2

Rebased and fixed the conflicts. Forwarding the +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/6878
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5259: Add REFRESH FUNCTIONS  statement
..


Patch Set 11:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/633/

-- 
To view, visit http://gerrit.cloudera.org:8080/6878
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement

2017-05-24 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#11).

Change subject: IMPALA-5259: Add REFRESH FUNCTIONS  statement
..

IMPALA-5259: Add REFRESH FUNCTIONS  statement

Before this patch, Impala relied on INVALIDATE METADATA to load
externally added UDFs from HMS. The problem with this approach is that
INVALIDATE METADATA affects all databases and tables in the entire
cluster.

In this patch, we add a REFRESH FUNCTIONS  statement that reloads
the functions of a database from HMS. We return a list of updated and
removed db functions to the issuing Impalad in order to update its
local catalog cache.

Testing:
- Ran a private build which passed.

Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
---
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/Function.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M tests/custom_cluster/test_permanent_udfs.py
12 files changed, 323 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6878/11
-- 
To view, visit http://gerrit.cloudera.org:8080/6878
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement

2017-05-24 Thread Taras Bobrovytsky (Code Review)
Hello Impala Public Jenkins, Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6878

to look at the new patch set (#11).

Change subject: IMPALA-5259: Add REFRESH FUNCTIONS  statement
..

IMPALA-5259: Add REFRESH FUNCTIONS  statement

Before this patch, Impala relied on INVALIDATE METADATA to load
externally added UDFs from HMS. The problem with this approach is that
INVALIDATE METADATA affects all databases and tables in the entire
cluster.

In this patch, we add a REFRESH FUNCTIONS  statement that reloads
the functions of a database from HMS. We return a list of updated and
removed db functions to the issuing Impalad in order to update its
local catalog cache.

Testing:
- Ran a private build which passed.

Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
---
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/Function.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M tests/custom_cluster/test_permanent_udfs.py
12 files changed, 323 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6878/11
-- 
To view, visit http://gerrit.cloudera.org:8080/6878
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement

2017-05-24 Thread Taras Bobrovytsky (Code Review)
Hello Impala Public Jenkins, Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6878

to look at the new patch set (#11).

Change subject: IMPALA-5259: Add REFRESH FUNCTIONS  statement
..

IMPALA-5259: Add REFRESH FUNCTIONS  statement

Before this patch, Impala relied on INVALIDATE METADATA to load
externally added UDFs from HMS. The problem with this approach is that
INVALIDATE METADATA affects all databases and tables in the entire
cluster.

In this patch, we add a REFRESH FUNCTIONS  statement that reloads
the functions of a database from HMS. We return a list of updated and
removed db functions to the issuing Impalad in order to update its
local catalog cache.

Testing:
- Ran a private build which passed.

Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
---
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/Function.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M tests/custom_cluster/test_permanent_udfs.py
12 files changed, 322 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6878/11
-- 
To view, visit http://gerrit.cloudera.org:8080/6878
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-5232: Parquet reader error message prints memory address instead of value

2017-05-24 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5232: Parquet reader error message prints memory address 
instead of value
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6982/1//COMMIT_MSG
Commit Message:

Line 9: Changed the argument of the message to print the size instead of the 
memory location.
We usually try to keep the commit message line lengths to 70 characters per 
line. This is so that the message fits into the gerrit Web UI's commit message 
box.

(It doesn't matter if only the title goes over 70 characters though)


Line 10: 
Can you mention how you tested it over here?

You can say "Testing: Tested manually by doing blah blah" and also give some 
reasoning as to why you cannot add an automated test for this.

You can look at one of my previous commit messages to get a decent idea:
https://gerrit.cloudera.org/#/c/6886/


-- 
To view, visit http://gerrit.cloudera.org:8080/6982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5347: Parquet scanner microoptimizations
..


Patch Set 16:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/632/

-- 
To view, visit http://gerrit.cloudera.org:8080/6950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4923: reduce memory transfer for selective scans
..


Patch Set 8:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/631/

-- 
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans

2017-05-24 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4923: reduce memory transfer for selective scans
..


Patch Set 8: Code-Review+2

Carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans

2017-05-24 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4923: reduce memory transfer for selective scans
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6949/7/be/src/exec/parquet-scratch-tuple-batch.h
File be/src/exec/parquet-scratch-tuple-batch.h:

PS7, Line 40: // Number of tuples transferred to output batches (i.e. not 
filtered by predicates).
> I know "output batches" already implies potentially more than one row batch
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans

2017-05-24 Thread Tim Armstrong (Code Review)
Hello Michael Ho, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6949

to look at the new patch set (#8).

Change subject: IMPALA-4923: reduce memory transfer for selective scans
..

IMPALA-4923: reduce memory transfer for selective scans

Most of the code changes are to restructure things so that the
scratch batch's tuple buffer is stored in a separate MemPool
from auxiliary memory such as decompression buffers. This part
of the change does not change the behaviour of the scanner in
itself, but allows us to recycle the tuple buffer without holding
onto unused auxiliary memory.

The optimisation is implemented in TryCompact(): if enough rows
were filtered out during the copy from the scratch batch to the
output batch, the fixed-length portions of the surviving rows
(if any) are copied to a new, smaller, buffer, and the original,
larger, buffer is reused for the next scratch batch.

Previously the large buffer was always attached to the output batch,
so a large buffer was transferred between threads for every scratch
batch processed. In combination with the decompression buffer change
in IMPALA-5304, this means that in many cases selective scans don't
produce nearly as many empty or near-empty batches and do not attach
nearly as much memory to each batch.

Performance:
Even on an 8 core machine I see some speedup on selective scans.
Profiling with "perf top" also showed that time in TCMalloc
was reduced - it went from several % of CPU time to a minimal
amount.

Running TPC-H on the same machine showed a ~5% overall improvement
and no regressions. E.g. Q6 got 20-25% faster.

I hope to do some additional cluster benchmarking on systems
with more cores to verify that the severe performance problems
there are fixed, but in the meantime it seems like we have enough
evidence that it will at least improve things.

Testing:
Add a couple of selective scans that exercise the new code paths.

Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-scratch-tuple-batch.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M testdata/workloads/functional-query/queries/QueryTest/parquet.test
6 files changed, 212 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/6949/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5232: Parquet reader error message prints memory address instead of value

2017-05-24 Thread Pooja Nilangekar (Code Review)
Pooja Nilangekar has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6982

Change subject: IMPALA-5232: Parquet reader error message prints memory address 
instead of value
..

IMPALA-5232: Parquet reader error message prints memory address instead of value

Changed the argument of the message to print the size instead of the memory 
location.

Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70
---
M be/src/exec/parquet-column-readers.cc
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/6982/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar 


[Impala-ASF-CR] IMPALA-5338: Fix Kudu timestamp column default values

2017-05-24 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5338: Fix Kudu timestamp column default values
..


Patch Set 1:

(1 comment)

Responding to Marcel's higher level comment first, I'll address the code 
comments assuming we don't change approaches.

http://gerrit.cloudera.org:8080/#/c/6936/1//COMMIT_MSG
Commit Message:

Line 19: will be a BIGINT if the column type is TIMESTAMP. It is only
> what if you stored the default as a timestamp literal instead (and then con
The problem is that then the catalog would need to do the conversion from 
TimestampValue to unixtime micros. That would get hairy because the conversion 
function exists as a UDF, and it'd be really weird for the catalog to start 
calling UDFs. It could be exposed explicitly over JNI but that's also not ideal.

This approach does involve some special casing, but I think at least we can be 
clear about when the conversion happens and what is stored on the KuduColumn 
class.


-- 
To view, visit http://gerrit.cloudera.org:8080/6936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I655910fb4805bb204a999627fa9f68e43ea8aaf2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans

2017-05-24 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4923: reduce memory transfer for selective scans
..


Patch Set 7: Code-Review+2

Happy with the change after Michael's final comment is addressed

-- 
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5333: Add support for Impala to work with ADLS
..


Patch Set 8:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/630/

-- 
To view, visit http://gerrit.cloudera.org:8080/6910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS

2017-05-24 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5333: Add support for Impala to work with ADLS
..


Patch Set 8: Code-Review+2

Updated bootstrap_build.sh to include the libffi-dev dependency after the 
discussion on the dev list.

Carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/6910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS

2017-05-24 Thread Sailesh Mukil (Code Review)
Hello Impala Public Jenkins, Matthew Jacobs, Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6910

to look at the new patch set (#8).

Change subject: IMPALA-5333: Add support for Impala to work with ADLS
..

IMPALA-5333: Add support for Impala to work with ADLS

This patch leverages the AdlFileSystem in Hadoop to allow
Impala to talk to the Azure Data Lake Store. This patch has
functional changes as well as adds test infrastructure for
testing Impala over ADLS.

We do not support ACLs on ADLS since the Hadoop ADLS
connector does not integrate ADLS ACLs with Hadoop users/groups.

For testing, we use the azure-data-lake-store-python client
from Microsoft. This client seems to have some consistency
issues. For example, a drop table through Impala will delete
the files in ADLS, however, listing that directory through
the python client immediately after the drop, will still show
the files. This behavior is unexpected since ADLS claims to be
strongly consistent. Some tests have been skipped due to this
limitation with the tag SkipIfADLS.slow_client. Tracked by
IMPALA-5335.

The azure-data-lake-store-python client also only works on CentOS 6.6
and over, so the python dependencies for Azure will not be downloaded
when the TARGET_FILESYSTEM is not "adls". While running ADLS tests,
the expectation will be that it runs on a machine that is at least
running CentOS 6.6.
Note: This is only a test limitation, not a functional one. Clusters
with older OSes like CentOS 6.4 will still work with ADLS.

Added another dependency to bootstrap_build.sh for the ADLS Python
client.

Testing: Ran core tests with and without TARGET_FILESYSTEM as
'adls' to make sure that all tests pass and that nothing breaks.

Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
---
M bin/bootstrap_build.sh
M bin/impala-config.sh
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M infra/python/bootstrap_virtualenv.py
A infra/python/deps/adls-requirements.txt
M infra/python/deps/compiled-requirements.txt
M infra/python/deps/pip_download.py
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
M tests/common/impala_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/custom_cluster/test_insert_behaviour.py
M tests/custom_cluster/test_parquet_max_page_header.py
M tests/custom_cluster/test_permanent_udfs.py
M tests/data_errors/test_data_errors.py
M tests/failure/test_failpoints.py
M tests/metadata/test_compute_stats.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hdfs_encryption.py
M tests/metadata/test_hdfs_permissions.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/metadata/test_refresh_partition.py
M tests/metadata/test_views_compatibility.py
M tests/query_test/test_compressed_formats.py
M tests/query_test/test_hdfs_caching.py
M tests/query_test/test_hdfs_fd_caching.py
M tests/query_test/test_insert_behaviour.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_join_queries.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_observability.py
M tests/query_test/test_partitioning.py
M tests/query_test/test_scanners.py
M tests/stress/test_ddl_stress.py
A tests/util/adls_util.py
M tests/util/filesystem_utils.py
42 files changed, 357 insertions(+), 47 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/6910/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans

2017-05-24 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4923: reduce memory transfer for selective scans
..


Patch Set 7: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6949/7/be/src/exec/parquet-scratch-tuple-batch.h
File be/src/exec/parquet-scratch-tuple-batch.h:

PS7, Line 40: // Number of tuples transferred to output batches (i.e. not 
filtered by predicates).
I know "output batches" already implies potentially more than one row batch but 
it doesn't hurt to point that out explicitly here or below so readers can keep 
in mind that num_tuples_transferred > num_rows_to_commit implies transfer of 
some tuples to previous row batches.


-- 
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5354: INSERT hints for Kudu tables

2017-05-24 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5354: INSERT hints for Kudu tables
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6980/1/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS1, Line 502: !insertStmt.hasNoClusteredHint()
 :   && !ctx_.isSingleNodeExec())
1 line


PS1, Line 504: Expr kuduPartitionExpr = new 
KuduPartitionExpr(DescriptorTable.TABLE_SINK_ID,
 : (KuduTable) insertStmt.getTargetTable(),
 : 
Lists.newArrayList(insertStmt.getPartitionKeyExprs()),
 : insertStmt.getPartitionColPos());
how about refactoring this to KuduUtil to avoid duplicate code w/ 
DistributedPlan which could get out of sync later?


-- 
To view, visit http://gerrit.cloudera.org:8080/6980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5354: INSERT hints for Kudu tables

2017-05-24 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5354: INSERT hints for Kudu tables
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6980/1/testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test:

Line 213: 
add test with both hints


-- 
To view, visit http://gerrit.cloudera.org:8080/6980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations

2017-05-24 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-5347: Parquet scanner microoptimizations
..


Patch Set 15: Code-Review+2

THank you!

-- 
To view, visit http://gerrit.cloudera.org:8080/6950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations

2017-05-24 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5347: Parquet scanner microoptimizations
..


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6950/14/be/src/common/compiler-util.h
File be/src/common/compiler-util.h:

Line 46: /// Clang is incredibly pedantic about __restrict__ and doesn't offer 
an option to be
> Can you be more specific?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations

2017-05-24 Thread Tim Armstrong (Code Review)
Hello Marcel Kornacker, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6950

to look at the new patch set (#15).

Change subject: IMPALA-5347: Parquet scanner microoptimizations
..

IMPALA-5347: Parquet scanner microoptimizations

A mix of microoptimizations that profiling the parquet scanner revealed.
All lead to some measurable improvement and added up to significant
speedups for various scans.

* Add ALWAYS_INLINE to hot functions that GCC was mistakenly not inlining
  in all cases.
* Apply __restrict__ in a few places so the compiler knows that it is
  safe to cache values accessed via those pointers
* memset() the whole batch instead of the null indicators is cases where
  it is almost certainly cheaper.
* Avoid updating two correlated loop variables in MaterializeValueBatch().
* Avoid unnecessary initialization of often-unused 'val' in ReadSlot().
* Shave a few instructions off the (still very expensive) bit unpacking
  and dict decoding logic.

Performance:

Some local TPC-H and targeted-perf benchmarks showed average speedups of
~5%.

I did some benchmarks targeted at measuring column materialisation
performance using a version of lineitem with duplicated data to make
it bigger. These queries all got significantly faster.

Dict-encoded DECIMAL: 2.23 -> 1.23s

  SELECT count(*) FROM biglineitem WHERE l_quantity > 49

Plain-encoded BIGINT: 2.33s -> 1.62s

  SELECT count(*) FROM biglineitem WHERE l_orderkey != 10

Dict-encoded STRING: 2.73s -> 1.72s

  SELECT count(*) FROM biglineitem WHERE l_returnflag = 'A'

Plain-encoded STRING: 7.07s -> 6.08s (most time spent in Snappy)
  SELECT count(*) FROM biglineitem WHERE length(l_comment) > 37

Multiple columns: 5.15s -> 3.74s

  SELECT count(*) FROM biglineitem
  WHERE l_quantity > 49 and l_partkey != 199 and l_tax < 100

Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
---
M be/src/common/compiler-util.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/runtime/tuple.h
M be/src/util/bit-stream-utils.inline.h
M be/src/util/bit-util.h
M be/src/util/dict-encoding.h
M be/src/util/rle-encoding.h
9 files changed, 117 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/6950/15
-- 
To view, visit http://gerrit.cloudera.org:8080/6950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5259: Add REFRESH FUNCTIONS  statement
..


Patch Set 10: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6878
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5338: Fix Kudu timestamp column default values

2017-05-24 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-5338: Fix Kudu timestamp column default values
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6936/1//COMMIT_MSG
Commit Message:

Line 19: will be a BIGINT if the column type is TIMESTAMP. It is only
what if you stored the default as a timestamp literal instead (and then convert 
it to what kudu needs when that's required)? it feels like the current approach 
requires a bunch of special-casing.


http://gerrit.cloudera.org:8080/#/c/6936/1/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
File fe/src/main/java/org/apache/impala/analysis/ColumnDef.java:

Line 87:   // post-analysis. For TIMESTAMPs those are Kudu UNIXTIME_MICROS, 
i.e. int64s.
i don't like that this changes its meaning as a result of analysis, because 
we're planning on making analysis idempotent (at some point in the future).

let's make it a separate variable, and then you can get rid of isanalyzed_


http://gerrit.cloudera.org:8080/#/c/6936/1/fe/src/main/java/org/apache/impala/util/KuduUtil.java
File fe/src/main/java/org/apache/impala/util/KuduUtil.java:

Line 167:   public static Object getKuduDefaultValue(TExpr defaultValue,
supply function comment


http://gerrit.cloudera.org:8080/#/c/6936/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 2403: "ts timestamp not null default cast('2009-01-01 00:00:00' as 
timestamp)) " +
should we also allow a string constant here (that we then try to convert to a 
timestamp, and if that fails, we complain)?


-- 
To view, visit http://gerrit.cloudera.org:8080/6936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I655910fb4805bb204a999627fa9f68e43ea8aaf2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5333: Add support for Impala to work with ADLS
..


Patch Set 7:

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/628/

-- 
To view, visit http://gerrit.cloudera.org:8080/6910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations

2017-05-24 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5347: Parquet scanner microoptimizations
..


Patch Set 14:

I did a dry run and it turns out clang was unhappy with a couple of the 
qualifiers.

I think clang's implementation of restrict is somewhat broken so as a 
workaround I made a macro to remove that qualifier for clang.

-- 
To view, visit http://gerrit.cloudera.org:8080/6950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5167: Reduce the number of Kudu clients created (BE)

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5167: Reduce the number of Kudu clients created (BE)
..


IMPALA-5167: Reduce the number of Kudu clients created (BE)

Creating Kudu clients is very expensive as each will fetch
metadata from the Kudu master, so we should minimize the
number of Kudu clients that get created.

This patch stores a map from Kudu master addressed to Kudu
clients in the ExecEnv to be used across the BE for all
queries. Another patch will address the FE.

This relies on a change on the Kudu side that clears
non-covered range entries from the client's cache on
table open (fdc022fe6231af20e307012d98c35b16cbfa7b33)

Testing:
- Ran a stress test on a 10 node cluster: scan of a small
  Kudu table, 1000 concurrent queries, load on the Kudu
  master was reduced signficantly, from ~50% cpu to ~5%.
  (with the FE changes included)
- Ran the Kudu e2e tests.
- Manually ran a test with concurrent INSERTs and
  'ALTER TABLE ADD PARTITION' (which is affected by the
  Kudu side change mentiond above) and verified
  correctness.

Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705
Reviewed-on: http://gerrit.cloudera.org:8080/6792
Reviewed-by: Matthew Jacobs 
Tested-by: Impala Public Jenkins
---
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
8 files changed, 73 insertions(+), 69 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5167: Reduce the number of Kudu clients created (BE)

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5167: Reduce the number of Kudu clients created (BE)
..


Patch Set 7: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans

2017-05-24 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4923: reduce memory transfer for selective scans
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6949/3/be/src/exec/parquet-scratch-tuple-batch.h
File be/src/exec/parquet-scratch-tuple-batch.h:

Line 102: ++num_output_batches;
> I think the idea was right, but the code wasn't quite implementing the idea
I looked again and realised we don't even need to track num_output_batches. 
num_tuples_transferred > num_to_commit implies that tuples were transferred to 
a previous batch.

It's enough of a simplification it seems worth doing.


http://gerrit.cloudera.org:8080/#/c/6949/4/be/src/exec/parquet-scratch-tuple-batch.h
File be/src/exec/parquet-scratch-tuple-batch.h:

PS4, Line 142:   Tuple* uncompacted_tuple = row->GetTuple(0);
> FOREACH_ROW() ?
I looked earlier - the rows aren't committed to the batch yet so FOREACH_ROW 
doesn't work for this case.


PS4, Line 155: }
> not needed ?
Yep, and also the ones below.


-- 
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans

2017-05-24 Thread Tim Armstrong (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6949

to look at the new patch set (#7).

Change subject: IMPALA-4923: reduce memory transfer for selective scans
..

IMPALA-4923: reduce memory transfer for selective scans

Most of the code changes are to restructure things so that the
scratch batch's tuple buffer is stored in a separate MemPool
from auxiliary memory such as decompression buffers. This part
of the change does not change the behaviour of the scanner in
itself, but allows us to recycle the tuple buffer without holding
onto unused auxiliary memory.

The optimisation is implemented in TryCompact(): if enough rows
were filtered out during the copy from the scratch batch to the
output batch, the fixed-length portions of the surviving rows
(if any) are copied to a new, smaller, buffer, and the original,
larger, buffer is reused for the next scratch batch.

Previously the large buffer was always attached to the output batch,
so a large buffer was transferred between threads for every scratch
batch processed. In combination with the decompression buffer change
in IMPALA-5304, this means that in many cases selective scans don't
produce nearly as many empty or near-empty batches and do not attach
nearly as much memory to each batch.

Performance:
Even on an 8 core machine I see some speedup on selective scans.
Profiling with "perf top" also showed that time in TCMalloc
was reduced - it went from several % of CPU time to a minimal
amount.

Running TPC-H on the same machine showed a ~5% overall improvement
and no regressions. E.g. Q6 got 20-25% faster.

I hope to do some additional cluster benchmarking on systems
with more cores to verify that the severe performance problems
there are fixed, but in the meantime it seems like we have enough
evidence that it will at least improve things.

Testing:
Add a couple of selective scans that exercise the new code paths.

Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-scratch-tuple-batch.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M testdata/workloads/functional-query/queries/QueryTest/parquet.test
6 files changed, 210 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/6949/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5326: [DOCS] Document REPLACE() function

2017-05-24 Thread Greg Rahn (Code Review)
Greg Rahn has posted comments on this change.

Change subject: IMPALA-5326: [DOCS] Document REPLACE() function
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6979/1/docs/topics/impala_string_functions.xml
File docs/topics/impala_string_functions.xml:

PS1, Line 890: o
> How about replace('hello world', '0', 'z') and just illustrate the original
The lowercase folding is just something for the column heading/alias.


-- 
To view, visit http://gerrit.cloudera.org:8080/6979
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib576fba03673bd6708a46f45c8388477b25e34da
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations

2017-05-24 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5347: Parquet scanner microoptimizations
..


Patch Set 13: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5326: [DOCS] Document REPLACE() function

2017-05-24 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5326: [DOCS] Document REPLACE() function
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6979/1/docs/topics/impala_string_functions.xml
File docs/topics/impala_string_functions.xml:

PS1, Line 890: o
> Maybe best to alias the column/expression or use a different example.  This
How about replace('hello world', '0', 'z') and just illustrate the original 
string is returned if nothing matches.  Show case sensitivity is not very 
useful, I think it is sufficient just to mention it.

One other notable behavior - patterns which match their replacement do not 
consider the replacement for further expansion.  So replace('foo', 'foo', 
'foofoo') results in 'foofoo', not a heap / stack overflow.


-- 
To view, visit http://gerrit.cloudera.org:8080/6979
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib576fba03673bd6708a46f45c8388477b25e34da
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5324: Fix version check in EvalDictionaryFilters

2017-05-24 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5324: Fix version check in EvalDictionaryFilters
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icc216332171038f74ff1d2ce3066da8167095361
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans

2017-05-24 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4923: reduce memory transfer for selective scans
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6949/3/be/src/exec/parquet-scratch-tuple-batch.h
File be/src/exec/parquet-scratch-tuple-batch.h:

Line 102: ++num_output_batches;
> Turns out it's the incorrect thing to do (found a bug in testing by test_me
I think the idea was right, but the code wasn't quite implementing the idea 
(sorry for the distraction). If we increment the non-empty output batches at 
the end and modify the check to non_empty_output_batches > 0, then I think it 
can work.

However, even that is suboptimal because what we really want to count are the 
non-compacted output batches because those are they only ones can still 
reference the tuple mem.

Anyway, seems easy to forget a case and break this code, so let's worry about 
that potential improvement later.


-- 
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5326: [DOCS] Document REPLACE() function

2017-05-24 Thread Greg Rahn (Code Review)
Greg Rahn has posted comments on this change.

Change subject: IMPALA-5326: [DOCS] Document REPLACE() function
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6979/1/docs/topics/impala_string_functions.xml
File docs/topics/impala_string_functions.xml:

PS1, Line 890: o
> This looks like a bug. It might be worth calling out the oddness here that 
Maybe best to alias the column/expression or use a different example.  This 
happens because impala lowercases all column aliases.


-- 
To view, visit http://gerrit.cloudera.org:8080/6979
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib576fba03673bd6708a46f45c8388477b25e34da
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans

2017-05-24 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4923: reduce memory transfer for selective scans
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6949/4/be/src/exec/parquet-scratch-tuple-batch.h
File be/src/exec/parquet-scratch-tuple-batch.h:

PS4, Line 142: for (int i = dst_batch->num_rows(); i < end_row; ++i) {
FOREACH_ROW() ?


PS4, Line 155: inline
not needed ?


-- 
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5326: [DOCS] Document REPLACE() function

2017-05-24 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-5326: [DOCS] Document REPLACE() function
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6979/1/docs/topics/impala_string_functions.xml
File docs/topics/impala_string_functions.xml:

PS1, Line 890: o
This looks like a bug. It might be worth calling out the oddness here that 'o' 
is caps in the command but lowercase in the column name.


-- 
To view, visit http://gerrit.cloudera.org:8080/6979
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib576fba03673bd6708a46f45c8388477b25e34da
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5324: Fix version check in EvalDictionaryFilters

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5324: Fix version check in EvalDictionaryFilters
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/627/

-- 
To view, visit http://gerrit.cloudera.org:8080/6969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icc216332171038f74ff1d2ce3066da8167095361
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations

2017-05-24 Thread Tim Armstrong (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6950

to look at the new patch set (#13).

Change subject: IMPALA-5347: Parquet scanner microoptimizations
..

IMPALA-5347: Parquet scanner microoptimizations

A mix of microoptimizations that profiling the parquet scanner revealed.
All lead to some measurable improvement and added up to significant
speedups for various scans.

* Add ALWAYS_INLINE to hot functions that GCC was mistakenly not inlining
  in all cases.
* Apply __restrict__ in a few places so the compiler knows that it is
  safe to cache values accessed via those pointers
* memset() the whole batch instead of the null indicators is cases where
  it is almost certainly cheaper.
* Avoid updating two correlated loop variables in MaterializeValueBatch().
* Avoid unnecessary initialization of often-unused 'val' in ReadSlot().
* Shave a few instructions off the (still very expensive) bit unpacking
  and dict decoding logic.

Performance:

Some local TPC-H and targeted-perf benchmarks showed average speedups of
~5%.

I did some benchmarks targeted at measuring column materialisation
performance using a version of lineitem with duplicated data to make
it bigger. These queries all got significantly faster.

Dict-encoded DECIMAL: 2.23 -> 1.23s

  SELECT count(*) FROM biglineitem WHERE l_quantity > 49

Plain-encoded BIGINT: 2.33s -> 1.62s

  SELECT count(*) FROM biglineitem WHERE l_orderkey != 10

Dict-encoded STRING: 2.73s -> 1.72s

  SELECT count(*) FROM biglineitem WHERE l_returnflag = 'A'

Plain-encoded STRING: 7.07s -> 6.08s (most time spent in Snappy)
  SELECT count(*) FROM biglineitem WHERE length(l_comment) > 37

Multiple columns: 5.15s -> 3.74s

  SELECT count(*) FROM biglineitem
  WHERE l_quantity > 49 and l_partkey != 199 and l_tax < 100

Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/runtime/tuple.h
M be/src/util/bit-stream-utils.inline.h
M be/src/util/bit-util.h
M be/src/util/dict-encoding.h
M be/src/util/rle-encoding.h
8 files changed, 107 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/6950/13
-- 
To view, visit http://gerrit.cloudera.org:8080/6950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations

2017-05-24 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5347: Parquet scanner microoptimizations
..


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6950/11/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 979: Status HdfsParquetScanner::ResetScratchBatch() {
> Looks like this is only called in a single place, so might as well move the
Done


http://gerrit.cloudera.org:8080/#/c/6950/11/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

Line 492:   for (int i = 0; i < num_tuples; ++i) {
> int64_t i since num_tuples is an int64_t
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations

2017-05-24 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5347: Parquet scanner microoptimizations
..


Patch Set 12: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6950/11/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 979: Status HdfsParquetScanner::ResetScratchBatch() {
Looks like this is only called in a single place, so might as well move the two 
lines there and remove this function.


http://gerrit.cloudera.org:8080/#/c/6950/11/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

Line 492:   for (int i = 0; i < num_tuples; ++i) {
int64_t i since num_tuples is an int64_t


-- 
To view, visit http://gerrit.cloudera.org:8080/6950
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5333: Add support for Impala to work with ADLS
..


Patch Set 7: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/624/

-- 
To view, visit http://gerrit.cloudera.org:8080/6910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5359: [DOCS] Document SORT BY syntax for CREATE TABLE and ALTER TABLE

2017-05-24 Thread John Russell (Code Review)
John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6981

Change subject: IMPALA-5359: [DOCS] Document SORT BY syntax for CREATE TABLE 
and ALTER TABLE
..

IMPALA-5359: [DOCS] Document SORT BY syntax for CREATE TABLE and ALTER TABLE

Review part 1: Just tackling the CREATE TABLE syntax and examples for
now. Will have separate gerrits for ALTER TABLE and more far-flung
places to mention this feature (New Features, Performance
Considerations, Parquet File Format).

Change-Id: Icd571cd8840368edb327d16d27192458838ef234
---
M docs/topics/impala_create_table.xml
1 file changed, 74 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/6981/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6981
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Icd571cd8840368edb327d16d27192458838ef234
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 


[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on 
Catalogd
..


Patch Set 4:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/626/

-- 
To view, visit http://gerrit.cloudera.org:8080/6970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd

2017-05-24 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on 
Catalogd
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2373: Extrapolate row counts for HDFS tables.

2017-05-24 Thread Alex Behm (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6840

to look at the new patch set (#6).

Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables.
..

IMPALA-2373: Extrapolate row counts for HDFS tables.

The main idea of this patch is to use table stats to
extrapolate the row counts for new/modified partitions.

Existing behavior:
- Partitions that lack the row count stat are ignored
  when estimating the cardinality of HDFS scans. Such
  partitions effectively have an estimated row count
  of zero.
- We always use the row count stats for partitions that
  have one. The row count may be innaccurate if data in
  such partitions has changed significantly.

Summary of changes:
- Enhance COMPUTE STATS to also store the total number
  of file bytes in the table.
- Use the table-level row count and file bytes stats
  to estimate the number of rows in a scan.
- A new impalad startup flag is added to enable/disable
  the extrapolation behavior. The feature is disabled by
  default. Note that even with the feature disabled,
  COMPUTE STATS stores the file bytes so you can enable
  the feature without having to COMPUTE STATS again.

Testing:
- Added new FE unit test
- Added new EE test

Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4
---
M be/src/common/global-flags.cc
M be/src/exec/catalog-op-executor.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
A fe/src/test/java/org/apache/impala/planner/StatsExtrapolationTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
A testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
A tests/custom_cluster/test_stats_extrapolation.py
M tests/metadata/test_explain.py
27 files changed, 766 insertions(+), 177 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/6840/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 


[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS

2017-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5333: Add support for Impala to work with ADLS
..


Patch Set 7:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/624/

-- 
To view, visit http://gerrit.cloudera.org:8080/6910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans

2017-05-24 Thread Tim Armstrong (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6949

to look at the new patch set (#6).

Change subject: IMPALA-4923: reduce memory transfer for selective scans
..

IMPALA-4923: reduce memory transfer for selective scans

Most of the code changes are to restructure things so that the
scratch batch's tuple buffer is stored in a separate MemPool
from auxiliary memory such as decompression buffers. This part
of the change does not change the behaviour of the scanner in
itself, but allows us to recycle the tuple buffer without holding
onto unused auxiliary memory.

The optimisation is implemented in TryCompact(): if enough rows
were filtered out during the copy from the scratch batch to the
output batch, the fixed-length portions of the surviving rows
(if any) are copied to a new, smaller, buffer, and the original,
larger, buffer is reused for the next scratch batch.

Previously the large buffer was always attached to the output batch,
so a large buffer was transferred between threads for every scratch
batch processed. In combination with the decompression buffer change
in IMPALA-5304, this means that in many cases selective scans don't
produce nearly as many empty or near-empty batches and do not attach
nearly as much memory to each batch.

Performance:
Even on an 8 core machine I see some speedup on selective scans.
Profiling with "perf top" also showed that time in TCMalloc
was reduced - it went from several % of CPU time to a minimal
amount.

Running TPC-H on the same machine showed a ~5% overall improvement
and no regressions. E.g. Q6 got 20-25% faster.

I hope to do some additional cluster benchmarking on systems
with more cores to verify that the severe performance problems
there are fixed, but in the meantime it seems like we have enough
evidence that it will at least improve things.

Testing:
Add a couple of selective scans that exercise the new code paths.

Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-scratch-tuple-batch.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M testdata/workloads/functional-query/queries/QueryTest/parquet.test
6 files changed, 211 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/6949/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans

2017-05-24 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4923: reduce memory transfer for selective scans
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6949/3/be/src/exec/parquet-scratch-tuple-batch.h
File be/src/exec/parquet-scratch-tuple-batch.h:

Line 102:   void FinalizeTupleTransfer(RowBatch* dst_batch, int num_to_commit) {
> Done. Yeah that's a more precise thing to do.
Turns out it's the incorrect thing to do (found a bug in testing by 
test_mem_usage_scaling). The problem is if you have a non-empty output batch, 
followed by an empty one, this is 1, but it's actually invalid to reuse the 
tuple memory, because the first batch is still referencing it.


-- 
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


  1   2   >