[Impala-ASF-CR] IMPALA-6016: Fix logging in TableLoadingMgr class

2017-10-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8212 )

Change subject: IMPALA-6016: Fix logging in TableLoadingMgr class
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1303/


--
To view, visit http://gerrit.cloudera.org:8080/8212
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
Gerrit-Change-Number: 8212
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Thu, 05 Oct 2017 05:58:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6016: Fix logging in TableLoadingMgr class

2017-10-04 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8212 )

Change subject: IMPALA-6016: Fix logging in TableLoadingMgr class
..


Patch Set 3: Code-Review+2

(1 comment)

Carrying +2.

http://gerrit.cloudera.org:8080/#/c/8212/2/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java:

http://gerrit.cloudera.org:8080/#/c/8212/2/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@293
PS2, Line 293:   LOG.info("Metadata load request already in progress for 
table: " +
> Remove "Another"
Done



--
To view, visit http://gerrit.cloudera.org:8080/8212
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
Gerrit-Change-Number: 8212
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Comment-Date: Thu, 05 Oct 2017 05:57:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6016: Fix logging in TableLoadingMgr class

2017-10-04 Thread Bharath Vissapragada (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8212

to look at the new patch set (#3).

Change subject: IMPALA-6016: Fix logging in TableLoadingMgr class
..

IMPALA-6016: Fix logging in TableLoadingMgr class

This patch moves the logging of "loads in progress"
to a place where the current load is accounted. The
reason to move the logging is that the current load
is not reflected in the loadingTables_ till loadAsync()
is called.

Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
---
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
1 file changed, 6 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/8212/3
--
To view, visit http://gerrit.cloudera.org:8080/8212
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
Gerrit-Change-Number: 8212
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 


Re: [Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

2017-10-04 Thread Alexander Behm
Great work, John! Thanks for your continued effort to get this important
change merged. Much appreciated.

On Wed, Oct 4, 2017 at 7:26 PM, Impala Public Jenkins (Code Review) <
ger...@cloudera.org> wrote:

> Impala Public Jenkins *merged* this change.
>
> View Change 
> Approvals: Sailesh Mukil: Looks good to me, approved Impala Public
> Jenkins: Verified
>
> IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
>
> - Previously TThreadPoolServer called getTransport() on a client from
>   the Server thread (the thread that did the accepts).
>   - TSaslServerTransport->getTransport() called TSaslTransport->open()
>   - TSaslServerTransport->open() tried to negotiate SASL which calls
> read/write
> - If read/write blocks indefinitely, the TThreadPoolServer could
>   not accept connections until tcp_keepalive kicked in.
> - Set the underlying TSocket's recvTimeout and sendTimeout before the
>   TSaslServerTransport->open() and reset them to 0 after open()
>   completes.
> - Added sasl_connect_tcp_timeout_ms flag that defaults to 30
>   milliseconds (5 minutes)
> - Add the ability for TAcceptQueueServer to limit the maximum
>   number of concurrent tasks
> - Added a test case to thrift-server-test to test
>   max_concurrent_connections enforcement
> - Changed the remaining Thrift servers to use TAcceptQueueServer.
>   (hs2/beeswax/network-perf-benchmark)
>   - The timeout is still needed in TAcceptQueueServer since
> SetupConnection follows a similar pattern that TThreadPoolServer
> does.
> - Removed support for TThreadPool from ThriftServer() since it is
>   no longer used anywhere. ThriftServer() now always uses
>   TAcceptQueueServer.
> - Deprecated enable_accept_queue_server flag and removed supporting
>   code.
>
> Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
> Reviewed-on: http://gerrit.cloudera.org:8080/7061
> Reviewed-by: Sailesh Mukil 
> Tested-by: Impala Public Jenkins
> ---
> M be/src/benchmarks/network-perf-benchmark.cc
> M be/src/common/global-flags.cc
> M be/src/rpc/TAcceptQueueServer.cpp
> M be/src/rpc/TAcceptQueueServer.h
> M be/src/rpc/thrift-server-test.cc
> M be/src/rpc/thrift-server.cc
> M be/src/rpc/thrift-server.h
> M be/src/service/impala-server.cc
> M be/src/transport/TSaslServerTransport.cpp
> M common/thrift/metrics.json
> 10 files changed, 157 insertions(+), 110 deletions(-)
>
> To view, visit change 7061 . To
> unsubscribe, visit settings .
> Gerrit-Project: Impala-ASF
> Gerrit-Branch: master
> Gerrit-MessageType: merged
> Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
> Gerrit-Change-Number: 7061
> Gerrit-PatchSet: 14
> Gerrit-Owner: John Sherman 
> Gerrit-Reviewer: Dan Hecht 
> Gerrit-Reviewer: Henry Robinson 
> Gerrit-Reviewer: Impala Public Jenkins
> Gerrit-Reviewer: John Sherman 
> Gerrit-Reviewer: Matthew Jacobs 
> Gerrit-Reviewer: Michael Brown 
> Gerrit-Reviewer: Sailesh Mukil 
> Gerrit-Reviewer: Taras Bobrovytsky 
>
> --
> You received this message because you are subscribed to the Google Groups
> "impala-cr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to impala-cr+unsubscr...@cloudera.com.
> For more options, visit https://groups.google.com/a/cloudera.com/d/optout.
>


[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

2017-10-04 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use 
TAcceptQueueServer
..


Patch Set 14: Verified+1

> Thanks! Also thanks to you and Henry for taking the time to review.
 > (technically my 2nd patch, I removed an invalid DCHECK a few months
 > ago, but it is my first patch of substance)
 > I'll probably start working in my spare time on IMPALA-5393 for my
 > next contribution.
 >
 > > > It looks like it passed gerri-verify-external -
 > https://jenkins.impala.io/job/gerrit-verify-dryrun-external/13/
 > >
 > > Great! I'll rebase and merge this.
 > > Congratulations on getting your first patch in!

Sure thing! Please feel free to reach out at any time. Your commit can be 
viewed here:
https://github.com/apache/incubator-impala/commit/4dd0f1b3d84f67eb40bf671160b057be9bbdb921


--
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 14
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Comment-Date: Thu, 05 Oct 2017 05:41:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5990: Part 1: JNI-based LZ4 de/compression

2017-10-04 Thread Alex Behm (Code Review)
Alex Behm has abandoned this change. ( http://gerrit.cloudera.org:8080/8150 )

Change subject: IMPALA-5990: Part 1: JNI-based LZ4 de/compression
..


Abandoned

Putting this on hold as it's not clear we will proceed with this route.
--
To view, visit http://gerrit.cloudera.org:8080/8150
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I237802944875b07080db0159ff8ec548150fd95e
Gerrit-Change-Number: 8150
Gerrit-PatchSet: 6
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-6016: Fix logging in TableLoadingMgr class

2017-10-04 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8212 )

Change subject: IMPALA-6016: Fix logging in TableLoadingMgr class
..


Patch Set 2: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8212/1/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java:

http://gerrit.cloudera.org:8080/#/c/8212/1/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@72
PS1, Line 72: LOG.info(String.format("Remaining items in queue: %s. 
Loads in progress: %s",
> Not totally sure I understand this. Do you mean the current table (represen
Sorry, I got confused about the code flow. Current code lgtm.


http://gerrit.cloudera.org:8080/#/c/8212/2/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java:

http://gerrit.cloudera.org:8080/#/c/8212/2/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@293
PS2, Line 293:   LOG.info("Another metadata load request already in 
progress for table: " +
Remove "Another"



--
To view, visit http://gerrit.cloudera.org:8080/8212
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
Gerrit-Change-Number: 8212
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Comment-Date: Thu, 05 Oct 2017 03:54:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5448: fix invalid number of splits reported in Parquet scan node

2017-10-04 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8147 )

Change subject: IMPALA-5448: fix invalid number of splits reported in Parquet 
scan node
..


Patch Set 4:

Yeah, the profile is:
File Formats: PARQUET/GZIP:1 PARQUET/SNAPPY:1

Should I add an end-to-end test for this? My test processes are:
hive> use functional_parquet;
hive> create table alltypes_mix like alltypes;
hive> set parquet.compression=snappy;
hive> insert into table alltypes_mix partition (year=2017,month=1) select 
id,bool_col,tinyint_col,smallint_col,int_col,bigint_col,float_col,double_col,date_string_col,string_col,timestamp_col
 from alltypes where year=2010 and month=1;

hive> set parquet.compression=gzip;
hive> insert into table alltypes_mix partition (year=2017,month=2) select 
id,bool_col,tinyint_col,smallint_col,int_col,bigint_col,float_col,double_col,date_string_col,string_col,timestamp_col
 from alltypes where year=2010 and month=1;

Then in impala-shell:
[localhost:21000] > invalidate metadata functional_parquet.alltypes_mix;
[localhost:21000] > select * from functional_parquet.alltypes_mix;
[localhost:21000] > profile;


--
To view, visit http://gerrit.cloudera.org:8080/8147
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1
Gerrit-Change-Number: 8147
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 05 Oct 2017 03:22:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5908: Allow SET to unset modified query options.

2017-10-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8070 )

Change subject: IMPALA-5908: Allow SET to unset modified query options.
..


Patch Set 15: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8c383e68064f839cb5000118901dff77b4e5cb9
Gerrit-Change-Number: 8070
Gerrit-PatchSet: 15
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 05 Oct 2017 03:04:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5908: Allow SET to unset modified query options.

2017-10-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8070 )

Change subject: IMPALA-5908: Allow SET to unset modified query options.
..

IMPALA-5908: Allow SET to unset modified query options.

The query 'SET =""' will now unset an option within the session,
reverting it to its default state.

This change became necessary when "SET" started returning an empty
string for unset options which don't have a default. The test
infrastructure (impala_test_suite.py) resets options to what it thinks
is its defaults, and, when this broke, some ASAN builds started to fail,
presumably due to a timing issue with how we re-use connections between
tests.

Previously, SessionState copied over the default options from the server
when the session was created and then mutated that. To support unsetting
options at the session layer, this change keeps a pointer to the default
server settings, keeps separately the mutations, and overlays the
options each time they're requested. Similarly, for configuration
overlays that happen per-query, the overlay is now done explicitly,
because empty per-query overlay values (key=..., value="") now have no effect.

Because "set key=''" is ambiguous between "set to the empty string" and
"unset", it's now impossible to set to the empty string, at the session
layer, an option that is configured at a previous layer. In practice,
this is just debug_action and request_pool. debug_action is essentially
an internal tool. For request_pool, this means that setting the default
request_pool via impalad command line is now a bad idea, as it can't
be cleared at a per-session level. For request_pool, the correct
course of action for users is to use placement rules, and to have a
default placement rule.

Testing:
* Added a simple test that triggered this side-effect without this code.
  Specifically, "impala-python infra/python/env/bin/py.test 
tests/metadata/test_set.py -s"
  with the modified set.test triggers.
* Amended tests/custom_cluster/test_admission_controller.py; it was
  useful for testing these code paths.
* Added cases to query-options-test to check behavior for both
  defaulted and non-defaulted values.
* Added a custom cluster test that checks that overlays are
  working against
* Ran an ASAN build where this was triggering previously.

Change-Id: Ia8c383e68064f839cb5000118901dff77b4e5cb9
Reviewed-on: http://gerrit.cloudera.org:8080/8070
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/service/client-request-state.cc
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M testdata/workloads/functional-query/queries/QueryTest/set.test
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_admission_controller.py
A tests/custom_cluster/test_set_and_unset.py
M tests/hs2/hs2_test_suite.py
14 files changed, 266 insertions(+), 26 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia8c383e68064f839cb5000118901dff77b4e5cb9
Gerrit-Change-Number: 8070
Gerrit-PatchSet: 16
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5448: fix invalid number of splits reported in Parquet scan node

2017-10-04 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8147 )

Change subject: IMPALA-5448: fix invalid number of splits reported in Parquet 
scan node
..


Patch Set 4:

Did you the test the case if one partition is snappy and another gzip both 
scanned by the same HDFS SCAN Node?
Can you please paste what the output in the query profile looks like?


--
To view, visit http://gerrit.cloudera.org:8080/8147
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1
Gerrit-Change-Number: 8147
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 05 Oct 2017 02:51:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5988: optimise MemPool::TryAllocate()

2017-10-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8145 )

Change subject: IMPALA-5988: optimise MemPool::TryAllocate()
..

IMPALA-5988: optimise MemPool::TryAllocate()

Testing:
Ran core tests.

Perf:
Experiments using this on top of a WIP Avro patch for IMPALA-5307
showed noticable improvements in CPU efficiency - up to 10%

Change-Id: I088012084fe535a67a3cd1103ced5a02027f961a
Reviewed-on: http://gerrit.cloudera.org:8080/8145
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/exec/analytic-eval-node.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/runtime/mem-pool.h
6 files changed, 32 insertions(+), 21 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I088012084fe535a67a3cd1103ced5a02027f961a
Gerrit-Change-Number: 8145
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5988: optimise MemPool::TryAllocate()

2017-10-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8145 )

Change subject: IMPALA-5988: optimise MemPool::TryAllocate()
..


Patch Set 5: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I088012084fe535a67a3cd1103ced5a02027f961a
Gerrit-Change-Number: 8145
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 05 Oct 2017 02:50:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4623: [DOCS] Document file handle caching

2017-10-04 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8200 )

Change subject: IMPALA-4623: [DOCS] Document file handle caching
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_known_issues.xml@338
PS1, Line 338: continuously appended by an HDFS mechanism
This also applies if an HDFS file is overwritten in place.


http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml
File docs/topics/impala_scalability.xml:

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@967
PS1, Line 967: although the encryption layer
 : adds overhead that might lessen the benefit of the 
caching.
I'm not familiar with this overhead. What is this referring to?


http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@973
PS1, Line 973: 20 thousand
Just curious: How do you decide to use "20 thousand" vs "20,000"?


http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@991
PS1, Line 991: evict any stale file handles from the cache
The file handles won't actually be evicted directly. The new metadata will mean 
that new statements will no longer use that file handle and eventually it will 
get aged out. I'm not sure if this distinction is important for documentation, 
but I think the important thing is that the memory may not be freed 
immediately. (This is something we are likely to change in a future release.)


http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@995
PS1, Line 995: To evaluate the effectiveness of file handle caching for a 
particular workload, issue the
 : PROFILE statement in 
impala-shell or examine query
 : profiles in the Impala web UI. Look for the ratio of 
CachedFileHandlesHitCount
 : (ideally, should be high) to 
CachedFileHandlesMissCount (ideally, should be low).
 : Before starting any evaluation, run some representative 
queries to warm up the cache,
 : because the first time each data file is accessed is 
always recorded as a cache miss.
I'm not sure this belongs here, but information about the cache across the 
whole impalad is available via the metrics page under impala-server:
impala-server.io.mgr.cached-file-handles-miss-count
impala-server.io.mgr.cached-file-handles-hit-count

The total number of file handles in the cache is:
impala-server.io.mgr.num-cached-file-handles



--
To view, visit http://gerrit.cloudera.org:8080/8200
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I261c29eff80dc376528bba29ffb7d8e0f895e25f
Gerrit-Change-Number: 8200
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Comment-Date: Thu, 05 Oct 2017 02:37:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

2017-10-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use 
TAcceptQueueServer
..

IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

- Previously TThreadPoolServer called getTransport() on a client from
  the Server thread (the thread that did the accepts).
  - TSaslServerTransport->getTransport() called TSaslTransport->open()
  - TSaslServerTransport->open() tried to negotiate SASL which calls
read/write
- If read/write blocks indefinitely, the TThreadPoolServer could
  not accept connections until tcp_keepalive kicked in.
- Set the underlying TSocket's recvTimeout and sendTimeout before the
  TSaslServerTransport->open() and reset them to 0 after open()
  completes.
- Added sasl_connect_tcp_timeout_ms flag that defaults to 30
  milliseconds (5 minutes)
- Add the ability for TAcceptQueueServer to limit the maximum
  number of concurrent tasks
- Added a test case to thrift-server-test to test
  max_concurrent_connections enforcement
- Changed the remaining Thrift servers to use TAcceptQueueServer.
  (hs2/beeswax/network-perf-benchmark)
  - The timeout is still needed in TAcceptQueueServer since
SetupConnection follows a similar pattern that TThreadPoolServer
does.
- Removed support for TThreadPool from ThriftServer() since it is
  no longer used anywhere. ThriftServer() now always uses
  TAcceptQueueServer.
- Deprecated enable_accept_queue_server flag and removed supporting
  code.

Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Reviewed-on: http://gerrit.cloudera.org:8080/7061
Reviewed-by: Sailesh Mukil 
Tested-by: Impala Public Jenkins
---
M be/src/benchmarks/network-perf-benchmark.cc
M be/src/common/global-flags.cc
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/service/impala-server.cc
M be/src/transport/TSaslServerTransport.cpp
M common/thrift/metrics.json
10 files changed, 157 insertions(+), 110 deletions(-)

Approvals:
  Sailesh Mukil: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 14
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

2017-10-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use 
TAcceptQueueServer
..


Patch Set 13: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 13
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Comment-Date: Thu, 05 Oct 2017 02:26:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6016: Fix logging in TableLoadingMgr class

2017-10-04 Thread Bharath Vissapragada (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8212

to look at the new patch set (#2).

Change subject: IMPALA-6016: Fix logging in TableLoadingMgr class
..

IMPALA-6016: Fix logging in TableLoadingMgr class

This patch moves the logging of "loads in progress"
to a place where the current load is accounted. The
reason to move the logging is that the current load
is not reflected in the loadingTables_ till loadAsync()
is called.

Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
---
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
1 file changed, 6 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/8212/2
--
To view, visit http://gerrit.cloudera.org:8080/8212
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
Gerrit-Change-Number: 8212
Gerrit-PatchSet: 2
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 


[Impala-ASF-CR] IMPALA-6016: Fix logging in TableLoadingMgr class

2017-10-04 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8212 )

Change subject: IMPALA-6016: Fix logging in TableLoadingMgr class
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8212/1/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java:

http://gerrit.cloudera.org:8080/#/c/8212/1/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@72
PS1, Line 72: LOG.info(String.format("Remaining items in queue: %s. 
Loads in progress: %s",
> I think the "Remaining" part is now wrong because the size includes the tab
Not totally sure I understand this. Do you mean the current table (represented 
by tableName_) would be included in tableLoadingDeque_.size() ? If thats what 
you meant, doesn't L288 remove it off the queue? Please correct me if you meant 
something else here.


http://gerrit.cloudera.org:8080/#/c/8212/1/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@293
PS1, Line 293:   return;
> Can you also add a log message here indicating that the table is already be
Done



--
To view, visit http://gerrit.cloudera.org:8080/8212
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
Gerrit-Change-Number: 8212
Gerrit-PatchSet: 1
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Comment-Date: Thu, 05 Oct 2017 02:18:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder

2017-10-04 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8034 )

Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
..


Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-scanner.h@485
PS6, Line 485:   /// Tracks all the memory allocation used by dictionary in 
DictDecoder.
 :   struct DictMemTrack {
 : std::unique_ptr mem_tracker;
 :
 : DictMemTrack(MemTracker *parent_memtrack):mem_tracker(new 
MemTracker(-1,
 : "Decoder parquet dict", parent_memtrack, false)) { };
 :
 : MemTracker*  GetMemTracker() {
 :   return  mem_tracker.get();
 : }
 :
 : ~DictMemTrack() {
 :   mem_tracker->CloseAndUnregisterFromParent();
 : }
 :   };
> I dont think we need to add a new struct for this, you can just call CloseA
I had added a new structure to have the creation /destruction of memtracker in 
one place

and also the CloseAndUnregisterFromParent() has to be called after
releasing all bytes owned by MemTracker in the DictDecoder class (which happens 
in it's destructor), since the destructor of member variables are called later 
than the owner making a separate struct/class for DictMemTrack  solved the 
problem.


http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-table-writer.cc@113
PS6, Line 113:  // Bytes used for memory tracking by dictionary in DictEncoder 
is decremented.
 :   virtual void Cleanup() {
 : if (dict_encoder_base_ != nullptr) {
 :   dict_encoder_base_->Cleanup();
 : }
 :   }
> you can move this to the BaseColumnWriter::Close() method
This can't be done because I'm not tracking the bytes at the BaseColumnWriter


http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-table-writer.cc@767
PS6, Line 767:   // Below code will decrement the bytes used for memory 
tracking by dictionary in
 :   // DictEncoder class for each ColumnWriter.
 :   for (int i = 0; i < columns_.size(); i++) {
 :  if (columns_[i] != nullptr) {
 :columns_[i]->Cleanup();
 :  }
 :   }
> similarly you can move this to the HdfsParquetTableWriter::Close() method
I'll try moving the functionality in Close() for HdfsParquetTableWriter/Reader


http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/util/dict-encoding.h
File be/src/util/dict-encoding.h:

http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/util/dict-encoding.h@138
PS6, Line 138: SizeofDict
> I think we can use a more precise name here, SizeOfDict might be confused w
I've changed it to be DictByteSize


http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/util/dict-encoding.h@438
PS6, Line 438: bytes_alloc += sizeof(value);
> can you switch to what joe suggested earlier, that is, instead of bytes_all
This looks fine what is the issue with this?



--
To view, visit http://gerrit.cloudera.org:8080/8034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Gerrit-Change-Number: 8034
Gerrit-PatchSet: 6
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Pranay Singh
Gerrit-Comment-Date: Thu, 05 Oct 2017 01:58:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet

2017-10-04 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8056 )

Change subject: IMPALA-5525 Extend TestScannersFuzzing to test uncompressed 
parquet
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8056/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8056/8//COMMIT_MSG@15
PS8, Line 15: Ran the query_test/test_scanners_fuzz.py in a loop (5 times)
> I was thinking more like overnight (or a few hours). It's worth doing just
OK I'll  run overnight and check tomorrow morning what happens.



--
To view, visit http://gerrit.cloudera.org:8080/8056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I760de7203a51cf82b16016fa8043cadc7c8325bc
Gerrit-Change-Number: 8056
Gerrit-PatchSet: 8
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 05 Oct 2017 01:32:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4704: Disallow client connections to imapalad until catalog is received.

2017-10-04 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8202 )

Change subject: IMPALA-4704: Disallow client connections to imapalad until 
catalog is received.
..


Patch Set 2:

(2 comments)

Updated the description as well; waits until catalog is ready now.

http://gerrit.cloudera.org:8080/#/c/8202/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8202/1/be/src/service/impala-server.cc@2043
PS1, Line 2043:  ImpalaServer::Start
> the relationship between this code and the HS2/Beeswax servers would be cle
moved this into init since my understanding now is that the ports should be 
opened only when the catalog is ready.


http://gerrit.cloudera.org:8080/#/c/8202/1/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/8202/1/fe/src/main/java/org/apache/impala/service/Frontend.java@905
PS1, Line 905: // 1) Analysis completes successfully.
> perhaps this should be a Preconditions check, if our intention is to preven
Done. Agreed that when we get here, catalog should always be in ready state.



--
To view, visit http://gerrit.cloudera.org:8080/8202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Gerrit-Change-Number: 8202
Gerrit-PatchSet: 2
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 05 Oct 2017 01:23:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4704: Disallow client connections to imapalad until catalog is received.

2017-10-04 Thread Vuk Ercegovac (Code Review)
Hello Philip Zeyliger, Balazs Jeszenszky, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8202

to look at the new patch set (#2).

Change subject: IMPALA-4704: Disallow client connections to imapalad until 
catalog is received.
..

IMPALA-4704: Disallow client connections to imapalad until catalog is received.

Currently, impalad starts beeswax and hs2 servers even if the catalog has not 
yet
been received. As a result, client connections see an error message stating that
the impalad is not yet ready.

This patch changes the impalad startup sequence to wait until the catalog is
received before opening beeswax and hs2 ports and starting their servers.

Testing:
- python e2e tests that start a cluster without a catalog and check that
  client connections are rejected as expected.

Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
---
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/impala-server.cc
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M tests/common/impala_cluster.py
A tests/custom_cluster/test_catalog_wait.py
9 files changed, 120 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/8202/2
--
To view, visit http://gerrit.cloudera.org:8080/8202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Gerrit-Change-Number: 8202
Gerrit-PatchSet: 2
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5142 EventSequence displays negative elapsed time.

2017-10-04 Thread Pranay Singh (Code Review)
Pranay Singh has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8215


Change subject: IMPALA-5142 EventSequence displays negative elapsed time.
..

IMPALA-5142 EventSequence displays negative elapsed time.

EventSequence may display a negative elapsed time if ClientRequestState is done
while EventSequence::MarkEvent() is in progress.

The time is calculated based on POSIX clock_gettime() which uses 
CLOCK_MONOTONIC that
will always give a time in increasing time order. In this case the cause of 
negative
delay is start time becoming greater than end time, as the difference of latter 
and
the former is used to compute the delay.

The negative delay case can happen if EventSequence::Start() is called on the 
same
EventSequence for which EventSequence::MarkEvent() is in progress.It can be 
serialized
by reordering the lock in EventSequence::MarkEvent() but that may be a 
performance
overhead, calling ElapsedTime() with lock held.So this patch fixes the issue by 
returning
a 0 delay value when such an issue ever happens rather than returning a 
negative delay
value.

Testing:
---
Ran all the front-end/backend and end-end tests.

Change-Id: I8c944396d96473b17b453da3e913ffc56680a896
---
M be/src/util/stopwatch.h
1 file changed, 6 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/8215/1
--
To view, visit http://gerrit.cloudera.org:8080/8215
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8c944396d96473b17b453da3e913ffc56680a896
Gerrit-Change-Number: 8215
Gerrit-PatchSet: 1
Gerrit-Owner: Pranay Singh


[Impala-ASF-CR] IMPALA-3200: [DOCS] Document user-facing aspects of new buffer pool

2017-10-04 Thread John Russell (Code Review)
John Russell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8003 )

Change subject: IMPALA-3200: [DOCS] Document user-facing aspects of new buffer 
pool
..


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_buffer_pool_limit.xml
File docs/topics/impala_buffer_pool_limit.xml:

http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_buffer_pool_limit.xml@53
PS4, Line 53: MiB
> MB. I may have mixed up MB / KB and MiB / KiB in a few places throughout. M
Done


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml
File docs/topics/impala_max_row_size.xml:

http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml@68
PS4, Line 68: 

[Impala-ASF-CR] IMPALA-3200: [DOCS] Document user-facing aspects of new buffer pool

2017-10-04 Thread John Russell (Code Review)
Hello Tim Armstrong, Mostafa Mokhtar, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8003

to look at the new patch set (#5).

Change subject: IMPALA-3200: [DOCS] Document user-facing aspects of new buffer 
pool
..

IMPALA-3200: [DOCS] Document user-facing aspects of new buffer pool

In particular, the new query options:

BUFFER_POOL_LIMIT
MAX_ROW_SIZE
MIN_SPILLABLE_BUFFER_SIZE <- Maybe the 'spillable' part of the name
  will change to 'spill', as used in other BP-related discussions.

Change-Id: I49323f8ffbff3e195058e88762eedbb1fcb1bc0e
---
M docs/impala.ditamap
M docs/impala_keydefs.ditamap
M docs/shared/impala_common.xml
A docs/topics/impala_buffer_pool_limit.xml
A docs/topics/impala_default_spillable_buffer_size.xml
A docs/topics/impala_max_row_size.xml
A docs/topics/impala_min_spillable_buffer_size.xml
M docs/topics/impala_scalability.xml
8 files changed, 600 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/8003/5
--
To view, visit http://gerrit.cloudera.org:8080/8003
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I49323f8ffbff3e195058e88762eedbb1fcb1bc0e
Gerrit-Change-Number: 8003
Gerrit-PatchSet: 5
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4682 Fix IllegalStateException thrown by aggregate function.

2017-10-04 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8143 )

Change subject: IMPALA-4682 Fix IllegalStateException thrown by aggregate 
function.
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8143/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/8143/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@893
PS2, Line 893: AnalysisError("select l_partkey from tpch_avro.lineitem 
order by count(l_orderkey)",
Test seems to be misplaced and/or duplicated. This test is TestStar() yet there 
is no star int his query.

I suggest you move these tests into TestAggregates() instead and check whether 
some of these cases are already covered.



--
To view, visit http://gerrit.cloudera.org:8080/8143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I57c20aeed401275d45913fedfd61c206c38641b7
Gerrit-Change-Number: 8143
Gerrit-PatchSet: 2
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Thu, 05 Oct 2017 00:22:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6016: Fix logging in TableLoadingMgr class

2017-10-04 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8212 )

Change subject: IMPALA-6016: Fix logging in TableLoadingMgr class
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8212/1/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java:

http://gerrit.cloudera.org:8080/#/c/8212/1/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@72
PS1, Line 72: LOG.info(String.format("Remaining items in queue: %s. 
Loads in progress: %s",
I think the "Remaining" part is now wrong because the size includes the table 
which is going to be loaded. The msg should reflect that.


http://gerrit.cloudera.org:8080/#/c/8212/1/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@293
PS1, Line 293:   return;
Can you also add a log message here indicating that the table is already being 
loaded by another thread?



--
To view, visit http://gerrit.cloudera.org:8080/8212
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
Gerrit-Change-Number: 8212
Gerrit-PatchSet: 1
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Comment-Date: Thu, 05 Oct 2017 00:18:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5448: fix invalid number of splits reported in Parquet scan node

2017-10-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8147 )

Change subject: IMPALA-5448: fix invalid number of splits reported in Parquet 
scan node
..


Patch Set 4: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/8147
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1
Gerrit-Change-Number: 8147
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 05 Oct 2017 00:15:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5448: fix invalid number of splits reported in Parquet scan node

2017-10-04 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8147 )

Change subject: IMPALA-5448: fix invalid number of splits reported in Parquet 
scan node
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8147/3/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/8147/3/be/src/exec/hdfs-scan-node-base.h@537
PS3, Line 537: int Size() { return BitUtil::Popcount(bit_map_); }
> One nit: can you add a blank link after this method.
Sorry, I know you mean a blank line now... Done



--
To view, visit http://gerrit.cloudera.org:8080/8147
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1
Gerrit-Change-Number: 8147
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 23:58:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5448: fix invalid number of splits reported in Parquet scan node

2017-10-04 Thread Quanlong Huang (Code Review)
Hello Tim Armstrong, Mostafa Mokhtar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8147

to look at the new patch set (#4).

Change subject: IMPALA-5448: fix invalid number of splits reported in Parquet 
scan node
..

IMPALA-5448: fix invalid number of splits reported in Parquet scan node

Parquet splits with multi columns are marked as completed by using
HdfsScanNodeBase::RangeComplete(). It duplicately counts the file types
as column codec types. Thus the number of parquet splits are the real count
multiplies number of materialized columns.

Furthermore, according to the Parquet definition, it allows mixed compression
codecs on different columns. This's handled in this patch as well. A parquet 
file
using gzip and snappy compression codec will be reported as:
FileFormats: PARQUET/(GZIP,SNAPPY):1

This patch introduces a compression types set for the above cases.

Testing:
Add end-to-end tests handling parquet files with all columns compressed in
snappy, and handling parquet files with multi compression codec.

Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
A testdata/multi_compression_parquet_data/README
A testdata/multi_compression_parquet_data/tinytable_0_gzip_snappy.parq
A testdata/multi_compression_parquet_data/tinytable_1_snappy_gzip.parq
A 
testdata/workloads/functional-query/queries/QueryTest/hdfs_parquet_scan_node_profile.test
M tests/query_test/test_scanners.py
7 files changed, 112 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/8147/4
--
To view, visit http://gerrit.cloudera.org:8080/8147
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1
Gerrit-Change-Number: 8147
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5448: fix invalid number of splits reported in Parquet scan node

2017-10-04 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8147 )

Change subject: IMPALA-5448: fix invalid number of splits reported in Parquet 
scan node
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8147/3/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/8147/3/be/src/exec/hdfs-scan-node-base.h@537
PS3, Line 537: int Size() { return BitUtil::Popcount(bit_map_); }
> One nit: can you add a blank link after this method.
Sorry, what is a blank link? Should I write the function in 3 lines?


http://gerrit.cloudera.org:8080/#/c/8147/3/testdata/workloads/functional-query/queries/QueryTest/hdfs_parquet_scan_node_profile.test
File 
testdata/workloads/functional-query/queries/QueryTest/hdfs_parquet_scan_node_profile.test:

http://gerrit.cloudera.org:8080/#/c/8147/3/testdata/workloads/functional-query/queries/QueryTest/hdfs_parquet_scan_node_profile.test@10
PS3, Line 10: select * from functional_parquet.multi_compression
Forgot to change the db name! Will fix it



--
To view, visit http://gerrit.cloudera.org:8080/8147
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1
Gerrit-Change-Number: 8147
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 23:49:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

2017-10-04 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use 
TAcceptQueueServer
..


Patch Set 13:

Thanks! Also thanks to you and Henry for taking the time to review. 
(technically my 2nd patch, I removed an invalid DCHECK a few months ago, but it 
is my first patch of substance)
I'll probably start working in my spare time on IMPALA-5393 for my next 
contribution.

> > It looks like it passed gerri-verify-external - 
> > https://jenkins.impala.io/job/gerrit-verify-dryrun-external/13/
 >
 > Great! I'll rebase and merge this.
 > Congratulations on getting your first patch in!


--
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 13
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Comment-Date: Wed, 04 Oct 2017 23:26:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5448: fix invalid number of splits reported in Parquet scan node

2017-10-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8147 )

Change subject: IMPALA-5448: fix invalid number of splits reported in Parquet 
scan node
..


Patch Set 3: Code-Review+1

Looks good. I'll ask Mostafa to confirm that the new output looks good to him 
but then I think we should be able to go ahead and merge.


--
To view, visit http://gerrit.cloudera.org:8080/8147
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1
Gerrit-Change-Number: 8147
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 23:18:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4682 Fix IllegalStateException issue

2017-10-04 Thread Zoram Thanga (Code Review)
Zoram Thanga has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8143 )

Change subject: IMPALA-4682 Fix IllegalStateException issue
..


Patch Set 1:

(2 comments)

> Zoram, let's try to move this forward. We don't want too many open
 > CRs.

Done. Have added new test cases to exercise the scenario reported in the JIRA.

http://gerrit.cloudera.org:8080/#/c/8143/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8143/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-4682 Fix IllegalStateException issue
> Can you state the issue in short?
Done


http://gerrit.cloudera.org:8080/#/c/8143/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/8143/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@731
PS1, Line 731: + selectListItem.toSql());
> Can  we add a test  for this?
Done



--
To view, visit http://gerrit.cloudera.org:8080/8143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I57c20aeed401275d45913fedfd61c206c38641b7
Gerrit-Change-Number: 8143
Gerrit-PatchSet: 1
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 04 Oct 2017 23:10:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4682 Fix IllegalStateException thrown by aggregate function.

2017-10-04 Thread Zoram Thanga (Code Review)
Hello Michael Ho, Dimitris Tsirogiannis, anujphadke, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8143

to look at the new patch set (#2).

Change subject: IMPALA-4682 Fix IllegalStateException thrown by aggregate 
function.
..

IMPALA-4682 Fix IllegalStateException thrown by aggregate function.

When one runs a query like

'select * from t order by count(a)'

we are incorrectly throwing an IllegalStateException. This is an
invalid query, and we are correctly handling it if "*" is replaced
by a specific column:

'select a from t order by count(b)'

which produces the error message

"ERROR: AnalysisException: select list expression not produced by
aggregation output (missing from GROUP BY clause?): a"

This patch fixes the handling of '*' in this context, so that the
error becomes

"ERROR: AnalysisException: select list expression not produced by
aggregation output (missing from GROUP BY clause?): *"

Note that the second changed line is required because
selectListItem.Expr_ is null when SelectListItem.isStar_ is true.

New FE unit tests are added for this use case.

Change-Id: I57c20aeed401275d45913fedfd61c206c38641b7
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
2 files changed, 29 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/8143/2
--
To view, visit http://gerrit.cloudera.org:8080/8143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I57c20aeed401275d45913fedfd61c206c38641b7
Gerrit-Change-Number: 8143
Gerrit-PatchSet: 2
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-5908: Allow SET to unset modified query options.

2017-10-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8070 )

Change subject: IMPALA-5908: Allow SET to unset modified query options.
..


Patch Set 15:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1302/


--
To view, visit http://gerrit.cloudera.org:8080/8070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8c383e68064f839cb5000118901dff77b4e5cb9
Gerrit-Change-Number: 8070
Gerrit-PatchSet: 15
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 23:00:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5908: Allow SET to unset modified query options.

2017-10-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8070 )

Change subject: IMPALA-5908: Allow SET to unset modified query options.
..


Patch Set 15: Code-Review+2

Let me retry


--
To view, visit http://gerrit.cloudera.org:8080/8070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8c383e68064f839cb5000118901dff77b4e5cb9
Gerrit-Change-Number: 8070
Gerrit-PatchSet: 15
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 23:00:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5908: Allow SET to unset modified query options.

2017-10-04 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8070 )

Change subject: IMPALA-5908: Allow SET to unset modified query options.
..


Patch Set 14:

The build error seems to be:

> 19:44:25 ] [ERROR] 
> /home/ubuntu/Impala/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java:[593,11]
>  no suitable method found for putString(java.lang.String)

This is https://issues.apache.org/jira/browse/IMPALA-6012. I believe Tim's 
pushed a workaround, so we could try again.


--
To view, visit http://gerrit.cloudera.org:8080/8070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8c383e68064f839cb5000118901dff77b4e5cb9
Gerrit-Change-Number: 8070
Gerrit-PatchSet: 14
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 22:59:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] Remove dead code parallel-executor*

2017-10-04 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8206 )

Change subject: Remove dead code parallel-executor*
..


Patch Set 1: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8206/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8206/1//COMMIT_MSG@7
PS1, Line 7: Remove dead code parallel-executor*
Track it with a JIRA?



--
To view, visit http://gerrit.cloudera.org:8080/8206
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f0b121c2c40849937afa103dfedf0e0bef34477
Gerrit-Change-Number: 8206
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 04 Oct 2017 22:37:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6016: Fix logging in TableLoadingMgr class

2017-10-04 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8212


Change subject: IMPALA-6016: Fix logging in TableLoadingMgr class
..

IMPALA-6016: Fix logging in TableLoadingMgr class

This patch moves the logging of "loads in progress"
to a place where the current load is accounted. The
reason to move the logging is that the current load
is not reflected in the loadingTables_ till loadAsync()
is called.

Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
---
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
1 file changed, 4 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/8212/1
--
To view, visit http://gerrit.cloudera.org:8080/8212
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I925a6ba9a09be25df2759da5e6d85dfc8b981ce4
Gerrit-Change-Number: 8212
Gerrit-PatchSet: 1
Gerrit-Owner: Bharath Vissapragada 


[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder

2017-10-04 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8034 )

Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
..


Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-scanner.h@485
PS6, Line 485:   /// Tracks all the memory allocation used by dictionary in 
DictDecoder.
 :   struct DictMemTrack {
 : std::unique_ptr mem_tracker;
 :
 : DictMemTrack(MemTracker *parent_memtrack):mem_tracker(new 
MemTracker(-1,
 : "Decoder parquet dict", parent_memtrack, false)) { };
 :
 : MemTracker*  GetMemTracker() {
 :   return  mem_tracker.get();
 : }
 :
 : ~DictMemTrack() {
 :   mem_tracker->CloseAndUnregisterFromParent();
 : }
 :   };
I dont think we need to add a new struct for this, you can just call 
CloseAndUnregisterFromParent() on the dict_mem_tracker in 
HdfsParquetScanner::Close

Similarly for the writer


http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-table-writer.cc@113
PS6, Line 113:  // Bytes used for memory tracking by dictionary in DictEncoder 
is decremented.
 :   virtual void Cleanup() {
 : if (dict_encoder_base_ != nullptr) {
 :   dict_encoder_base_->Cleanup();
 : }
 :   }
you can move this to the BaseColumnWriter::Close() method


http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-table-writer.cc@767
PS6, Line 767:   // Below code will decrement the bytes used for memory 
tracking by dictionary in
 :   // DictEncoder class for each ColumnWriter.
 :   for (int i = 0; i < columns_.size(); i++) {
 :  if (columns_[i] != nullptr) {
 :columns_[i]->Cleanup();
 :  }
 :   }
similarly you can move this to the HdfsParquetTableWriter::Close() method


http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/util/dict-encoding.h
File be/src/util/dict-encoding.h:

http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/util/dict-encoding.h@138
PS6, Line 138: SizeofDict
I think we can use a more precise name here, SizeOfDict might be confused with 
returning the number of elements in the Dictionary.
Something on the lines of DictByteSize?


http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/util/dict-encoding.h@438
PS6, Line 438: bytes_alloc += sizeof(value);
can you switch to what joe suggested earlier, that is, instead of bytes_alloc, 
using something like the function SizeofDict() that you wrote to update 
ConsumeBytes at the end.
Refer Joe's first comment.



--
To view, visit http://gerrit.cloudera.org:8080/8034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Gerrit-Change-Number: 8034
Gerrit-PatchSet: 6
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Pranay Singh
Gerrit-Comment-Date: Wed, 04 Oct 2017 22:29:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

2017-10-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use 
TAcceptQueueServer
..


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1301/


--
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 13
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Comment-Date: Wed, 04 Oct 2017 22:23:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

2017-10-04 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use 
TAcceptQueueServer
..


Patch Set 12: Code-Review+2

> It looks like it passed gerri-verify-external - 
> https://jenkins.impala.io/job/gerrit-verify-dryrun-external/13/

Great! I'll rebase and merge this.
Congratulations on getting your first patch in!


--
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 12
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Comment-Date: Wed, 04 Oct 2017 22:23:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

2017-10-04 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use 
TAcceptQueueServer
..


Patch Set 13: Code-Review+2

Rebase, Carry +2.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 13
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Comment-Date: Wed, 04 Oct 2017 22:23:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5988: optimise MemPool::TryAllocate()

2017-10-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8145 )

Change subject: IMPALA-5988: optimise MemPool::TryAllocate()
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I088012084fe535a67a3cd1103ced5a02027f961a
Gerrit-Change-Number: 8145
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 22:17:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5988: optimise MemPool::TryAllocate()

2017-10-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8145 )

Change subject: IMPALA-5988: optimise MemPool::TryAllocate()
..


Patch Set 4: Code-Review+2

carry


--
To view, visit http://gerrit.cloudera.org:8080/8145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I088012084fe535a67a3cd1103ced5a02027f961a
Gerrit-Change-Number: 8145
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 22:16:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5988: optimise MemPool::TryAllocate()

2017-10-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8145 )

Change subject: IMPALA-5988: optimise MemPool::TryAllocate()
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1300/


--
To view, visit http://gerrit.cloudera.org:8080/8145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I088012084fe535a67a3cd1103ced5a02027f961a
Gerrit-Change-Number: 8145
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 22:17:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-10-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..


Patch Set 7: Code-Review+1

Carry +1


--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 22:15:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-10-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc@1071
PS6, Line 1071:   if 
(PageContainsTupleData(current_page_header_.data_page_header.encoding)) {
> Does seem like the simplest solution. Might be worth adding a short comment
Expanded the comment just below to explain motivation.



--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 22:07:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-10-04 Thread Tim Armstrong (Code Review)
Hello Lars Volker, Alex Behm, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8085

to look at the new patch set (#7).

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..

IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

This change only affects uncompressed plain-encoded Parquet where
RowBatches may directly reference strings stored in the I/O
buffers. The proposed fix is to simply copy the data pages if
needed then use the same logic that we use for decompressed data
pages.

This copy inevitably adds some CPU overhead, but I believe this is
acceptable because:
* We generally recommend using compression, and optimize for that
  case.
* Copying memory is cheaper than decompressing data.
* Scans of uncompressed data are very likely to be I/O bound.

This allows several major simplifications:
* The resource management for compressed and uncompressed
  scans is much more similar.
* We don't need to attach Disk I/O buffers to RowBatches.
* We don't need to deal with attaching I/O buffers in
  ScannerContext.
* Column readers can release each I/O buffer *before* advancing to
  the next one, making it easier to reason about resource
  consumption. E.g. each Parquet column only needs one I/O buffer at
  a time to make progress.

Future changes will apply to Avro, Sequence Files and Text. Once
all scanners are converted, ScannerContext::contains_tuple_data_
will always be false and we can remove some dead code.

Testing
===
Ran core ASAN and exhaustive debug builds.

Perf

No difference in most cases when scanning uncompressed parquet.
There is a significant regression (50% increase in runtime) in
targeted perf tests scanning non-dictionary-encoded strings (see
benchmark output below).  After the regression performance is
comparable to Snappy compression.

I also did a TPC-H run but ran into some issues with the report
generator. I manually compared times and there were no regressions.

++---+-++++
| Workload   | File Format   | Avg (s) | Delta(Avg) | 
GeoMean(s) | Delta(GeoMean) |
++---+-++++
| TARGETED-PERF(_61) | parquet / none / none | 23.02   | +0.60% | 4.23  
 | +5.97% |
++---+-++++

+++---++-++++-+---+
| Workload   | Query  | File Format   | Avg(s) | 
Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
+++---++-++++-+---+
| TARGETED-PERF(_61) | PERF_STRING-Q2 | parquet / none / none | 3.00   | 
1.98| R +52.10%  |   0.97%|   1.25%| 1   | 5 |
| TARGETED-PERF(_61) | PERF_STRING-Q1 | parquet / none / none | 2.86   | 
1.92| R +49.11%  |   0.34%|   2.34%| 1   | 5 |
| TARGETED-PERF(_61) | PERF_STRING-Q3 | parquet / none / none | 3.16   | 
2.15| R +47.04%  |   1.03%|   0.72%| 1   | 5 |
| TARGETED-PERF(_61) | PERF_STRING-Q4 | parquet / none / none | 3.16   | 
2.17| R +45.60%  |   0.14%|   1.11%| 1   | 5 |
| TARGETED-PERF(_61) | PERF_STRING-Q5 | parquet / none / none | 3.51   | 
2.55| R +37.88%  |   0.83%|   0.49%| 1   | 5 |
| TARGETED-PERF(_61) | PERF_AGG-Q5| parquet / none / none | 0.79   | 
0.61| R +30.86%  |   1.54%|   4.10%| 1   | 5 |
| TARGETED-PERF(_61) | primitive_top-n_al | parquet / none / none | 39.45  | 
35.07   |   +12.51%  |   0.29%|   0.29%| 1   | 5 |
| TARGETED-PERF(_61) | PERF_STRING-Q7 | parquet / none / none | 6.78   | 
6.10|   +11.13%  |   0.99%|   0.74%| 1   | 5 |
| TARGETED-PERF(_61) | PERF_STRING-Q6 | parquet / none / none | 8.83   | 
8.14|   +8.52%   |   0.15%|   0.32%| 1   | 5 |
...

Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/scanner-context.h
4 files changed, 66 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/8085/7
--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master

[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables. (wip)

2017-10-04 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8211 )

Change subject: IMPALA-5243: Speed up code gen for wide Avro tables. (wip)
..


Patch Set 1:

Hi Tim,

As you suggested, I'm sharing what I have so far for IMPALA-5243. This isn't 
ready for a full review, but I hope it gives a sense of the complexity.


--
To view, visit http://gerrit.cloudera.org:8080/8211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
Gerrit-Change-Number: 8211
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 21:43:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables. (wip)

2017-10-04 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8211


Change subject: IMPALA-5243: Speed up code gen for wide Avro tables. (wip)
..

IMPALA-5243: Speed up code gen for wide Avro tables. (wip)

HdfsAvroScanner::CodegenMaterializeTuple generates a function
linear in size to the number of columns. On 1000 column tables,
codegen time is significant.

This change breaks up MaterializeTuple() into multiple
smaller functions, and then calls them in order. When
breaking up into 200-column chunks, there is a noticeable
speed-up.

TODO:
* Run tests.
* Check by inspection that the base case of < 200 columns gets
  inlined into not having a helper function. Or, if it doesn't
  get inlined, convince LLVM to inline it. Or, skip the helper
  functions in that case.
* Compare query performance on a many-column table.
* Remove XXX blocks. There's a tiny bit of code marked XXX
  which I'm using for debugging and iteration. I've left it in
  in case folks have suggestions for how to better do the things
  I'm doing.

I measured codegen time for various "step sizes." The case where there are no 
helper
functions is about 2.6s. The best case was about a step size of 200,
with timings of 1.35s.

$(for step in 10 50 75 100 125 150 175 200 225 250 300 400 500 1000; do for try 
in $(seq 5); do echo $step > /tmp/step_size.txt; echo -n "$step "; 
impala-shell.sh -q "select count(int_col16) from 
functional_avro.widetable_1000_cols; profile;" 2> /dev/null | grep -A9 
'CodeGen:(Total: [0-9]*s' -m 1 | sed -e 's/ - / /' |
sed -e 's/([0-9]*)//' | tr -d '\n' | tr -s ' ' ' '; echo; done; done) | tee 
out.txt
10  CodeGen:(Total: 2s389ms, non-child: 2s389ms, % non-child: 100.00%) 
CodegenTime: 476.490us CompileTime: 1s260ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 133 NumInstructions: 9.01K 
OptimizationTime: 1s100ms PeakMemoryUsage: 4.40 MB PrepareTime: 10.489ms
10  CodeGen:(Total: 2s402ms, non-child: 2s402ms, % non-child: 100.00%) 
CodegenTime: 467.903us CompileTime: 1s264ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 133 NumInstructions: 9.01K 
OptimizationTime: 1s108ms PeakMemoryUsage: 4.40 MB PrepareTime: 10.183ms
10  CodeGen:(Total: 2s405ms, non-child: 2s405ms, % non-child: 100.00%) 
CodegenTime: 470.827us CompileTime: 1s266ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 133 NumInstructions: 9.01K 
OptimizationTime: 1s110ms PeakMemoryUsage: 4.40 MB PrepareTime: 10.378ms
10  CodeGen:(Total: 2s431ms, non-child: 2s431ms, % non-child: 100.00%) 
CodegenTime: 479.621us CompileTime: 1s284ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 133 NumInstructions: 9.01K 
OptimizationTime: 1s117ms PeakMemoryUsage: 4.40 MB PrepareTime: 10.355ms
10  CodeGen:(Total: 2s390ms, non-child: 2s390ms, % non-child: 100.00%) 
CodegenTime: 541.217us CompileTime: 1s260ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 133 NumInstructions: 9.01K 
OptimizationTime: 1s101ms PeakMemoryUsage: 4.40 MB PrepareTime: 10.186ms
50  CodeGen:(Total: 2s497ms, non-child: 2s497ms, % non-child: 100.00%) 
CodegenTime: 469.008us CompileTime: 1s267ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 53 NumInstructions: 8.53K 
OptimizationTime: 1s200ms PeakMemoryUsage: 4.16 MB PrepareTime: 10.131ms
50  CodeGen:(Total: 2s468ms, non-child: 2s468ms, % non-child: 100.00%) 
CodegenTime: 479.658us CompileTime: 1s268ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 53 NumInstructions: 8.53K 
OptimizationTime: 1s170ms PeakMemoryUsage: 4.16 MB PrepareTime: 10.332ms
50  CodeGen:(Total: 2s475ms, non-child: 2s475ms, % non-child: 100.00%) 
CodegenTime: 488.243us CompileTime: 1s273ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 53 NumInstructions: 8.53K 
OptimizationTime: 1s173ms PeakMemoryUsage: 4.16 MB PrepareTime: 10.248ms
50  CodeGen:(Total: 2s488ms, non-child: 2s488ms, % non-child: 100.00%) 
CodegenTime: 492.705us CompileTime: 1s277ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 53 NumInstructions: 8.53K 
OptimizationTime: 1s181ms PeakMemoryUsage: 4.16 MB PrepareTime: 10.859ms
50  CodeGen:(Total: 2s456ms, non-child: 2s456ms, % non-child: 100.00%) 
CodegenTime: 466.180us CompileTime: 1s262ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 53 NumInstructions: 8.53K 
OptimizationTime: 1s165ms PeakMemoryUsage: 4.16 MB PrepareTime: 10.114ms
75  CodeGen:(Total: 2s875ms, non-child: 2s875ms, % non-child: 100.00%) 
CodegenTime: 474.302us CompileTime: 1s277ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 47 NumInstructions: 8.49K 
OptimizationTime: 1s568ms PeakMemoryUsage: 4.15 MB PrepareTime: 10.905ms
75  CodeGen:(Total: 2s856ms, non-child: 2s856ms, % non-child: 100.00%) 
CodegenTime: 510.086us CompileTime: 1s263ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 47 NumInstructions: 8.49K 
OptimizationTime: 1s563ms PeakMemoryUsage: 

[Impala-ASF-CR] IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet

2017-10-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8056 )

Change subject: IMPALA-5525 Extend TestScannersFuzzing to test uncompressed 
parquet
..


Patch Set 8: Code-Review+2

(1 comment)

Looks good once you've done a little more testing.

http://gerrit.cloudera.org:8080/#/c/8056/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8056/8//COMMIT_MSG@15
PS8, Line 15: Ran the query_test/test_scanners_fuzz.py in a loop (5 times)
I was thinking more like overnight (or a few hours). It's worth doing just to 
be sure that we won't break the build for anyone else.



--
To view, visit http://gerrit.cloudera.org:8080/8056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I760de7203a51cf82b16016fa8043cadc7c8325bc
Gerrit-Change-Number: 8056
Gerrit-PatchSet: 8
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 21:37:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3200: [DOCS] Document user-facing aspects of new buffer pool

2017-10-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8003 )

Change subject: IMPALA-3200: [DOCS] Document user-facing aspects of new buffer 
pool
..


Patch Set 4:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_buffer_pool_limit.xml
File docs/topics/impala_buffer_pool_limit.xml:

http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_buffer_pool_limit.xml@39
PS4, Line 39:   Defines the size in bytes of an internal buffer for 
allocating memory during queries.
First sentence isn't accurate. Something like this might be better: "Defines a 
limit on the amount of memory that the query can allocate from the internal 
buffer pool."


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_buffer_pool_limit.xml@53
PS4, Line 53: his default is represented by a value
:   of 0.
The default is actually that the query option is unset. 0 means a limit of 0. I 
guess that impala-shell reported that the default was 0, which is incorrect: 
see IMPALA-5589


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_buffer_pool_limit.xml@69
PS4, Line 69: set buffer_pool_limit=8GB;
It can also be expressed as a percentage of mem_limit (e.g. 80%), which may be 
more useful in some cases.


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml
File docs/topics/impala_default_spillable_buffer_size.xml:

http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml@44
PS4, Line 44: 
Do we document somewhere the supported byte suffixes (e.g. B, KB, MB, GB plus 
variations).


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml@49
PS4, Line 49: 
I think it's confusing to say that the default varies, since the query option 
value doesn't vary, it just sets an upper bound for the sizes that can be 
picked by the planner.


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml@50
PS4, Line 50:   65536 to 2097152, 
depending on the
Feel free to ignore, but might be easier to read if they were human-readable 
(e.g. 2MB).


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml@59
PS4, Line 59:   consider increasing the 
DEFAULT_SPILLABLE_BUFFER_SIZE setting.
Would it be helpful to elaborate here? E.g. "Larger buffer sizes will result in 
Impala issuing larger I/O operators to storage devices, which may result in 
higher throughput, particularly on rotational disks".


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml@61
PS4, Line 61: 
It's implied but would it make sense to state the inverse. I.e. decreasing the 
option may reduce memory consumption.

This all depends on whether the planner has actually chosen a 
spillable-buffer_size that is capped by this query option (visible when 
explain_level=2). It seems worth mentioning that since it's a precondition to 
this having any effect.


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml
File docs/topics/impala_max_row_size.xml:

http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml@47
PS4, Line 47: 524288
512K to be human-readable?


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml@56
PS4, Line 56: accomodate
sp: accommodate


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml@57
PS4, Line 57:   the total bytes stored in the largest row.
It's left ambiguous whether queries processing rows larger than MAX_ROW_SIZE 
*will* fail or whether they *may* fail. It's really the latter - in many case 
they will succeed. I've sometimes described it as best-effort processing for 
rows > max_row_size.


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml@68
PS4, Line 68: 

[Impala-ASF-CR] IMPALA-5988: optimise MemPool::TryAllocate()

2017-10-04 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8145 )

Change subject: IMPALA-5988: optimise MemPool::TryAllocate()
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8145/3/be/src/runtime/mem-pool.h
File be/src/runtime/mem-pool.h:

http://gerrit.cloudera.org:8080/#/c/8145/3/be/src/runtime/mem-pool.h@122
PS3, Line 122:   uint8_t* TryAllocateUnaligned(int64_t size) noexcept {
> The default has been 8 byte aligned for as long as I'm aware.
Thanks, works for me. We can still reconsider the default later if necessary. 
We do make many small allocations in other places like CollectionValueBuilder 
(for collections with small tuples).



--
To view, visit http://gerrit.cloudera.org:8080/8145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I088012084fe535a67a3cd1103ced5a02027f961a
Gerrit-Change-Number: 8145
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 21:15:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Bump Kudu version to bec2a24

2017-10-04 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8207 )

Change subject: Bump Kudu version to bec2a24
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaeafb27412892112f59f10a70f57eb2275e02fd5
Gerrit-Change-Number: 8207
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Comment-Date: Wed, 04 Oct 2017 20:57:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5376: Implement all TPCDS test cases or alternates for Impala.

2017-10-04 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8102 )

Change subject: IMPALA-5376: Implement all TPCDS test cases or alternates for 
Impala.
..


Patch Set 12:

Tim,
Please remove all the unsupported queries from this code review as they are 
breaking the stress tests.
Consider adding a rewrite for the unsupported queries in a followup code review.


--
To view, visit http://gerrit.cloudera.org:8080/8102
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6e284888600a7a69d1f23fcb7dac21cbb13b7d66
Gerrit-Change-Number: 8102
Gerrit-PatchSet: 12
Gerrit-Owner: Tim Wood 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 
Gerrit-Comment-Date: Wed, 04 Oct 2017 20:57:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6012: workaround - downgrade hive temporarily

2017-10-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8205 )

Change subject: IMPALA-6012: workaround - downgrade hive temporarily
..


Patch Set 2: Verified+1 Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8205
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I71cd293df1fc84429b23c94ad5c0564aa627a9a8
Gerrit-Change-Number: 8205
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 20:51:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6012: workaround - downgrade hive temporarily

2017-10-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8205 )

Change subject: IMPALA-6012: workaround - downgrade hive temporarily
..

IMPALA-6012: workaround - downgrade hive temporarily

To unblock build, switch back to an older version of Hive
without HIVE-12730. This should be reverted once HIVE-17189
makes it into the dependencies.

Testing:
Ran test_ddl.py as a smoke test. Will do more testing concurrent with
review.

Change-Id: I71cd293df1fc84429b23c94ad5c0564aa627a9a8
Reviewed-on: http://gerrit.cloudera.org:8080/8205
Reviewed-by: Tim Armstrong 
Tested-by: Tim Armstrong 
---
M bin/impala-config.sh
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Tim Armstrong: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/8205
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I71cd293df1fc84429b23c94ad5c0564aa627a9a8
Gerrit-Change-Number: 8205
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6012: workaround - downgrade hive temporarily

2017-10-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8205 )

Change subject: IMPALA-6012: workaround - downgrade hive temporarily
..


Patch Set 2:

I carried the verification since it was only a commit message change.


--
To view, visit http://gerrit.cloudera.org:8080/8205
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I71cd293df1fc84429b23c94ad5c0564aa627a9a8
Gerrit-Change-Number: 8205
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 20:51:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6012: workaround - downgrade hive temporarily

2017-10-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8205 )

Change subject: IMPALA-6012: workaround - downgrade hive temporarily
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8205
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I71cd293df1fc84429b23c94ad5c0564aa627a9a8
Gerrit-Change-Number: 8205
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 20:49:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5908: Allow SET to unset modified query options.

2017-10-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8070 )

Change subject: IMPALA-5908: Allow SET to unset modified query options.
..


Patch Set 14: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1299/


--
To view, visit http://gerrit.cloudera.org:8080/8070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8c383e68064f839cb5000118901dff77b4e5cb9
Gerrit-Change-Number: 8070
Gerrit-PatchSet: 14
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 19:44:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5988: optimise MemPool::TryAllocate()

2017-10-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8145 )

Change subject: IMPALA-5988: optimise MemPool::TryAllocate()
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8145/3/be/src/runtime/mem-pool.h
File be/src/runtime/mem-pool.h:

http://gerrit.cloudera.org:8080/#/c/8145/3/be/src/runtime/mem-pool.h@122
PS3, Line 122:   uint8_t* TryAllocateUnaligned(int64_t size) noexcept {
> Change lgtm. Mind explaining the original motivation behind the 8-byte defa
The default has been 8 byte aligned for as long as I'm aware.
I added the alignment parameter because there was a case where we needed 
16-byte alignment

I think 8-byte alignment is a reasonable default and matches what malloc() 
does. There's sometimes a perf benefit (although it's pretty limited on recent 
Intel processors). It might make sense to use unaligned memory in some other 
cases but it'd probably need perf validation.


http://gerrit.cloudera.org:8080/#/c/8145/3/be/src/runtime/mem-pool.h@123
PS3, Line 123: Allocate(size, 1);
> TryAllocateAligned(size, 1);
This was somewhat deliberate - I wanted to force inlining into this function so 
that the alignment logic could be optimised out. Added a comment - lmk if that 
addresses the comment.



--
To view, visit http://gerrit.cloudera.org:8080/8145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I088012084fe535a67a3cd1103ced5a02027f961a
Gerrit-Change-Number: 8145
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 19:41:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5988: optimise MemPool::TryAllocate()

2017-10-04 Thread Tim Armstrong (Code Review)
Hello Alex Behm, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8145

to look at the new patch set (#4).

Change subject: IMPALA-5988: optimise MemPool::TryAllocate()
..

IMPALA-5988: optimise MemPool::TryAllocate()

Testing:
Ran core tests.

Perf:
Experiments using this on top of a WIP Avro patch for IMPALA-5307
showed noticable improvements in CPU efficiency - up to 10%

Change-Id: I088012084fe535a67a3cd1103ced5a02027f961a
---
M be/src/exec/analytic-eval-node.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/runtime/mem-pool.h
6 files changed, 32 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/8145/4
--
To view, visit http://gerrit.cloudera.org:8080/8145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I088012084fe535a67a3cd1103ced5a02027f961a
Gerrit-Change-Number: 8145
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5844: use a MemPool for expr result allocations

2017-10-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8025 )

Change subject: IMPALA-5844: use a MemPool for expr result allocations
..


Patch Set 15: Code-Review+2

PS13 was the rebase (required to fix some compile errors) and PS14/15 are the 
changes in response to comments.


--
To view, visit http://gerrit.cloudera.org:8080/8025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4ba5a7542ed90a49a4b5586c040b5985a7d45b61
Gerrit-Change-Number: 8025
Gerrit-PatchSet: 15
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 19:27:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5844: use a MemPool for expr result allocations

2017-10-04 Thread Tim Armstrong (Code Review)
Hello Michael Ho, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8025

to look at the new patch set (#14).

Change subject: IMPALA-5844: use a MemPool for expr result allocations
..

IMPALA-5844: use a MemPool for expr result allocations

This is also a step towards IMPALA-2399 (remove QueryMaintenance()).

"local" allocations containing expression results (either intermediate
or final results) have the following properties:
* They are usually small allocations
* They can be made frequently (e.g. every function call)
* They are owned and managed by the Impala runtime
* They are freed in bulk at various points in query execution.

A MemPool (i.e. bump allocator) is the right mechanism to manage
allocations with the above properties. Before this patch
FunctionContext's used a FreePool + vector of allocations to emulate the
above behaviour. This patch switches to using a MemPool to bring these
allocations in line with the rest of the codebase.

The steps required to do this conversion.
* Use a MemPool for FunctionContext local allocations.
* Identify appropriate MemPools for all of the local allocations from
  function contexts so that the memory lifetime is correct.
* Various cleanup and documentation of existing MemPools.
* Replaces calls to FreeLocalAllocations() with calls to
  MemPool::Clear()

More involved surgery was required in a few places:
* Made the Sorter own its comparator, exprs and MemPool.
* Remove FunctionContextImpl::ReallocateLocal() and just have
  StringFunctions::Replace() do the doubling itself to avoid
  the need for a special interface. Worst-case this doubles
  the memory requirements for Replace() since n / 2 + n / 4
  + n / 8 +  bytes of memory could be wasted instead of recycled
  for an n-byte output string.
* Provide a way redirect agg fn Serialize()/Finalize() allocations
  to come directly from the output RowBatch's MemPool. This is
  also potentially applicable to other places where we currently
  copy out strings from local allocations, e.g.
  AnalyticEvalNode::AddResultTuple() and Tuple::MaterializeExprs().
* --stress_free_pool_alloc was changed to instead intercept at the
  FunctionContext layer so that it retains the old behaviour even
  though allocations do not all come from FreePools.

The "local" allocation concept was not exposed directly in udf.h so this
patch also renames them to better reflect that they're used for expr
results.

Testing:
* ran exhaustive and ASAN

Change-Id: I4ba5a7542ed90a49a4b5586c040b5985a7d45b61
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/common/global-flags.cc
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/plan-root-sink.cc
M be/src/exec/scan-node.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M be/src/exec/union-node.cc
M be/src/exec/unnest-node.cc
M be/src/exprs/agg-fn-evaluator.cc
M be/src/exprs/agg-fn-evaluator.h
M be/src/exprs/case-expr.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/string-functions-ir.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/descriptors.cc
D be/src/runtime/free-pool.cc
M be/src/runtime/free-pool.h
M be/src/runtime/sorted-run-merger.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/runtime/tuple.cc
M be/src/service/fe-support.cc
M be/src/udf/udf-internal.h
M be/src/udf/udf-test-harness.cc
M be/src/udf/udf.cc
M be/src/udf/udf.h
M 

[Impala-ASF-CR] IMPALA-5844: use a MemPool for expr result allocations

2017-10-04 Thread Tim Armstrong (Code Review)
Hello Michael Ho, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8025

to look at the new patch set (#15).

Change subject: IMPALA-5844: use a MemPool for expr result allocations
..

IMPALA-5844: use a MemPool for expr result allocations

This is also a step towards IMPALA-2399 (remove QueryMaintenance()).

"local" allocations containing expression results (either intermediate
or final results) have the following properties:
* They are usually small allocations
* They can be made frequently (e.g. every function call)
* They are owned and managed by the Impala runtime
* They are freed in bulk at various points in query execution.

A MemPool (i.e. bump allocator) is the right mechanism to manage
allocations with the above properties. Before this patch
FunctionContext's used a FreePool + vector of allocations to emulate the
above behaviour. This patch switches to using a MemPool to bring these
allocations in line with the rest of the codebase.

The steps required to do this conversion.
* Use a MemPool for FunctionContext local allocations.
* Identify appropriate MemPools for all of the local allocations from
  function contexts so that the memory lifetime is correct.
* Various cleanup and documentation of existing MemPools.
* Replaces calls to FreeLocalAllocations() with calls to
  MemPool::Clear()

More involved surgery was required in a few places:
* Made the Sorter own its comparator, exprs and MemPool.
* Remove FunctionContextImpl::ReallocateLocal() and just have
  StringFunctions::Replace() do the doubling itself to avoid
  the need for a special interface. Worst-case this doubles
  the memory requirements for Replace() since n / 2 + n / 4
  + n / 8 +  bytes of memory could be wasted instead of recycled
  for an n-byte output string.
* Provide a way redirect agg fn Serialize()/Finalize() allocations
  to come directly from the output RowBatch's MemPool. This is
  also potentially applicable to other places where we currently
  copy out strings from local allocations, e.g.
  AnalyticEvalNode::AddResultTuple() and Tuple::MaterializeExprs().
* --stress_free_pool_alloc was changed to instead intercept at the
  FunctionContext layer so that it retains the old behaviour even
  though allocations do not all come from FreePools.

The "local" allocation concept was not exposed directly in udf.h so this
patch also renames them to better reflect that they're used for expr
results.

Testing:
* ran exhaustive and ASAN

Change-Id: I4ba5a7542ed90a49a4b5586c040b5985a7d45b61
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/common/global-flags.cc
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/plan-root-sink.cc
M be/src/exec/scan-node.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M be/src/exec/union-node.cc
M be/src/exec/unnest-node.cc
M be/src/exprs/agg-fn-evaluator.cc
M be/src/exprs/agg-fn-evaluator.h
M be/src/exprs/case-expr.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/string-functions-ir.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/descriptors.cc
D be/src/runtime/free-pool.cc
M be/src/runtime/free-pool.h
M be/src/runtime/sorted-run-merger.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/runtime/tuple.cc
M be/src/service/fe-support.cc
M be/src/udf/udf-internal.h
M be/src/udf/udf-test-harness.cc
M be/src/udf/udf.cc
M be/src/udf/udf.h
M 

[Impala-ASF-CR] IMPALA-5844: use a MemPool for expr result allocations

2017-10-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8025 )

Change subject: IMPALA-5844: use a MemPool for expr result allocations
..


Patch Set 12:

(6 comments)

Rebase, then addressed comments.

http://gerrit.cloudera.org:8080/#/c/8025/11/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

http://gerrit.cloudera.org:8080/#/c/8025/11/be/src/exec/exec-node.h@379
PS11, Line 379:   Status QueryMaintenance(RuntimeState* state) 
WARN_UNUSED_RESULT;
> Yup. Mind dropping a TODO: IMPALA-2399 remove this.
Done


http://gerrit.cloudera.org:8080/#/c/8025/11/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

http://gerrit.cloudera.org:8080/#/c/8025/11/be/src/exec/partitioned-aggregation-node.cc@600
PS11, Line 600:
> Cool. Alternatively, if it were worth it, then maybe it'd be better for Cle
Clear() looks fairly cheap (it just iterates over the vector) and I 
don't think this class is super perf critical - it's only called to clean up 
partitions that weren't fully processed.


http://gerrit.cloudera.org:8080/#/c/8025/11/be/src/exec/scanner-context.h
File be/src/exec/scanner-context.h:

http://gerrit.cloudera.org:8080/#/c/8025/11/be/src/exec/scanner-context.h@337
PS11, Line 337:   /// single-threaded scan node implementation.
> Once we can get rid of the multi-threaded scanner case, this seems worth re
IMPALA-6015


http://gerrit.cloudera.org:8080/#/c/8025/11/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

http://gerrit.cloudera.org:8080/#/c/8025/11/be/src/runtime/sorter.h@199
PS11, Line 199:   /// MemPool for allocating data structures used by expression 
evaluators in the sorter.
  :   MemPool expr_perm_pool_;
  :
  :   /// MemPool for allocations that hold results of expression 
evaluation in the sorter.
  :   /// Cleared periodically during sorting to prevent memory 
accumulating.
  :   MemPool expr_results_pool_;
  :
  :   /// In memory sorter and less-than comparator.
  :   TupleRowComparator compare_less_than_;
> Was also asking about the MemPool (e.g. in ExecNode we have indirection - i
I think the reasoning there is similar - the MemTracker* is a constructor 
parameter and the MemTrackers are not set up until Prepare(). Another advantage 
is that we can forward-declare MemPool instead of pulling in the full header.


http://gerrit.cloudera.org:8080/#/c/8025/12/be/src/udf/udf.h
File be/src/udf/udf.h:

http://gerrit.cloudera.org:8080/#/c/8025/12/be/src/udf/udf.h@352
PS12, Line 352: Serialize()
> and Finalize()? or is it suppose to just say Serialize() is?
Done


http://gerrit.cloudera.org:8080/#/c/8025/12/be/src/udf/udf.h@356
PS12, Line 356: tringVal(FunctionContext*, int) ctor or 
StringVal::CopyFrom(FunctionContext*,
  : /// const uint8_t*, size_t).
StringVal::Resize() also should be listed here.



--
To view, visit http://gerrit.cloudera.org:8080/8025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4ba5a7542ed90a49a4b5586c040b5985a7d45b61
Gerrit-Change-Number: 8025
Gerrit-PatchSet: 12
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 19:24:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5844: use a MemPool for expr result allocations

2017-10-04 Thread Tim Armstrong (Code Review)
Hello Michael Ho, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8025

to look at the new patch set (#13).

Change subject: IMPALA-5844: use a MemPool for expr result allocations
..

IMPALA-5844: use a MemPool for expr result allocations

"local" allocations containing expression results (either intermediate
or final results) have the following properties:
* They are usually small allocations
* They can be made frequently (e.g. every function call)
* They are owned and managed by the Impala runtime
* They are freed in bulk at various points in query execution.

A MemPool (i.e. bump allocator) is the right mechanism to manage
allocations with the above properties. Before this patch
FunctionContext's used a FreePool + vector of allocations to emulate the
above behaviour. This patch switches to using a MemPool to bring these
allocations in line with the rest of the codebase.

The steps required to do this conversion.
* Use a MemPool for FunctionContext local allocations.
* Identify appropriate MemPools for all of the local allocations from
  function contexts so that the memory lifetime is correct.
* Various cleanup and documentation of existing MemPools.
* Replaces calls to FreeLocalAllocations() with calls to
  MemPool::Clear()

More involved surgery was required in a few places:
* Made the Sorter own its comparator, exprs and MemPool.
* Remove FunctionContextImpl::ReallocateLocal() and just have
  StringFunctions::Replace() do the doubling itself to avoid
  the need for a special interface. Worst-case this doubles
  the memory requirements for Replace() since n / 2 + n / 4
  + n / 8 +  bytes of memory could be wasted instead of recycled
  for an n-byte output string.
* Provide a way redirect agg fn Serialize()/Finalize() allocations
  to come directly from the output RowBatch's MemPool. This is
  also potentially applicable to other places where we currently
  copy out strings from local allocations, e.g.
  AnalyticEvalNode::AddResultTuple() and Tuple::MaterializeExprs().
* --stress_free_pool_alloc was changed to instead intercept at the
  FunctionContext layer so that it retains the old behaviour even
  though allocations do not all come from FreePools.

The "local" allocation concept was not exposed directly in udf.h so this
patch also renames them to better reflect that they're used for expr
results.

Testing:
* ran exhaustive and ASAN

Change-Id: I4ba5a7542ed90a49a4b5586c040b5985a7d45b61
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/common/global-flags.cc
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/plan-root-sink.cc
M be/src/exec/scan-node.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M be/src/exec/union-node.cc
M be/src/exec/unnest-node.cc
M be/src/exprs/agg-fn-evaluator.cc
M be/src/exprs/agg-fn-evaluator.h
M be/src/exprs/case-expr.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/string-functions-ir.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/descriptors.cc
D be/src/runtime/free-pool.cc
M be/src/runtime/free-pool.h
M be/src/runtime/sorted-run-merger.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/runtime/tuple.cc
M be/src/service/fe-support.cc
M be/src/udf/udf-internal.h
M be/src/udf/udf-test-harness.cc
M be/src/udf/udf.cc
M be/src/udf/udf.h
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M 

[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-10-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7438/13/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java:

http://gerrit.cloudera.org:8080/#/c/7438/13/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@1472
PS13, Line 1472: Type.DOUBLE, ScalarType.createDecimalType(2, 1));
> actually, are there any other cases that we should add here  now that we've
I think the decimal output type parameters are orthogonal to the purpose of 
this specific test, which is testing whether the output is resolved to 
*INT/DECIMAL/DOUBLE with various combinations of input types and 
literals/non-literals. I don't think Taras' change would have affected that.



--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 13
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 04 Oct 2017 19:02:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes

2017-10-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8196 )

Change subject: IMPALA-4236: Codegen CopyRows() for select nodes
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8196/4/be/src/exec/select-node.cc
File be/src/exec/select-node.cc:

http://gerrit.cloudera.org:8080/#/c/8196/4/be/src/exec/select-node.cc@113
PS4, Line 113: copy_rows_result;
I think it would still be better to have a name that makes the below control 
flow more understandable, if we can come up with one. It is unnecessarily 
confusing right now.

After looking at this again with the benefit of some time, I think having 
CopyRows() returning a bool is unnecessarily confusing because it's telling the 
caller about the current state in a non-obvious way, yet the caller can check 
the state directly.


http://gerrit.cloudera.org:8080/#/c/8196/4/be/src/exec/select-node.cc@120
PS4, Line 120: if (copy_rows_result) {
I think there's an equivalent but simpler way of expressing the control flow 
along the lines of:

  *eos = ReachedLimit() || (child_row_idx_ == child_row_batch->num_row() && 
child_eos_);
  if (*eos) {
... do the eos things ...
return Status::OK();
  }
  if (row_batch->AtCapacity()) return Status::OK();

I don't think your change made this more confusing but I think it would be good 
to fix it so that the control flow is more obviously correct.



--
To view, visit http://gerrit.cloudera.org:8080/8196
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e
Gerrit-Change-Number: 8196
Gerrit-PatchSet: 4
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 18:58:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax

2017-10-04 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8191 )

Change subject: IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8191/3/docs/topics/impala_alter_table.xml
File docs/topics/impala_alter_table.xml:

http://gerrit.cloudera.org:8080/#/c/8191/3/docs/topics/impala_alter_table.xml@1014
PS3, Line 1014: or compacted
This is not correct. Only storage attribute changes get applied to compacted 
rows, not default value changes.



--
To view, visit http://gerrit.cloudera.org:8080/8191
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7647dfe8acfdb598caf561d41d74e38a8b66ce9d
Gerrit-Change-Number: 8191
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 04 Oct 2017 18:38:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Bump Kudu version to bec2a24

2017-10-04 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8207


Change subject: Bump Kudu version to bec2a24
..

Bump Kudu version to bec2a24

Change-Id: Iaeafb27412892112f59f10a70f57eb2275e02fd5
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/8207/1
--
To view, visit http://gerrit.cloudera.org:8080/8207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iaeafb27412892112f59f10a70f57eb2275e02fd5
Gerrit-Change-Number: 8207
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

2017-10-04 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use 
TAcceptQueueServer
..


Patch Set 12:

It looks like it passed gerri-verify-external - 
https://jenkins.impala.io/job/gerrit-verify-dryrun-external/13/


--
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 12
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Comment-Date: Wed, 04 Oct 2017 18:13:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4682 Fix IllegalStateException issue

2017-10-04 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8143 )

Change subject: IMPALA-4682 Fix IllegalStateException issue
..


Patch Set 1:

Zoram, let's try to move this forward. We don't want too many open CRs.


--
To view, visit http://gerrit.cloudera.org:8080/8143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I57c20aeed401275d45913fedfd61c206c38641b7
Gerrit-Change-Number: 8143
Gerrit-PatchSet: 1
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 04 Oct 2017 18:12:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner

2017-10-04 Thread Bikramjeet Vig (Code Review)
Hello Lars Volker, Matthew Jacobs, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7822

to look at the new patch set (#4).

Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet 
scanner
..

IMPALA-2494: Support for byte array encoded decimals in Parquet scanner

Extendes parquet column reader and associated classes to allow for more
than one possible physical type for a given logical type. This patch
only adds support for variable sized byte array encoded decimals and
more will be added in upcoming commits.
Also, column level metadata verification which was currently being
done per row group will now only be done once per column per file.

Testing:
Added backend test for verifying newly added decimal types are decoded
correctly.
Added Query test that decodes both plain and dictionary-encoded
decimals using binary encoding.

Performance:
Initial perf testing using tpcds_1000 shows no regression.

Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
---
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M be/src/exec/parquet-common.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M be/src/exec/parquet-plain-test.cc
M be/src/util/dict-encoding.h
M be/src/util/dict-test.cc
A testdata/data/binary_decimal_dictionary.parquet
A testdata/data/binary_decimal_no_dictionary.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-formats.test
M tests/query_test/test_scanners.py
17 files changed, 622 insertions(+), 259 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7822/4
--
To view, visit http://gerrit.cloudera.org:8080/7822
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
Gerrit-Change-Number: 7822
Gerrit-PatchSet: 4
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-04 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 8:

(3 comments)

Thanks! That looks much better.

http://gerrit.cloudera.org:8080/#/c/7564/8/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

http://gerrit.cloudera.org:8080/#/c/7564/8/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java@734
PS8, Line 734: if (options == null) {
 :   options = defaultQueryOptions();
 : } else {
 :   options = mergeQueryOptions(defaultQueryOptions(), 
options);
 : }
Move above L732 and make options an input parameter to TestFileParser 
constructor.


http://gerrit.cloudera.org:8080/#/c/7564/8/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
File fe/src/test/java/org/apache/impala/testutil/TestFileParser.java:

http://gerrit.cloudera.org:8080/#/c/7564/8/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@26
PS8, Line 26: import java.lang.reflect.Method;
Is this needed?


http://gerrit.cloudera.org:8080/#/c/7564/8/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@355
PS8, Line 355: private void parseOneQueryOption(String line, TestCase testCase) 
{
Why parsing one query option at a time? ParseQueryOptions() simply expects a 
comma-separated list of options and parses all of them. Wouldn't be simpler to 
make a JNI call to ParseQueryOptions() instead?



-- 
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 8
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 04 Oct 2017 18:09:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax

2017-10-04 Thread John Russell (Code Review)
Hello Greg Rahn, Thomas Tauber-Marshall, Ambreen Kazi, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8191

to look at the new patch set (#3).

Change subject: IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax
..

IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax

ALTER TABLE syntax for changing Kudu column default
and storage attributes.

Also SET COMMENT for non-Kudu tables.

Change-Id: I7647dfe8acfdb598caf561d41d74e38a8b66ce9d
---
M docs/topics/impala_alter_table.xml
1 file changed, 68 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/8191/3
--
To view, visit http://gerrit.cloudera.org:8080/8191
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7647dfe8acfdb598caf561d41d74e38a8b66ce9d
Gerrit-Change-Number: 8191
Gerrit-PatchSet: 3
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax

2017-10-04 Thread John Russell (Code Review)
John Russell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8191 )

Change subject: IMPALA-4622: [DOCS] New Kudu ALTER TABLE syntax
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8191/2/docs/topics/impala_alter_table.xml
File docs/topics/impala_alter_table.xml:

http://gerrit.cloudera.org:8080/#/c/8191/2/docs/topics/impala_alter_table.xml@67
PS2, Line 67: ]
> extraneous?
Done


http://gerrit.cloudera.org:8080/#/c/8191/2/docs/topics/impala_alter_table.xml@69
PS2, Line 69: COMMENT 'comment_text'
> This does parse, but it will always give an error because Kudu doesn't supp
Done. Let's take out this COMMENT part because I added lines 73-75 for SET 
COMMENT for non-Kudu tables.


http://gerrit.cloudera.org:8080/#/c/8191/2/docs/topics/impala_alter_table.xml@71
PS2, Line 71: {
> missing the closing '}'
Done


http://gerrit.cloudera.org:8080/#/c/8191/2/docs/topics/impala_alter_table.xml@1038
PS2, Line 1038: any newly inserted rows
> These attributes may in some cases end up applied to existing rows as Kudu
Done



--
To view, visit http://gerrit.cloudera.org:8080/8191
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7647dfe8acfdb598caf561d41d74e38a8b66ce9d
Gerrit-Change-Number: 8191
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 04 Oct 2017 18:02:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4704: Disallow client connections to imapalad until catalog is received.

2017-10-04 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8202 )

Change subject: IMPALA-4704: Disallow client connections to imapalad until 
catalog is received.
..


Patch Set 1:

(2 comments)

It does seem like picking a reasonable timeout here is not possible. 
Additionally, given that the current behavior is that all queries will fail 
before isReady(), waiting indefinitely for isReady() seems okay.

http://gerrit.cloudera.org:8080/#/c/8202/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8202/1/be/src/service/impala-server.cc@2043
PS1, Line 2043: FLAGS_is_coordinator
the relationship between this code and the HS2/Beeswax servers would be clearer 
if this were:

if (hs2_server_.get() != NULL || beeswax_server_.get() != NULL)

or if we just call WaitForCatalog() on both branches.


http://gerrit.cloudera.org:8080/#/c/8202/1/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/8202/1/fe/src/main/java/org/apache/impala/service/Frontend.java@905
PS1, Line 905: if (!impaladCatalog_.get().isReady()) {
perhaps this should be a Preconditions check, if our intention is to prevent 
this from being callable before isReady() returns true.



--
To view, visit http://gerrit.cloudera.org:8080/8202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Gerrit-Change-Number: 8202
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 04 Oct 2017 17:39:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6012: workaround - downgrade hive temporarily

2017-10-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8205 )

Change subject: IMPALA-6012: workaround - downgrade hive temporarily
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8205/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8205/1//COMMIT_MSG@11
PS1, Line 11:
> May be add that this should be reverted once HIVE-17189 makes it to the dep
Done



--
To view, visit http://gerrit.cloudera.org:8080/8205
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I71cd293df1fc84429b23c94ad5c0564aa627a9a8
Gerrit-Change-Number: 8205
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 17:32:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6012: workaround - downgrade hive temporarily

2017-10-04 Thread Tim Armstrong (Code Review)
Hello Bharath Vissapragada, Dimitris Tsirogiannis, Alex Behm, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8205

to look at the new patch set (#2).

Change subject: IMPALA-6012: workaround - downgrade hive temporarily
..

IMPALA-6012: workaround - downgrade hive temporarily

To unblock build, switch back to an older version of Hive
without HIVE-12730. This should be reverted once HIVE-17189
makes it into the dependencies.

Testing:
Ran test_ddl.py as a smoke test. Will do more testing concurrent with
review.

Change-Id: I71cd293df1fc84429b23c94ad5c0564aa627a9a8
---
M bin/impala-config.sh
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/8205/2
--
To view, visit http://gerrit.cloudera.org:8080/8205
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I71cd293df1fc84429b23c94ad5c0564aa627a9a8
Gerrit-Change-Number: 8205
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] Remove dead code parallel-executor*

2017-10-04 Thread Dan Hecht (Code Review)
Dan Hecht has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8206


Change subject: Remove dead code parallel-executor*
..

Remove dead code parallel-executor*

Change-Id: I7f0b121c2c40849937afa103dfedf0e0bef34477
---
M be/src/runtime/CMakeLists.txt
D be/src/runtime/parallel-executor-test.cc
D be/src/runtime/parallel-executor.cc
D be/src/runtime/parallel-executor.h
4 files changed, 0 insertions(+), 217 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/8206/1
--
To view, visit http://gerrit.cloudera.org:8080/8206
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7f0b121c2c40849937afa103dfedf0e0bef34477
Gerrit-Change-Number: 8206
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Hecht 


[Impala-ASF-CR] IMPALA-4704: Disallow client connections to imapalad until catalog is received.

2017-10-04 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8202 )

Change subject: IMPALA-4704: Disallow client connections to imapalad until 
catalog is received.
..


Patch Set 1:

What process is currently monitoring for impalad health and under what 
conditions is it restarted? Depending on that watchdog process, a quicker death 
may be easier to deal with since it would be a clear signal to restart the 
process.


--
To view, visit http://gerrit.cloudera.org:8080/8202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Gerrit-Change-Number: 8202
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 04 Oct 2017 17:22:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5988: optimise MemPool::TryAllocate()

2017-10-04 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8145 )

Change subject: IMPALA-5988: optimise MemPool::TryAllocate()
..


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8145/3/be/src/runtime/mem-pool.h
File be/src/runtime/mem-pool.h:

http://gerrit.cloudera.org:8080/#/c/8145/3/be/src/runtime/mem-pool.h@123
PS3, Line 123: Allocate(size, 1);
TryAllocateAligned(size, 1);
seems a bit clearer to me (then it's clearer this is just syntactic sugar over 
the core API).



--
To view, visit http://gerrit.cloudera.org:8080/8145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I088012084fe535a67a3cd1103ced5a02027f961a
Gerrit-Change-Number: 8145
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Comment-Date: Wed, 04 Oct 2017 17:18:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5940: Avoid log spew by using Status::Expected.

2017-10-04 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8100 )

Change subject: IMPALA-5940: Avoid log spew by using Status::Expected.
..


Patch Set 5: Code-Review+1

(1 comment)

Please get have Michael do the +2 once the RPC logging stuff is settled (I'm 
not sure if he wants something done or not).

http://gerrit.cloudera.org:8080/#/c/8100/3/be/src/runtime/client-cache.h
File be/src/runtime/client-cache.h:

http://gerrit.cloudera.org:8080/#/c/8100/3/be/src/runtime/client-cache.h@246
PS3, Line 246: VLOG(1) << err;
> I agree that removing the backtrace here is the right direction as 90% of t
Maybe we should include the RPC name in the log message and that will 
disambiguate?



--
To view, visit http://gerrit.cloudera.org:8080/8100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38088482377a1c3e794a9c8178ef83f29957a330
Gerrit-Change-Number: 8100
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 04 Oct 2017 17:12:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5908: Allow SET to unset modified query options.

2017-10-04 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8070 )

Change subject: IMPALA-5908: Allow SET to unset modified query options.
..


Patch Set 14: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8c383e68064f839cb5000118901dff77b4e5cb9
Gerrit-Change-Number: 8070
Gerrit-PatchSet: 14
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 17:09:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5908: Allow SET to unset modified query options.

2017-10-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8070 )

Change subject: IMPALA-5908: Allow SET to unset modified query options.
..


Patch Set 14:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1299/


--
To view, visit http://gerrit.cloudera.org:8080/8070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8c383e68064f839cb5000118901dff77b4e5cb9
Gerrit-Change-Number: 8070
Gerrit-PatchSet: 14
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 17:09:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4704: Disallow client connections to imapalad until catalog is received.

2017-10-04 Thread Balazs Jeszenszky (Code Review)
Balazs Jeszenszky has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8202 )

Change subject: IMPALA-4704: Disallow client connections to imapalad until 
catalog is received.
..


Patch Set 1:

I think the bug report's title is a better than it's description. In 
particular, I'm not sure shutting down the daemon is the best way out of 
situations like this since in most managed environments they will be restarted 
automatically (causing another attempt to get initial catalog update, same as 
if they were left alive).
If for some reason opening the ports later is not possible, I think timing out 
should result in closing the connection with a more descriptive error message 
so that clients can retry.

However, the best solution would be to only open the frontend ports after the 
catalog update is processed. With a load balancer in front of several 
coordinators, this setup would allow for a seamless (i.e. not even a delay due 
to connections waiting for the catalog update to be processed) experience for 
clients.


--
To view, visit http://gerrit.cloudera.org:8080/8202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Gerrit-Change-Number: 8202
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 04 Oct 2017 17:04:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6012: workaround - downgrade hive temporarily

2017-10-04 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8205 )

Change subject: IMPALA-6012: workaround - downgrade hive temporarily
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8205/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8205/1//COMMIT_MSG@11
PS1, Line 11:
May be add that this should be reverted once HIVE-17189 makes it to the 
dependencies?



--
To view, visit http://gerrit.cloudera.org:8080/8205
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I71cd293df1fc84429b23c94ad5c0564aa627a9a8
Gerrit-Change-Number: 8205
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 16:57:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

2017-10-04 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7438 )

Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7438/13/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java:

http://gerrit.cloudera.org:8080/#/c/7438/13/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@1472
PS13, Line 1472: Type.DOUBLE, ScalarType.createDecimalType(2, 1));
actually, are there any other cases that we should add here  now that we've 
changed decimal_v2 multiplication rule?



--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 13
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 04 Oct 2017 16:50:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6012: workaround - downgrade hive temporarily

2017-10-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8205 )

Change subject: IMPALA-6012: workaround - downgrade hive temporarily
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1298/


--
To view, visit http://gerrit.cloudera.org:8080/8205
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I71cd293df1fc84429b23c94ad5c0564aa627a9a8
Gerrit-Change-Number: 8205
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Wed, 04 Oct 2017 16:48:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6012: workaround - downgrade hive temporarily

2017-10-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8205 )

Change subject: IMPALA-6012: workaround - downgrade hive temporarily
..


Patch Set 1: Code-Review-2

Running tests but don't want to merge yet.


--
To view, visit http://gerrit.cloudera.org:8080/8205
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I71cd293df1fc84429b23c94ad5c0564aa627a9a8
Gerrit-Change-Number: 8205
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 16:48:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6012: workaround - downgrade hive temporarily

2017-10-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8205


Change subject: IMPALA-6012: workaround - downgrade hive temporarily
..

IMPALA-6012: workaround - downgrade hive temporarily

To unblock build, switch back to an older version of Hive
without HIVE-12730.

Testing:
Ran test_ddl.py as a smoke test. Will do more testing concurrent with
review.

Change-Id: I71cd293df1fc84429b23c94ad5c0564aa627a9a8
---
M bin/impala-config.sh
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/8205/1
--
To view, visit http://gerrit.cloudera.org:8080/8205
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I71cd293df1fc84429b23c94ad5c0564aa627a9a8
Gerrit-Change-Number: 8205
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5844: use a MemPool for expr result allocations

2017-10-04 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8025 )

Change subject: IMPALA-5844: use a MemPool for expr result allocations
..


Patch Set 12: Code-Review+2

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8025/11/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

http://gerrit.cloudera.org:8080/#/c/8025/11/be/src/exec/exec-node.h@379
PS11, Line 379:   Status QueryMaintenance(RuntimeState* state) 
WARN_UNUSED_RESULT;
> Yeah I think this is another step towards IMPALA-2399 since there's less en
Yup. Mind dropping a TODO: IMPALA-2399 remove this.
in here to remind people reading the code that this is not the way we want to 
do things in the future?

Also, you could list IMPALA-2399 in the commit message since this is a very 
nice step toward that.


http://gerrit.cloudera.org:8080/#/c/8025/11/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

http://gerrit.cloudera.org:8080/#/c/8025/11/be/src/exec/partitioned-aggregation-node.cc@600
PS11, Line 600:
> Probably not
Cool. Alternatively, if it were worth it, then maybe it'd be better for Clear() 
(or a wrapper) to do a quick first check e.g. like check total_allocated_bytes_ 
and bail if that's under some threshold.  But if Clear() is cheap enough to 
just run through, let's do that.


http://gerrit.cloudera.org:8080/#/c/8025/11/be/src/exec/scanner-context.h
File be/src/exec/scanner-context.h:

http://gerrit.cloudera.org:8080/#/c/8025/11/be/src/exec/scanner-context.h@337
PS11, Line 337:   /// single-threaded scan node implementation.
> Mentioned the two cases, since the only reason this is this complicated is
Once we can get rid of the multi-threaded scanner case, this seems worth 
revisiting. Do you agree? If so, I think we should add a subtask about it to 
the overall MT jira.


http://gerrit.cloudera.org:8080/#/c/8025/11/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

http://gerrit.cloudera.org:8080/#/c/8025/11/be/src/runtime/sorter.h@199
PS11, Line 199:   /// MemPool for allocating data structures used by expression 
evaluators in the sorter.
  :   MemPool expr_perm_pool_;
  :
  :   /// MemPool for allocations that hold results of expression 
evaluation in the sorter.
  :   /// Cleared periodically during sorting to prevent memory 
accumulating.
  :   MemPool expr_results_pool_;
  :
  :   /// In memory sorter and less-than comparator.
  :   TupleRowComparator compare_less_than_;
> In TopNNode and ExchangeNode the comparator can't be instantiated until Pre
Was also asking about the MemPool (e.g. in ExecNode we have indirection - is 
that actually needed?).


http://gerrit.cloudera.org:8080/#/c/8025/12/be/src/udf/udf.h
File be/src/udf/udf.h:

http://gerrit.cloudera.org:8080/#/c/8025/12/be/src/udf/udf.h@352
PS12, Line 352: Serialize()
and Finalize()? or is it suppose to just say Serialize() is?



--
To view, visit http://gerrit.cloudera.org:8080/8025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4ba5a7542ed90a49a4b5586c040b5985a7d45b61
Gerrit-Change-Number: 8025
Gerrit-PatchSet: 12
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 04 Oct 2017 16:40:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4704: Disallow client connections to imapalad until catalog is received.

2017-10-04 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8202 )

Change subject: IMPALA-4704: Disallow client connections to imapalad until 
catalog is received.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8202/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8202/1//COMMIT_MSG@17
PS1, Line 17: this wait time bound, the impalad exits.
> Because it's hard to time the start-up of processes on multiple machines, s
the bug description recommends that the impalad die after a "reasonable" amount 
of time/retries. I can adjust the setting that's currently in this patch-- I 
figured that would be a point of discussion.



--
To view, visit http://gerrit.cloudera.org:8080/8202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Gerrit-Change-Number: 8202
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 04 Oct 2017 16:39:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4704: Disallow client connections to imapalad until catalog is received.

2017-10-04 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8202 )

Change subject: IMPALA-4704: Disallow client connections to imapalad until 
catalog is received.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8202/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8202/1//COMMIT_MSG@17
PS1, Line 17: this wait time bound, the impalad exits.
Because it's hard to time the start-up of processes on multiple machines, short 
timeouts like this one (it's 10 seconds, if I'm reading correctly) can cause 
issues. Most Hadoop-related distributed systems (but not all) wait forever or 
~hours before they decide to give up.

I'm also under the impression that really large catalogs can take a long time 
to start-up. Do you know if that would trigger?

All that said, I'm simply arguing that it's reasonable to make the number of 
retries infinite here.

I look forward to what others will add!



--
To view, visit http://gerrit.cloudera.org:8080/8202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Gerrit-Change-Number: 8202
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 04 Oct 2017 16:29:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala

2017-10-04 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7954

to look at the new patch set (#11).

Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala
..

IMPALA-5664: Unix time to timestamp conversions may crash Impala

TimestampValue::FromSubsecondUnixTime() and UtcFromUnixTimeMicros()
are incorrect only in case of the last second of 1399, because these
sub-second values are rounded first towards 1400-01-01 00:00:00, which
is accepted as a valid date, and the sub-second part is subtracted
afterwards, leading to a date outside the valid interval. The maximum
case, -12-31 59:59:59 is a bit different, because as I understand,
with nanosecond precision posix times, the maximum value is actually
1-01-01. 00:00:00 minus 1 nanosec.

TimestampValue::FromUnixTimeNanos() can create problematic TimestampValues
both <1400 and 1<=.

These timestamps can cause problems, because most code assumes that if
HasDate/HasTime is true, then it really is a valid timestamp.

To fix this, the posix times are checked in the constructor of
TimestampValue, and if it is outside the valid interval,both time_
and date_ are set to not_a_date_time.

Also added logging to "cast to timestamp" functions.

Test:
select cast(-17987443200-0.1 as timestamp);
This query no longer crashes, but returns NULL, similarly to other
< 1400 timestamps.

Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-stats.inline.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
7 files changed, 66 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/7954/11
--
To view, visit http://gerrit.cloudera.org:8080/7954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff
Gerrit-Change-Number: 7954
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala

2017-10-04 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7954 )

Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala
..


Patch Set 10:

The fe/ files should not be in the patch, I just needed them to fix a compile 
error.


--
To view, visit http://gerrit.cloudera.org:8080/7954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff
Gerrit-Change-Number: 7954
Gerrit-PatchSet: 10
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Wed, 04 Oct 2017 15:15:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala

2017-10-04 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7954 )

Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala
..


Patch Set 9:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7954/9/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/7954/9/be/src/exprs/timestamp-functions-ir.cc@136
PS9, Line 136: if(!tv.HasDateAndTime()){
> here an elsewhere - please follow the style used by the rest of impala.
Done


http://gerrit.cloudera.org:8080/#/c/7954/9/be/src/exprs/timestamp-functions-ir.cc@137
PS9, Line 137: UnixMicrosToUtcTimestamp
> is that meaningful to the user?
I have changed it to its query name.


http://gerrit.cloudera.org:8080/#/c/7954/9/be/src/exprs/timestamp-functions-ir.cc@139
PS9, Line 139:   }
> are the ones you added the only places where your new validation might prod
Literal::Literal(ColumnType type, double v) also uses 
TimestampValue::FromSubsecondUnixTime, but there is no function context there, 
and I do not know how to call that constructor. If I put a double to a place 
where a timestamp is expected, I get an analyses exception.

I have also noticed someting strange:
select timestamp_cmp(timstamp_col, cast(cast(-17987443200.1 as double) as 
timestamp)) from table;
The table has 3 rows and 4 "UDF WARNING: Could not convert -17987443200.1 to 
timestamp" are printed.

It would make more sense to me to have 1 call to cast (if the casts with 
constant arguments would be optimized away) or 3 (if no optimization takes 
place, and the cast is called for every row).

Does Impala try to optimize functions calls with constant parameters?


http://gerrit.cloudera.org:8080/#/c/7954/9/be/src/exprs/timestamp-functions-ir.cc@751
PS9, Line 751: datetime.date().year();
> it still seems like this is doing the same validation. should we remove thi
It is the same check, but I would prefer to leave it as it is for now - it 
leads to a nice warning message, and the try-catch block is needed anyway 
because of AddInterval. It could be done in another commit + jira like clean 
up/speed up timestamp functions.



--
To view, visit http://gerrit.cloudera.org:8080/7954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff
Gerrit-Change-Number: 7954
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Wed, 04 Oct 2017 15:09:06 +
Gerrit-HasComments: Yes


  1   2   >