[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8238 )

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..


Patch Set 7: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 7
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 19 Oct 2017 04:27:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6073: Fail on misconfigured CLASSPATH.

2017-10-18 Thread Kim Jin Chul (Code Review)
Kim Jin Chul has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8327 )

Change subject: IMPALA-6073: Fail on misconfigured CLASSPATH.
..


Patch Set 1: Code-Review-1

I think this check seems not to be enough. Let me leave a detailed comment on 
the JIRA ticket.


--
To view, visit http://gerrit.cloudera.org:8080/8327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb92fa5cad8bbc29b620bec5904e45ed7aeff13d
Gerrit-Change-Number: 8327
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Comment-Date: Thu, 19 Oct 2017 04:13:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6073: Fail on misconfigured CLASSPATH.

2017-10-18 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8327 )

Change subject: IMPALA-6073: Fail on misconfigured CLASSPATH.
..


Patch Set 1:

(3 comments)

Thanks for fixing, lgtm, only minor nits

http://gerrit.cloudera.org:8080/#/c/8327/1/be/src/common/init.cc
File be/src/common/init.cc:

http://gerrit.cloudera.org:8080/#/c/8327/1/be/src/common/init.cc@178
PS1, Line 178: char *classpath = getenv("CLASSPATH");
style: char*


http://gerrit.cloudera.org:8080/#/c/8327/1/be/src/common/init.cc@179
PS1, Line 179: assert(classpath != NULL);
Why not DCHECK and add a message


http://gerrit.cloudera.org:8080/#/c/8327/1/be/src/common/init.cc@182
PS1, Line 182: assert(std::string(classpath).find("*") == 
std::string::npos);
DCHECK and include an error message



--
To view, visit http://gerrit.cloudera.org:8080/8327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb92fa5cad8bbc29b620bec5904e45ed7aeff13d
Gerrit-Change-Number: 8327
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Comment-Date: Thu, 19 Oct 2017 03:45:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-18 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8238 )

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8238/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8238/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1958
PS7, Line 1958:   Lists.partition(allHmsPartitionsToAdd, 
MAX_PARTITION_UPDATES_PER_RPC)) {
While you are here, can you also change bulkAlterPartitions() to use 
Lists.partition()?


http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py@514
PS7, Line 514:   def test_alter_table_create_many_partitions(self, vector, 
unique_database):
This test doesn't belong in the TestLibCache class.

Don't we have existing tests that test the multi-partition add alter? Seems 
like this test belongs there.


http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py@519
PS7, Line 519: self.client.execute('use {0}'.format(unique_database))
"use" can generally cause problems in tests because we sometimes run a test 
against different impalads for example to exercise sync_ddl. While I think it 
is ok in this specific instance, it's generally better practice not so issue 
"use" statements unless absolutely necessary.


http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py@527
PS7, Line 527: # 473 -1  0   0B  NOT CACHED  NOT CACHED 
 TEXTfalse   
hdfs://localhost:20500/test-warehouse/test_alter_table_create_many_partitions_5b4888f5.db/t/p=473
This comment can become stale quickly. I suggest not adding it an output row 
verbatim but mentioning that show partitions contains the partition HDFS paths 
which we expect to contain p=val sub-directories that you are going to search 
for with the regex.



--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 7
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 19 Oct 2017 03:40:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5599: Clean up references to TimestampValue in be/src.

2017-10-18 Thread Zoram Thanga (Code Review)
Hello Michael Ho, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8305

to look at the new patch set (#3).

Change subject: IMPALA-5599: Clean up references to TimestampValue in be/src.
..

IMPALA-5599: Clean up references to TimestampValue in be/src.

This is a follow-on commit to d53f43b4. In this patch we replace all
inappropriate usage of TimestampValue in be/src/service and
be/src/statetore with simpler data types for Unix timestamps, and
where conversions to strings are needed, we use the interfaces
introduced by the previous commit.

BE, FE, EE, JDBC, and Cluster Tests were run to check for
regressions. All the tests passed.

Change-Id: I642a1d713597826bb7c15cd2ecb6638cb813a02c
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/statestore/statestore-subscriber.cc
6 files changed, 29 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/8305/3
--
To view, visit http://gerrit.cloudera.org:8080/8305
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I642a1d713597826bb7c15cd2ecb6638cb813a02c
Gerrit-Change-Number: 8305
Gerrit-PatchSet: 3
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 


[Impala-ASF-CR] IMPALA-5599: Clean up references to TimestampValue in be/src.

2017-10-18 Thread Zoram Thanga (Code Review)
Zoram Thanga has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8305 )

Change subject: IMPALA-5599: Clean up references to TimestampValue in be/src.
..


Patch Set 2:

> (1 comment)
 >
 > Code change looks good.
 >
 > How did you test it? Please add a Testing section to commit message
 > indicating that. Please at least manually inspect each affected
 > field by e.g. looking at the profile.

Thanks for the suggestion to look at the profiles. Here's an example, and it 
looks fine to me:

Query (id=394caf0e5f233559:8d02422c):
  DEBUG MODE WARNING: Query profile created while running a DEBUG build of 
Impala. Use RELEASE builds to measure query performance.
  Summary:
Session ID: 864fbccb3e7bee09:7fa78fb245fa3688
Session Type: BEESWAX
Start Time: 2017-10-18 20:21:59.972644
End Time: 2017-10-18 20:22:00.097281
Query Type: N/A
Query State: EXCEPTION
Query Status: AnalysisException: Could not resolve table reference: 
'tpch_avro.lineitem'

Impala Version: impalad version 2.11.0-SNAPSHOT DEBUG (build 
979c3dfd6fc40dbfb413cf45a9560f81495d1f5a)
User: zoram
Connected User: zoram
Delegated User:
Network Address: 
Default Db: default
Sql Statement: select l_orderkey from tpch_avro.lineitem having 
count(l_orderkey) > 1
Coordinator: impala-1:22000
Query Timeline: 124.796ms
   - Query submitted: 45.687us (45.687us)
   - Unregister query: 124.627ms (124.582ms)


--
To view, visit http://gerrit.cloudera.org:8080/8305
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I642a1d713597826bb7c15cd2ecb6638cb813a02c
Gerrit-Change-Number: 8305
Gerrit-PatchSet: 2
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 19 Oct 2017 03:32:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4964: Fix Decimal modulo overflow

2017-10-18 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8329


Change subject: IMPALA-4964: Fix Decimal modulo overflow
..

IMPALA-4964: Fix Decimal modulo overflow

The modulo operation between two decimals should never overflow. Before
this patch, there would be an overflow if the scale difference between
the two decimals was large. We would try to scale up the one with the
smaller scale, so that the scales matched, which could result in an
overflow.

We fix the problem by first checking if the scaled up value would fit
into 128 bits by estimating the number of leading zeros in it. If we
detect that 128 bits is not enough, we convert both numbers to int256,
do the operation, then convert back to 128 bits.

Testing:
- Added some expr tests that excercise the new code path.

Change-Id: I5420201d4440d421e33e443df005cdcc16b8a6cd
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.inline.h
3 files changed, 116 insertions(+), 26 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8329/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8329
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5420201d4440d421e33e443df005cdcc16b8a6cd
Gerrit-Change-Number: 8329
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-6073: Fail on misconfigured CLASSPATH.

2017-10-18 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8327


Change subject: IMPALA-6073: Fail on misconfigured CLASSPATH.
..

IMPALA-6073: Fail on misconfigured CLASSPATH.

Asserts, early, that $CLASSPATH is set and has no wildcards.

Several developers have been tripped up running C++ tests without first
running 'bin/set-classpath.sh'. This causes them to run into HDFS-11851
(getGlobalJNIEnv() may deadlock on exception), wherein, instead of a
relatively clear "Class not found" error, we hang forever.

I considered limiting this to tests, but we clearly need a $CLASSPATH
for the daemons themselves.

Testing:
* Ran a backend test without $CLASSPATH set.
* Started Impala cluster with this change.

Change-Id: Icb92fa5cad8bbc29b620bec5904e45ed7aeff13d
---
M be/src/common/init.cc
1 file changed, 7 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/8327/1
--
To view, visit http://gerrit.cloudera.org:8080/8327
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icb92fa5cad8bbc29b620bec5904e45ed7aeff13d
Gerrit-Change-Number: 8327
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-6055: Fix hdfs encryption test far Hadoop 2.8+

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8274 )

Change subject: IMPALA-6055: Fix hdfs encryption test far Hadoop 2.8+
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8274
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30234aa50fea93f316e75beea2ced002dcea0c24
Gerrit-Change-Number: 8274
Gerrit-PatchSet: 4
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Thu, 19 Oct 2017 01:45:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] Allow the SASL protocol service name to be configurable

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8230 )

Change subject: Allow the SASL protocol service name to be configurable
..

Allow the SASL protocol service name to be configurable

Previously the SASL service name was always set to a constant "kudu"
which was tracked by kSaslProtoName in rpc/constants.h.

However, for applications that use the KRPC library that would prefer
to do their own SASL initialization, they would requre to set their own
SASL service name to be passed into sasl_server_new()/sasl_client_new().

This patch allows for this configuration by adding a configurable
parameter to the MessengerBuilder which is ultimately passed down to the
negotiation layer.

Change-Id: I9e30fe4461893b67527333259579e2304b19af1e
Reviewed-on: http://gerrit.cloudera.org:8080/8218
Reviewed-by: Dan Burkert 
Tested-by: Dan Burkert 
Reviewed-on: http://gerrit.cloudera.org:8080/8230
Reviewed-by: Sailesh Mukil 
Tested-by: Impala Public Jenkins
---
M be/src/kudu/rpc/client_negotiation.cc
M be/src/kudu/rpc/client_negotiation.h
M be/src/kudu/rpc/constants.cc
M be/src/kudu/rpc/constants.h
M be/src/kudu/rpc/messenger.cc
M be/src/kudu/rpc/messenger.h
M be/src/kudu/rpc/negotiation-test.cc
M be/src/kudu/rpc/negotiation.cc
M be/src/kudu/rpc/server_negotiation.cc
M be/src/kudu/rpc/server_negotiation.h
10 files changed, 59 insertions(+), 28 deletions(-)

Approvals:
  Sailesh Mukil: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8230
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9e30fe4461893b67527333259579e2304b19af1e
Gerrit-Change-Number: 8230
Gerrit-PatchSet: 4
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] Allow the SASL protocol service name to be configurable

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8230 )

Change subject: Allow the SASL protocol service name to be configurable
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8230
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e30fe4461893b67527333259579e2304b19af1e
Gerrit-Change-Number: 8230
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 19 Oct 2017 01:23:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition

2017-10-18 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/8309 )

Change subject: IMPALA-5019: Decimal V2 addition
..

IMPALA-5019: Decimal V2 addition

In this patch, we implement the new decimal return type rules for
addition expressions. These rules become active when the query option
DECIMAL_V2 is enabled. The algorithm for determining the type of the
result is described in the JIRA.

DECIMAL V1:
++
| typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) |
++
| DECIMAL(38,38) |
++

DECIMAL V2:
++
| typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) |
++
| DECIMAL(38,6)  |
++

This patch required backend changes. We implement an algorithm where
we handle the whole and fractional parts separately, and then combine
them to get the final result. This is more complex and slower. We try
to avoid this by first checking if the result would into int128.

Testing:
- Added expr tests.
- Tested locally on my machine with a script that generates random
  decimal numbers and checks that Impala adds them correctly.

Performance:

For the common case, performance remains the same.
  select cast(2.2 as decimal(18, 1) + cast(2.2 as decimal(18, 1)

  BEFORE: 4.74s
  AFTER:  4.73s

In this case, we check if it is necessary to do the complex addition,
and it turns out to be not necessary. We see a slowdown because the
result needs to be scaled down by dividing.
  select cast(2.2 as decimal(38, 19) + cast(2.2 as decimal(38, 19)

  BEFORE: 1.63s
  AFTER:  13.57s

In following case, we take the most complex path and see the most
signification perfmance hit.
  select cast(7.5 as decimal(38,37)) + cast(2.2 as decimal(38,37))

  BEFORE: 1.63s
  AFTER: 20.57

Change-Id: I401049c56d910eb1546a178c909c923b01239336
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
4 files changed, 297 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/8309/2
--
To view, visit http://gerrit.cloudera.org:8080/8309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336
Gerrit-Change-Number: 8309
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.

2017-10-18 Thread Vuk Ercegovac (Code Review)
Hello Philip Zeyliger, Balazs Jeszenszky, Dimitris Tsirogiannis, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8202

to look at the new patch set (#9).

Change subject: IMPALA-4704: Turns on client connections when local catalog 
initialized.
..

IMPALA-4704: Turns on client connections when local catalog initialized.

Currently, impalad starts beeswax and hs2 servers even if the
catalog has not yet been initialized. As a result, client
connections see an error message stating that the impalad
is not yet ready.

This patch changes the impalad startup sequence to wait
until the catalog is received before opening beeswax and hs2 ports
and starting their servers.

Testing:
- python e2e tests that start a cluster without a catalog
  and check that client connections are rejected as expected.

Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
---
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/impala-server.cc
M be/src/testutil/in-process-servers.cc
M bin/start-impala-cluster.py
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
M tests/common/custom_cluster_test_suite.py
M tests/common/impala_cluster.py
M tests/common/impala_service.py
A tests/custom_cluster/test_catalog_wait.py
M tests/custom_cluster/test_coordinators.py
13 files changed, 166 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/8202/9
--
To view, visit http://gerrit.cloudera.org:8080/8202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Gerrit-Change-Number: 8202
Gerrit-PatchSet: 9
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-6070: Parallel data load.

2017-10-18 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8320 )

Change subject: IMPALA-6070: Parallel data load.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh
File testdata/bin/run-hive-server.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh@75
PS1, Line 75:   HADOOP_HEAPSIZE="1024" hive --service hiveserver2 > 
${LOGDIR}/hive-server2.out 2>&1 &
> People like Alex are those whom I was most concerned about, as I know he us
:). I'm still using that good-old machine, mem should be fine (fingers crossed).



--
To view, visit http://gerrit.cloudera.org:8080/8320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
Gerrit-Change-Number: 8320
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Thu, 19 Oct 2017 00:47:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8238 )

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1350/


--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 7
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 19 Oct 2017 00:27:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6070: Parallel data load.

2017-10-18 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8320 )

Change subject: IMPALA-6070: Parallel data load.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh
File testdata/bin/run-hive-server.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh@75
PS1, Line 75:   HADOOP_HEAPSIZE="1024" hive --service hiveserver2 > 
${LOGDIR}/hive-server2.out 2>&1 &
> I'd prefer to keep this change. Our Hive server tends to OOM pretty easily
People like Alex are those whom I was most concerned about, as I know he used 
to develop Impala on a machine without much memory.

If Alex is OK with this, I am, too.



--
To view, visit http://gerrit.cloudera.org:8080/8320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
Gerrit-Change-Number: 8320
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Thu, 19 Oct 2017 00:09:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6070: Parallel data load.

2017-10-18 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8320 )

Change subject: IMPALA-6070: Parallel data load.
..


Patch Set 1:

(2 comments)

Changes like these tend to be slow and painful to test, so I'm in favor of not 
parallelizing additional things in this patch. Additional steps can be improved 
later.

http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@33
PS1, Line 33:
What testing did you do? Does the data load still run on a non-beefy local 
machine?


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh
File testdata/bin/run-hive-server.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh@75
PS1, Line 75:   HADOOP_HEAPSIZE="1024" hive --service hiveserver2 > 
${LOGDIR}/hive-server2.out 2>&1 &
> This looks like it will also increase HADOOP_HEAPSIZE when not doing a para
I'd prefer to keep this change. Our Hive server tends to OOM pretty easily when 
doing anything non-trivial with Hive on our mini cluster.



--
To view, visit http://gerrit.cloudera.org:8080/8320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
Gerrit-Change-Number: 8320
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Thu, 19 Oct 2017 00:07:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5976: Remove equivalent class computation in FE

2017-10-18 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8317 )

Change subject: IMPALA-5976: Remove equivalent class computation in FE
..


Patch Set 1:

(47 comments)

Very nice.

First wave of comments. I have not exhaustively gone through everything yet 
since this is a very substantial patch.

http://gerrit.cloudera.org:8080/#/c/8317/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8317/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-5976: Remove equivalent class computation in FE
Remove equivalence classes from FE


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@a2017
PS1, Line 2017:
Very satisfying to see this code deleted!


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@a2890
PS1, Line 2890:
This function was nice for debugging. Does your patch add a similar function 
that nicely prints the two graphs?


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1253
PS1, Line 1253: return !tids.isEmpty() && (tids.size() > 1 || 
isOjConjunct(e) || isFullOuterJoined(e)
This new formatting is extremely hard to read, please reformat for readability. 
The old formatting was much better.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1496
PS1, Line 1496: for (SlotId dst : getEquivSlots(srcSid)) {
Unless we are absolutely convinced that the change is correct, I think we 
should try to preserve the semantics of the original code. My understanding is 
that the new getEquivSlots() returns slots that have a mutual value transfer 
with srcSid - which is different than what the original code did.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1608
PS1, Line 1608:* For each equivalence class, adds/removes predicates from 
conjuncts such that it
Refers to "equivalence class" again which is now an undefined term. If you want 
to use that term we should define what it means.

Let's think about how to evolve the terminology. Based on my understanding of 
your patch it looks like you want to redefine "equivalence class" to be a set 
of slots where each pair of slots has a mutual value transfer.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1626
PS1, Line 1626: // Equivalence classes only containing slots belonging to 
lhsTids.
What does this mean? Is the Integer key the SCC id?


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1802
PS1, Line 1802: if (slotDesc.getId().asInt() >= g.numVertices()) 
continue;
I've seen this check in a couple of places, would be nice if that could be 
handled by 'g' instead of all callsites.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1952
PS1, Line 1952:   public void computeValueTransfer() {
computeValueTransferGraph


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1972
PS1, Line 1972:   private void constructValueTransferFromEqPredicates(Graph g) {
valueTransfers (plural)


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2047
PS1, Line 2047:* Returns the ids of slots that are in the same equivalence 
class as slotId
Comment need updating. The concept of "equivalence class" does not exist 
anymore and we should not use that term.


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2055
PS1, Line 2055: for (int dst: g.getSccMembers(g.getSccId(slotId.asInt( {
In terms of the g API I think callers should have to care about the existence 
of SCCs. Can we hide those details?


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2070
PS1, Line 2070: Collections.sort(result);
Shouldn't the destinations already be sorted?


http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/8317/1/fe/src/main/java/org/apache/impala/analysis/Expr.java@702
PS1, Line 702: if (!localCmp.apply(Pair.create(this, that))) return false;
It's very unfortunate that we have to create a new pair object to do a SlotRef 
comparison. We should try to avoid that generate an excessive number of 
objects. Expr.equals() really is called quite frequently and SlotRef is the 
most common Expr.

Using a Java Comparator is not ideal because we don't provide 

[Impala-ASF-CR] IMPALA-1575: part 2: yield admission control resourcesa

2017-10-18 Thread Tim Armstrong (Code Review)
Hello Joe McDonnell,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8323

to look at the new patch set (#2).

Change subject: IMPALA-1575: part 2: yield admission control resourcesa
..

IMPALA-1575: part 2: yield admission control resourcesa

This change releases admission control resources more eagerly,
once the query has finished actively executing. Some resources
(tracked and untracked) are still consumed by the client request
as long as it remains open, e.g. memory for control structures
and the result cache. However, these resources are relatively
small and should not block admission of new queries.

The same as in part 1, query execution is considered to be finished
under any of the following conditions:
1. The query encounters an error and fails
2. The query is cancelled due to the idle query timeout
3. The query reaches eos (or the DML completes)
4. The client cancels the query without closing the query

Admission control resources are released in two ways:
1. by calling AdmissionController::ReleaseQuery() on the coordinator
   promptly after query execution finishes, instead of waiting for
   UnregisterQuery(). This means that the query and its memory is
   no longer considered "admitted".
2. by changing the behaviour of MemTracker::GetPoolMemReserved() so
   that it is aware of when a query has finished executing and does not
   consider its entire memory limit to be "reserved".

The preconditions for releasing an admitted query are subtle because the
queries are being admitted to a distributed system, not just the
coordinator.  The comment for ReleaseAdmissionControlResources()
documents the preconditions and rationale. Note that the preconditions
are not weaker than the preconditions of calling UnregisterQuery()
before this patch.

Testing:
TestAdmissionController is extended to end queries in four ways:
cancellation by client, idle timeout, the last row being fetched,
and the client closing the query. The test uses a mix of all four.
After the query ends, all clients wait for the test to complete
before closing the query or closing the connection. This ensures
that the admission control decisions are based entirely on the
query end behavior. This test works for both query admission control
and mem_limit admission control and can detect both kinds of admission
control resources ("admitted" and "reserved") not being released
promptly.

This is based on an earlier patch by Joe McDonnell.

Change-Id: I80279eb2bda740d7f61420f52db3bfa42a6a51ac
---
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/query-state.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/service/client-request-state.cc
M tests/custom_cluster/test_admission_controller.py
9 files changed, 185 insertions(+), 81 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8323/2
--
To view, visit http://gerrit.cloudera.org:8080/8323
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I80279eb2bda740d7f61420f52db3bfa42a6a51ac
Gerrit-Change-Number: 8323
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] IMPALA-6070: Parallel data load.

2017-10-18 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8320 )

Change subject: IMPALA-6070: Parallel data load.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@492
PS1, Line 492:   run-step "Loading auxiliary workloads" load-aux-workloads.log 
load-aux-workloads
I don't see any reason this also couldn't run in parallel.



--
To view, visit http://gerrit.cloudera.org:8080/8320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
Gerrit-Change-Number: 8320
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 18 Oct 2017 23:44:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1422: support a constant on LHS of IN predicates.

2017-10-18 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/8322 )

Change subject: IMPALA-1422: support a constant on LHS of IN predicates.
..

IMPALA-1422: support a constant on LHS of IN predicates.

Currently, constant expressions for the LHS of the IN predicate
are not supported. This patch adds this support as a rewrite in
StmtRewriter (where subqueries are rewritten to joins). Since
there is a nested-loop variant of left semijoin, support for IN
is handled by not erring out. NOT IN is handled by a rewrite to
corresponding NOT EXISTS predicate.

Re-organized the frontend subquery analysis tests to expand coverage.

Testing:
- added frontend subquery analysis tests
- added e2e tests

Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb
---
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
5 files changed, 479 insertions(+), 197 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/8322/2
--
To view, visit http://gerrit.cloudera.org:8080/8322
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb
Gerrit-Change-Number: 8322
Gerrit-PatchSet: 2
Gerrit-Owner: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-1575: part 2: yield admission control resourcesa

2017-10-18 Thread Tim Armstrong (Code Review)
Hello Joe McDonnell,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8323

to look at the new patch set (#3).

Change subject: IMPALA-1575: part 2: yield admission control resourcesa
..

IMPALA-1575: part 2: yield admission control resourcesa

This change releases admission control resources more eagerly,
once the query has finished actively executing. Some resources
(tracked and untracked) are still consumed by the client request
as long as it remains open, e.g. memory for control structures
and the result cache. However, these resources are relatively
small and should not block admission of new queries.

The same as in part 1, query execution is considered to be finished
under any of the following conditions:
1. The query encounters an error and fails
2. The query is cancelled due to the idle query timeout
3. The query reaches eos (or the DML completes)
4. The client cancels the query without closing the query

Admission control resources are released in two ways:
1. by calling AdmissionController::ReleaseQuery() on the coordinator
   promptly after query execution finishes, instead of waiting for
   UnregisterQuery(). This means that the query and its memory is
   no longer considered "admitted".
2. by changing the behaviour of MemTracker::GetPoolMemReserved() so
   that it is aware of when a query has finished executing and does not
   consider its entire memory limit to be "reserved".

The preconditions for releasing an admitted query are subtle because the
queries are being admitted to a distributed system, not just the
coordinator.  The comment for ReleaseAdmissionControlResources()
documents the preconditions and rationale. Note that the preconditions
are not weaker than the preconditions of calling UnregisterQuery()
before this patch.

Testing:
TestAdmissionController is extended to end queries in four ways:
cancellation by client, idle timeout, the last row being fetched,
and the client closing the query. The test uses a mix of all four.
After the query ends, all clients wait for the test to complete
before closing the query or closing the connection. This ensures
that the admission control decisions are based entirely on the
query end behavior. This test works for both query admission control
and mem_limit admission control and can detect both kinds of admission
control resources ("admitted" and "reserved") not being released
promptly.

This is based on an earlier patch by Joe McDonnell.

Change-Id: I80279eb2bda740d7f61420f52db3bfa42a6a51ac
---
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/query-state.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/service/client-request-state.cc
M tests/custom_cluster/test_admission_controller.py
9 files changed, 192 insertions(+), 87 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8323/3
--
To view, visit http://gerrit.cloudera.org:8080/8323
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I80279eb2bda740d7f61420f52db3bfa42a6a51ac
Gerrit-Change-Number: 8323
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8233 )

Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker 
errors
..

IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors

Currently if llvm encounters an error it calls exit() and this kills
the impalad process. This patch adds a diagnostic handler that llvm
can use to pass the error up to impala instead of crashing it.

Testing:
Added a test which induces an error in llvm and makes sure that its
passed up to impala code and handled correctly.

Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff
Reviewed-on: http://gerrit.cloudera.org:8080/8233
Reviewed-by: Bikramjeet Vig 
Tested-by: Impala Public Jenkins
---
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/testutil/gtest-util.h
M be/src/util/filesystem-util.cc
M be/src/util/filesystem-util.h
6 files changed, 93 insertions(+), 0 deletions(-)

Approvals:
  Bikramjeet Vig: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff
Gerrit-Change-Number: 8233
Gerrit-PatchSet: 8
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-2758: Change BufferedTupleStream::GetRows to returning multi batches

2017-10-18 Thread Tianyi Wang (Code Review)
Tianyi Wang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8226 )

Change subject: IMPALA-2758: Change BufferedTupleStream::GetRows to returning 
multi batches
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8226/1/be/src/runtime/buffered-tuple-stream.h
File be/src/runtime/buffered-tuple-stream.h:

http://gerrit.cloudera.org:8080/#/c/8226/1/be/src/runtime/buffered-tuple-stream.h@332
PS1, Line 332:   Status GetNext(
> My intent was to remove the GetRows() API altogether and switch callers to
Do you mean to keep only one rowbatch and assemble the row batches multiple 
times but yet let the stream stay pinned?



--
To view, visit http://gerrit.cloudera.org:8080/8226
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414
Gerrit-Change-Number: 8226
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Oct 2017 23:36:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5599: Clean up references to TimestampValue in be/src.

2017-10-18 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8305 )

Change subject: IMPALA-5599: Clean up references to TimestampValue in be/src.
..


Patch Set 2: Code-Review+1

(1 comment)

Code change looks good.

How did you test it? Please add a Testing section to commit message indicating 
that. Please at least manually inspect each affected field by e.g. looking at 
the profile.

http://gerrit.cloudera.org:8080/#/c/8305/1/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

http://gerrit.cloudera.org:8080/#/c/8305/1/be/src/service/impala-http-handler.cc@303
PS1, Line 303:   int64_t duration_us = record.end_time_us - 
record.start_time_us;
> Please see my response to DanH's comment...
Using monotonic clock for durations is generally the right thing to do, but 
given we also print the start and end times in this case, let's leave it alone 
since the Print() function handles it in a reasonable way,



--
To view, visit http://gerrit.cloudera.org:8080/8305
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I642a1d713597826bb7c15cd2ecb6638cb813a02c
Gerrit-Change-Number: 8305
Gerrit-PatchSet: 2
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 18 Oct 2017 23:35:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1575: part 2: yield admission control resourcesa

2017-10-18 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8323


Change subject: IMPALA-1575: part 2: yield admission control resourcesa
..

IMPALA-1575: part 2: yield admission control resourcesa

This change releases admission control resources more eagerly,
once the query has finished actively executing. Some resources
(tracked and untracked) are still consumed by the client request
as long as it remains open, e.g. memory for control structures
and the result cache. However, these resources are relatively
small and should not block admission of new queries.

The same as in part 1, query execution is considered to be finished
under any of the following conditions:
1. The query encounters an error and fails
2. The query is cancelled due to the idle query timeout
3. The query reaches eos (or the DML completes)
4. The client cancels the query without closing the query

Admission control resources are released in two ways:
1. by calling AdmissionController::ReleaseQuery() on the coordinator
   promptly after query execution finishes, instead of waiting for
   UnregisterQuery(). This means that the query and its memory is
   no longer considered "admitted".
2. by changing the behaviour of MemTracker::GetPoolMemReserved() so
   that it is aware of when a query has finished executing and does not
   consider its entire memory limit to be "reserved".

The preconditions for releasing an admitted query are subtle because the
queries are being admitted a distributed system, not just the coordinator.
The comment for ReleaseAdmissionControlResources() documents the
preconditions and rationale. Note that the preconditions are not weaker
than the preconditions of calling UnregisterQuery() before this patch.

Testing:
TestAdmissionController is extended to end queries in four ways:
cancellation by client, idle timeout, the last row being fetched,
and the client closing the query. The test uses a mix of all four.
After the query ends, all clients wait for the test to complete
before closing the query or closing the connection. This ensures
that the admission control decisions are based entirely on the
query end behavior.  This test works for both query admission control
and mem_limit admission control and can detect both kinds of admission
control resources ("admitted" and "reserved") not being released
promptly.

This is based on an earlier patch by Joe McDonnell.

Change-Id: I80279eb2bda740d7f61420f52db3bfa42a6a51ac
---
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/query-state.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/service/client-request-state.cc
M tests/custom_cluster/test_admission_controller.py
9 files changed, 191 insertions(+), 81 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8323/1
--
To view, visit http://gerrit.cloudera.org:8080/8323
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I80279eb2bda740d7f61420f52db3bfa42a6a51ac
Gerrit-Change-Number: 8323
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8233 )

Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker 
errors
..


Patch Set 7: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff
Gerrit-Change-Number: 8233
Gerrit-PatchSet: 7
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 18 Oct 2017 23:30:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6070: Parallel data load.

2017-10-18 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8320 )

Change subject: IMPALA-6070: Parallel data load.
..


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@9
PS1, Line 9: This commit loads functional-query, TPC-H data, and TPC-DS data in 
parallel. In
nit: Can you wrap this at the red line provided by gerrit? I think it is 72 
characters. Emacs will wrap it for you at the right space with ctrl-q, if you 
choose.


http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@12
PS1, Line 12: minuites
nit: minutes


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@480
PS1, Line 480:   run-step-backgroundable "Loading functional-query data" 
load-functional-query.log \
Could add a comment about what you decided to background and what you decided 
not to, and why?


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh
File testdata/bin/run-hive-server.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh@75
PS1, Line 75:   HADOOP_HEAPSIZE="1024" hive --service hiveserver2 > 
${LOGDIR}/hive-server2.out 2>&1 &
> I'm currently testing to see if 512 is enough.
This looks like it will also increase HADOOP_HEAPSIZE when not doing a parallel 
load, which is a shame. Do you see a way around that?


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-step.sh
File testdata/bin/run-step.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-step.sh@53
PS1, Line 53:
nit: only one empty line, to match context


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-step.sh@84
PS1, Line 84:   RUN_STEP_PIDS=()
Do you want to reset MSGS, too?



--
To view, visit http://gerrit.cloudera.org:8080/8320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
Gerrit-Change-Number: 8320
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 18 Oct 2017 23:17:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6070: Parallel data load.

2017-10-18 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8320 )

Change subject: IMPALA-6070: Parallel data load.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@505
PS1, Line 505:   # Tests depend on the kudu data being clean, so load the data 
from scratch.
 :   run-step "Loading Kudu functional" load-kudu.log \
 : load-data "functional-query" "core" "kudu/none/none" 
force
 :   run-step "Loading Kudu TPCH" load-kudu-tpch.log \
 : load-data "tpch" "core" "kudu/none/none" force
It should be possible to do the same thing for these. That will only save about 
4 minutes, but this runs even when loading from a snapshot.



--
To view, visit http://gerrit.cloudera.org:8080/8320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
Gerrit-Change-Number: 8320
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 18 Oct 2017 23:16:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6045: Make build scripts more friendly to Ubuntu 16.04

2017-10-18 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8262 )

Change subject: IMPALA-6045: Make build scripts more friendly to Ubuntu 16.04
..


Patch Set 3:

> Patch Set 3:
>
> > Sorry to get to this a bit late.
>
> No apology required.
>
> I agree with all of your suggestions.
>
> If you want to file a ticket and assign it to me or if you want to send me a 
> patch to review, I'd be amenable.

Filed https://issues.apache.org/jira/browse/IMPALA-6079.

I've left it unassigned. It's not clear to me yet whether some other work will 
lead me right back to this :)


--
To view, visit http://gerrit.cloudera.org:8080/8262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8196a2a87bce5893a349a1b290c3f3d04fd80317
Gerrit-Change-Number: 8262
Gerrit-PatchSet: 3
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 18 Oct 2017 22:38:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5599: Clean up references to TimestampValue in be/src.

2017-10-18 Thread Zoram Thanga (Code Review)
Hello Michael Ho, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8305

to look at the new patch set (#2).

Change subject: IMPALA-5599: Clean up references to TimestampValue in be/src.
..

IMPALA-5599: Clean up references to TimestampValue in be/src.

This is a follow-on commit to d53f43b4. In this patch we replace all
inappropriate usage of TimestampValue in be/src/service and
be/src/statetore with simpler data types for Unix timestamps, and
where conversions to strings are needed, we use the interfaces
introduced by the previous commit.

Change-Id: I642a1d713597826bb7c15cd2ecb6638cb813a02c
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/statestore/statestore-subscriber.cc
6 files changed, 29 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/8305/2
--
To view, visit http://gerrit.cloudera.org:8080/8305
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I642a1d713597826bb7c15cd2ecb6638cb813a02c
Gerrit-Change-Number: 8305
Gerrit-PatchSet: 2
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 


[Impala-ASF-CR] IMPALA-5599: Clean up references to TimestampValue in be/src/service.

2017-10-18 Thread Zoram Thanga (Code Review)
Zoram Thanga has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8305 )

Change subject: IMPALA-5599: Clean up references to TimestampValue in 
be/src/service.
..


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8305/1/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/8305/1/be/src/service/client-request-state.h@341
PS1, Line 341: time
> indicate the clock. e.g. monotonic vs unix time.
Done. This uses Unix time.


http://gerrit.cloudera.org:8080/#/c/8305/1/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/8305/1/be/src/service/client-request-state.cc@a91
PS1, Line 91:
> What do you think about going a step further and replacing all callers of T
Changed the caller in be/src/statetore. The other calls in be/src/exprs are in 
test code, and seem legit


http://gerrit.cloudera.org:8080/#/c/8305/1/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

http://gerrit.cloudera.org:8080/#/c/8305/1/be/src/service/impala-http-handler.cc@303
PS1, Line 303:   int64_t duration_us = record.end_time_us - 
record.start_time_us;
> That's true but that also seems to be a pre-existing bug. I agree that we s
Please see my response to DanH's comment...


http://gerrit.cloudera.org:8080/#/c/8305/1/be/src/service/impala-http-handler.cc@303
PS1, Line 303:   int64_t duration_us = record.end_time_us - 
record.start_time_us;
> that can be negative if settimeofday() occurred in the mean time. What shou
The existing code does not handle this either. Looking at 
PrettyPrinter::Print(), it will basically print a negative elapsed time in 
nanoseconds. The only way to reliably handle intervening settimeofday(), or 
something like that would be to use a monotonic clock, perhaps 
MonotonicStopWatch.

Are you suggesting we go down that route?


http://gerrit.cloudera.org:8080/#/c/8305/1/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/8305/1/be/src/service/impala-server.h@622
PS1, Line 622: /// Start and end time of the query
> same comment about clock
Done


http://gerrit.cloudera.org:8080/#/c/8305/1/be/src/service/impala-server.h@640
PS1, Line 640: Unix milliseconds
> like that
Done


http://gerrit.cloudera.org:8080/#/c/8305/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8305/1/be/src/service/impala-server.cc@1009
PS1, Line 1009:   int64_t duration_ms = duration_us / MICROS_PER_MILLI;
> same question (though old code could go negative).
Please see my earlier response.



--
To view, visit http://gerrit.cloudera.org:8080/8305
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I642a1d713597826bb7c15cd2ecb6638cb813a02c
Gerrit-Change-Number: 8305
Gerrit-PatchSet: 1
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Wed, 18 Oct 2017 22:24:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5668: Fix cast(X as timestamp) for negative subsecond Decimals

2017-10-18 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8051 )

Change subject: IMPALA-5668: Fix cast(X as timestamp) for negative subsecond 
Decimals
..


Patch Set 4:

(2 comments)

Thanks for fixing this. Just had a couple of minor requests then I'll +2.

http://gerrit.cloudera.org:8080/#/c/8051/4/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8051/4/be/src/exprs/decimal-operators-ir.cc@604
PS4, Line 604:   DCHECK(nanoseconds >= numeric_limits::min()
We generally prefer DCHECK_GE() and DCHECK_LE() so that the out-of-range value 
is printed if this gets hit.


http://gerrit.cloudera.org:8080/#/c/8051/4/be/src/exprs/decimal-operators-ir.cc@622
PS4, Line 622: FromUnixTimeNanos
Can you add a note to this function's comment that it's expected to work for 
negative nanoseconds (I don't think it's obvious when looking at the function 
definition).



--
To view, visit http://gerrit.cloudera.org:8080/8051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8216f4c0f100c1bd68891cd6048236bfe4c205f0
Gerrit-Change-Number: 8051
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Oct 2017 22:00:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6070: Parallel data load.

2017-10-18 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8320


Change subject: IMPALA-6070: Parallel data load.
..

IMPALA-6070: Parallel data load.

This commit loads functional-query, TPC-H data, and TPC-DS data in parallel. In
parallel, these take about 37 minutes, dominated by functional-query. Serially,
these take about 30 minutes more, namely the 13 minutes of tpcds and 16
minuites of tpcds. This works out nicely because CPU usage during data load is
very low in aggregate. (We don't sustain more than 1 CPU of load, whereas build
machines are likely to have many CPUs.)

To do this, I added support to run-step.sh to have a notion of a backgroundable
task, and support waiting for all tasks.

I also increased the heapsize of our HiveServer2 server. When datasets were
being loaded in parallel, we ran out of memory at 256MB of heap.

The resulting log output is currently like so (but without the timestamps):

15:58:04  Started Loading functional-query data in background; pid 8105.
15:58:04  Started Loading TPC-H data in background; pid 8106.
15:58:04  Loading functional-query data (logging to 
/home/impdev/Impala/logs/data_loading/load-functional-query.log)...
15:58:04  Started Loading TPC-DS data in background; pid 8107.
15:58:04  Loading TPC-H data (logging to 
/home/impdev/Impala/logs/data_loading/load-tpch.log)...
15:58:04  Loading TPC-DS data (logging to 
/home/impdev/Impala/logs/data_loading/load-tpcds.log)...
16:11:31Loading workload 'tpch' using exploration strategy 'core' OK (Took: 
13 min 27 sec)
16:14:33Loading workload 'tpcds' using exploration strategy 'core' OK 
(Took: 16 min 29 sec)
16:35:08Loading workload 'functional-query' using exploration strategy 
'exhaustive' OK (Took: 37 min 4 sec)

Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
---
M testdata/bin/create-load-data.sh
M testdata/bin/run-hive-server.sh
M testdata/bin/run-step.sh
3 files changed, 40 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/8320/1
--
To view, visit http://gerrit.cloudera.org:8080/8320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
Gerrit-Change-Number: 8320
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-6055: Fix hdfs encryption test far Hadoop 2.8+

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8274 )

Change subject: IMPALA-6055: Fix hdfs encryption test far Hadoop 2.8+
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1349/


--
To view, visit http://gerrit.cloudera.org:8080/8274
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30234aa50fea93f316e75beea2ced002dcea0c24
Gerrit-Change-Number: 8274
Gerrit-PatchSet: 4
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 18 Oct 2017 21:52:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6055: Fix hdfs encryption test far Hadoop 2.8+

2017-10-18 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8274 )

Change subject: IMPALA-6055: Fix hdfs encryption test far Hadoop 2.8+
..


Patch Set 4:

Anyone know what's going on with this change not going through verification?  I 
got a pass on private build and test, and the failures here appear Kerberos 
related.  Trying a manual rebase...


-- 
To view, visit http://gerrit.cloudera.org:8080/8274
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30234aa50fea93f316e75beea2ced002dcea0c24
Gerrit-Change-Number: 8274
Gerrit-PatchSet: 4
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 18 Oct 2017 21:51:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.

2017-10-18 Thread Philip Zeyliger (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8211

to look at the new patch set (#3).

Change subject: IMPALA-5243: Speed up code gen for wide Avro tables.
..

IMPALA-5243: Speed up code gen for wide Avro tables.

HdfsAvroScanner::CodegenMaterializeTuple generates a function linear in
size to the number of columns. On 1000 column tables, codegen time is
significant. This commit roughly halves it for wide columns.
(Note that this had been much worse in recent history (<= Impala 2.9).)

It does so by breaking up MaterializeTuple() into multiple smaller
functions, and then calls them in order. When breaking up into
200-column chunks, there is a noticeable speed-up.

I've made the helper code for generating LLVM function prototypes
have a mutable function name, so that the builder can be re-used
multiple times.

I've checked by inspecting optimized LLVM that in the case where there's
only 1 helper function, code gets inlined so that there doesn't seem to
be an extra function.

I measured codegen time for various "step sizes." The case where there
are no helper functions is about 2.7s. The best case was about a step
size of 200, with timings of 1.3s.

For the query "select count(int_col16) from 
functional_avro.widetable_1000_cols",
codegen times as a function of step size are roughly as follows. This is
averaged across 5 executions, and rounded to 0.1s.

   step time
 10 2.4
 50 2.5
 75 2.9
100 3.0
125 3.0
150 1.4
175 1.3
200 1.3 <-- chosen step size
225 1.5
250 1.4
300 1.6
400 1.6
500 1.8
   1000 2.7

The raw data was generated like so, with some code that let me change the step 
size at runtime:

  $(for step in 10 50 75 100 125 150 175 200 225 250 300 400 500 1000; do for 
try in $(seq 5); do echo $step > /tmp/step_size.txt; echo -n "$step "; 
impala-shell.sh -q "select count(int_col16) from 
functional_avro.widetable_1000_cols; profile;" 2> /dev/null | grep -A9 
'CodeGen:(Total: [0-9]*s' -m 1 | sed -e 's/ - / /' |
  sed -e 's/([0-9]*)//' | tr -d '\n' | tr -s ' ' ' '; echo; done; done) | tee 
out.txt
  ...
  200  CodeGen:(Total: 1s333ms, non-child: 1s333ms, % non-child: 100.00%) 
CodegenTime: 613.562us CompileTime: 605.320ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 38 NumInstructions: 8.44K 
OptimizationTime: 701.276ms PeakMemoryUsage: 4.12 MB PrepareTime: 10.014ms
  ...
  1000  CodeGen:(Total: 2s659ms, non-child: 2s659ms, % non-child: 100.00%) 
CodegenTime: 558.860us CompileTime: 1s267ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 34 NumInstructions: 8.41K 
OptimizationTime: 1s362ms PeakMemoryUsage: 4.11 MB PrepareTime: 10.574ms

I have run the core tests with this change.

Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
---
M be/src/codegen/llvm-codegen.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
3 files changed, 148 insertions(+), 93 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8211/3
--
To view, visit http://gerrit.cloudera.org:8080/8211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
Gerrit-Change-Number: 8211
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.

2017-10-18 Thread Philip Zeyliger (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8211

to look at the new patch set (#2).

Change subject: IMPALA-5243: Speed up code gen for wide Avro tables.
..

IMPALA-5243: Speed up code gen for wide Avro tables.

HdfsAvroScanner::CodegenMaterializeTuple generates a function linear in
size to the number of columns. On 1000 column tables, codegen time is
significant. This commit roughly halves it for wide columns.
(Note that this had been much worse in recent history (<= Impala 2.9).)

It does so by breaking up MaterializeTuple() into multiple smaller
functions, and then calls them in order. When breaking up into
200-column chunks, there is a noticeable speed-up.

I've made the helper code for generating LLVM function prototypes
have a mutable function name, so that the builder can be re-used
multiple times.

I've checked by inspecting optimized LLVM that in the case where there's
only 1 helper function, code gets inlined so that there doesn't seem to
be an extra function.

I measured codegen time for various "step sizes." The case where there
are no helper functions is about 2.7s. The best case was about a step
size of 200, with timings of 1.3s.

For the query "select count(int_col16) from 
functional_avro.widetable_1000_cols",
codegen times as a function of step size are roughly as follows. This is
averaged across 5 executions, and rounded to 0.1s.

   step time
 10 2.4
 50 2.5
 75 2.9
100 3.0
125 3.0
150 1.4
175 1.3
200 1.3 <-- chosen step size
225 1.5
250 1.4
300 1.6
400 1.6
500 1.8
   1000 2.7

The raw data was generated like so, with some code that let me change the step 
size at runtime:

  $(for step in 10 50 75 100 125 150 175 200 225 250 300 400 500 1000; do for 
try in $(seq 5); do echo $step > /tmp/step_size.txt; echo -n "$step "; 
impala-shell.sh -q "select count(int_col16) from 
functional_avro.widetable_1000_cols; profile;" 2> /dev/null | grep -A9 
'CodeGen:(Total: [0-9]*s' -m 1 | sed -e 's/ - / /' |
  sed -e 's/([0-9]*)//' | tr -d '\n' | tr -s ' ' ' '; echo; done; done) | tee 
out.txt
  ...
  200  CodeGen:(Total: 1s333ms, non-child: 1s333ms, % non-child: 100.00%) 
CodegenTime: 613.562us CompileTime: 605.320ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 38 NumInstructions: 8.44K 
OptimizationTime: 701.276ms PeakMemoryUsage: 4.12 MB PrepareTime: 10.014ms
  ...
  1000  CodeGen:(Total: 2s659ms, non-child: 2s659ms, % non-child: 100.00%) 
CodegenTime: 558.860us CompileTime: 1s267ms LoadTime: 0.000ns 
ModuleBitcodeSize: 1.95 MB NumFunctions: 34 NumInstructions: 8.41K 
OptimizationTime: 1s362ms PeakMemoryUsage: 4.11 MB PrepareTime: 10.574ms

I have run the core tests with this change.

Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
---
M be/src/codegen/llvm-codegen.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
3 files changed, 148 insertions(+), 93 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8211/2
--
To view, visit http://gerrit.cloudera.org:8080/8211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476
Gerrit-Change-Number: 8211
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-18 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8238 )

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..


Patch Set 7: Code-Review+1

Carry post rebase.


-- 
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 7
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 18 Oct 2017 21:32:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-18 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8238 )

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..


Patch Set 6: Code-Review+1

(1 comment)

Carrying Dimitris' +1.

http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1974
PS3, Line 1974: // HMS because some of them may already exist there. In that 
case, we load in the
  :   // catalog the partitions that already exist in HMS but 
aren't in the catalog yet.
  :   if (allHmsPartitionsToAdd.size() != 
addedHmsPartitions.size()) {
  : List difference = 
computeDifference(allHmsPartitionsToAdd,
  : addedHmsPartitions);
  : addedHmsPartitions.addAll(
  : getPartitionsFromHms(msTbl, msClient, tableName, 
difference));
  :   }
  :
  :   for (Partition partition: addedHmsPartitions) {
  : // Create and add the HdfsPartition to catalog. Return 
the table object with an
  :
> Well, you're only calling it for the diff, right? I would expect in most ca
Yep, I agree.



--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 6
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 18 Oct 2017 21:32:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding

2017-10-18 Thread Tim Armstrong (Code Review)
Hello Lars Volker,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8267

to look at the new patch set (#8).

Change subject: IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding
..

IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding

Switch the decoders to using more batch-oriented interfaces. As an
intermediate step this doesn't make the interfaces of LevelDecoder
or DictDecoder batch-oriented, only the lower-level utility classes.

The next step would be to change those interfaces to be batch-oriented
and make according optimisations in parquet. This could deliver much
larger perf improvements than the current patch.

The high-level changes are.
* BitReader -> BatchedBitReader, which is built to unpack runs of 32
  bit-packed values efficiently.
* RleDecoder -> RleBatchDecoder, which exposes the repeated and literal
  runs to the caller and uses BatchedBitReader to unpack literal runs
  efficiently.
* Dict decoding uses RleBatchDecoder to decode repeated runs efficiently
  and uses the BitPacking utilities to unpack and encode in a single
  step.

Also removes an older benchmark that isn't too interesting (since
the batch-oriented approach to encoding and decoding is so much
faster than the value-by-value approach).

Testing:
* Ran core tests.
* Updated unit tests to exercise new code.
* Added test coverage for the deprecated bit-packed level encoding to
  that it still works (there was no coverage previously).

Perf:
Single-node benchmarks showed a few % performance gain. 16 node cluster
benchmarks only showed a gain for TPC-H nested.

Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6
---
M be/src/benchmarks/CMakeLists.txt
M be/src/benchmarks/bit-packing-benchmark.cc
D be/src/benchmarks/rle-benchmark.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
D be/src/experiments/bit-stream-utils.8byte.h
D be/src/experiments/bit-stream-utils.8byte.inline.h
M be/src/util/bit-packing.h
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
M be/src/util/dict-encoding.h
M be/src/util/dict-test.cc
M be/src/util/parquet-reader.cc
M be/src/util/rle-encoding.h
M be/src/util/rle-test.cc
M testdata/data/README
A testdata/data/alltypes_agg_bitpacked_def_levels.parquet
M tests/query_test/test_scanners.py
19 files changed, 1,149 insertions(+), 946 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/8267/8
--
To view, visit http://gerrit.cloudera.org:8080/8267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6
Gerrit-Change-Number: 8267
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] Allow the SASL protocol service name to be configurable

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8230 )

Change subject: Allow the SASL protocol service name to be configurable
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1348/


--
To view, visit http://gerrit.cloudera.org:8080/8230
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e30fe4461893b67527333259579e2304b19af1e
Gerrit-Change-Number: 8230
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 18 Oct 2017 21:21:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] Allow the SASL protocol service name to be configurable

2017-10-18 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8230 )

Change subject: Allow the SASL protocol service name to be configurable
..


Patch Set 3: Code-Review+2

Rebase, carry +2.


--
To view, visit http://gerrit.cloudera.org:8080/8230
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e30fe4461893b67527333259579e2304b19af1e
Gerrit-Change-Number: 8230
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 18 Oct 2017 21:20:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] Add 'psmisc' to bootstrap system.sh.

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8306 )

Change subject: Add 'psmisc' to bootstrap_system.sh.
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I378b6fbf7a7b3f0b42ecf064e98cfe6e29b3
Gerrit-Change-Number: 8306
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 18 Oct 2017 21:09:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] Add 'psmisc' to bootstrap system.sh.

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8306 )

Change subject: Add 'psmisc' to bootstrap_system.sh.
..

Add 'psmisc' to bootstrap_system.sh.

testdata/bin/kill-all.sh uses /usr/bin/killall, which
is provided by the 'psmisc' package on Ubuntu. This
commit adds the package to the list of packages we install.

The Docker base image for Ubuntu 16.04 doesn't contain this package,
which is how this came up.

Change-Id: I378b6fbf7a7b3f0b42ecf064e98cfe6e29b3
Reviewed-on: http://gerrit.cloudera.org:8080/8306
Reviewed-by: Jim Apple 
Tested-by: Impala Public Jenkins
---
M bin/bootstrap_system.sh
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Jim Apple: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I378b6fbf7a7b3f0b42ecf064e98cfe6e29b3
Gerrit-Change-Number: 8306
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

2017-10-18 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@276
PS1, Line 276: 169.254.169.254
> What is this address? Can we use a domain name and https?
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html

This is EC2 magic. If you're not on EC2, this will tell you to set the 
environment variables (if you're testing S3). If you're on EC2, this will 
return something. We don't actually check that those credentials can access the 
relevant bucket, but at least it lets you through.



--
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 18 Oct 2017 21:06:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

2017-10-18 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@276
PS1, Line 276: 169.254.169.254
What is this address? Can we use a domain name and https?

What would a new Impala developer get as a result of this?



--
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 18 Oct 2017 20:55:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5736: Add impala-shell argument to set default query options

2017-10-18 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8038 )

Change subject: IMPALA-5736: Add impala-shell argument to set default query 
options
..


Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8038/7/shell/option_parser.py
File shell/option_parser.py:

http://gerrit.cloudera.org:8080/#/c/8038/7/shell/option_parser.py@42
PS7, Line 42: if default_options[option] in [True, False]:
> How about "if type(...) is bool:" ?
The recommended[0] way is to use isinstance() [1]

[0] https://www.python.org/dev/peps/pep-0008/

[1] https://docs.python.org/2/library/functions.html#isinstance


http://gerrit.cloudera.org:8080/#/c/8038/7/shell/option_parser.py@91
PS7, Line 91: upper
> Is this needed? If so, can you add a comment explaining why?
Dictionary comprehensions were only added in Python 2.7 [0]. Impala has users 
using older OSs that won't have Python 2.7. If this is needed, please re-write 
it not to use the comprehension.

[0] https://docs.python.org/2.7/whatsnew/2.7.html#python-3-1-features


http://gerrit.cloudera.org:8080/#/c/8038/7/tests/shell/test_shell_interactive.py
File tests/shell/test_shell_interactive.py:

http://gerrit.cloudera.org:8080/#/c/8038/7/tests/shell/test_shell_interactive.py@300
PS7, Line 300:
What about testing for:
1. query options that don't exist or don't parse?
2. query option with an invalid value



--
To view, visit http://gerrit.cloudera.org:8080/8038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986
Gerrit-Change-Number: 8038
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 18 Oct 2017 20:34:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Allow configuration of values passed into kerberos env vars

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8308 )

Change subject: Allow configuration of values passed into kerberos env vars
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8308
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab4ce72c04ec4056dc89fb4c1c540a6fdaca4404
Gerrit-Change-Number: 8308
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 18 Oct 2017 20:26:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] Allow configuration of values passed into kerberos env vars

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8308 )

Change subject: Allow configuration of values passed into kerberos env vars
..

Allow configuration of values passed into kerberos env vars

We always used hardcoded constants for the following kerberos
environment variables:

KRB5CCNAME and KRB5RCACHETYPE.

This patch allows for the configuration of these variables by taking
arguments to InitKerberosForServer().

Callsites within Kudu have not been changed as all the parameters have
default values.

The motivation for this patch is that, Impala as a user of the
KuduRPC and Kudu security libraries, needs to have a file based
credential cache since the kinit happens on the C++ side and this cache
needs to be read by the Java side too. Hence, we cannot have it in memory.
Also, Impala still requires replay protection, since some Impala services
use Thrift which lacks the nonce mechanism that KRPC uses for replay
protection.

Change-Id: Iab4ce72c04ec4056dc89fb4c1c540a6fdaca4404
Reviewed-on: http://gerrit.cloudera.org:8080/8247
Reviewed-by: Todd Lipcon 
Tested-by: Todd Lipcon 
Reviewed-on: http://gerrit.cloudera.org:8080/8308
Reviewed-by: Michael Ho 
Tested-by: Impala Public Jenkins
---
M be/src/kudu/security/init.cc
M be/src/kudu/security/init.h
2 files changed, 19 insertions(+), 12 deletions(-)

Approvals:
  Michael Ho: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8308
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iab4ce72c04ec4056dc89fb4c1c540a6fdaca4404
Gerrit-Change-Number: 8308
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-6055: Fix hdfs encryption test far Hadoop 2.8+

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8274 )

Change subject: IMPALA-6055: Fix hdfs encryption test far Hadoop 2.8+
..


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1344/


--
To view, visit http://gerrit.cloudera.org:8080/8274
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30234aa50fea93f316e75beea2ced002dcea0c24
Gerrit-Change-Number: 8274
Gerrit-PatchSet: 3
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 18 Oct 2017 20:10:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5599: Clean up references to TimestampValue in be/src/service.

2017-10-18 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8305 )

Change subject: IMPALA-5599: Clean up references to TimestampValue in 
be/src/service.
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8305/1/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/8305/1/be/src/service/client-request-state.cc@a91
PS1, Line 91:
What do you think about going a step further and replacing all callers of 
TimestampValue::LocalTime() with UnixMicros() or the like ? There doesn't seem 
to be many callers left after this patch.


http://gerrit.cloudera.org:8080/#/c/8305/1/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

http://gerrit.cloudera.org:8080/#/c/8305/1/be/src/service/impala-http-handler.cc@303
PS1, Line 303:   int64_t duration_us = record.end_time_us - 
record.start_time_us;
> that can be negative if settimeofday() occurred in the mean time. What shou
That's true but that also seems to be a pre-existing bug. I agree that we 
should handle it though.



--
To view, visit http://gerrit.cloudera.org:8080/8305
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I642a1d713597826bb7c15cd2ecb6638cb813a02c
Gerrit-Change-Number: 8305
Gerrit-PatchSet: 1
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Wed, 18 Oct 2017 19:22:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-10-18 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..


Patch Set 10: Code-Review+2

carry Dan's +2


--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 10
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Oct 2017 19:12:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-10-18 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..


Patch Set 10:

Mostafa, we spoke offline about this but just wanted to confirm it looked ok to 
you. The summary is that the CPU cost of value materialisation for 
non-dict-encoded strings in uncompressed parquet scans regresses to roughly the 
same CPU cost as snappy-compressed parquet. I didn't see any regressions on 
TPC-H, only narrowly targeted benchmarks.


--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 10
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Oct 2017 19:12:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8233 )

Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker 
errors
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1347/


--
To view, visit http://gerrit.cloudera.org:8080/8233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff
Gerrit-Change-Number: 8233
Gerrit-PatchSet: 7
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 18 Oct 2017 19:10:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors

2017-10-18 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8233 )

Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker 
errors
..


Patch Set 7: Code-Review+2

Rebasing.
Carrying over +2.


--
To view, visit http://gerrit.cloudera.org:8080/8233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff
Gerrit-Change-Number: 8233
Gerrit-PatchSet: 7
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 18 Oct 2017 19:10:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-10-18 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8085/8/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8085/8/be/src/exec/parquet-column-readers.cc@983
PS8, Line 983: data
> buffers?
Done



--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Oct 2017 19:09:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-10-18 Thread Tim Armstrong (Code Review)
Hello Lars Volker, Alex Behm, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8085

to look at the new patch set (#9).

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..

IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

This change only affects uncompressed plain-encoded Parquet where
RowBatches may directly reference strings stored in the I/O
buffers. The proposed fix is to simply copy the data pages if
needed then use the same logic that we use for decompressed data
pages.

This copy inevitably adds some CPU overhead, but I believe this is
acceptable because:
* We generally recommend using compression, and optimize for that
  case.
* Copying memory is cheaper than decompressing data.
* Scans of uncompressed data are very likely to be I/O bound.

This allows several major simplifications:
* The resource management for compressed and uncompressed
  scans is much more similar.
* We don't need to attach Disk I/O buffers to RowBatches.
* We don't need to deal with attaching I/O buffers in
  ScannerContext.
* Column readers can release each I/O buffer *before* advancing to
  the next one, making it easier to reason about resource
  consumption. E.g. each Parquet column only needs one I/O buffer at
  a time to make progress.

Future changes will apply to Avro, Sequence Files and Text. Once
all scanners are converted, ScannerContext::contains_tuple_data_
will always be false and we can remove some dead code.

Testing
===
Ran core ASAN and exhaustive debug builds.

Perf

No difference in most cases when scanning uncompressed parquet.
There is a significant regression (50% increase in runtime) in
targeted perf tests scanning non-dictionary-encoded strings (see
benchmark output below).  After the regression performance is
comparable to Snappy compression.

I also did a TPC-H run but ran into some issues with the report
generator. I manually compared times and there were no regressions.

++---+-++++
| Workload   | File Format   | Avg (s) | Delta(Avg) | 
GeoMean(s) | Delta(GeoMean) |
++---+-++++
| TARGETED-PERF(_61) | parquet / none / none | 23.02   | +0.60% | 4.23  
 | +5.97% |
++---+-++++

+++---++-++++-+---+
| Workload   | Query  | File Format   | Avg(s) | 
Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
+++---++-++++-+---+
| TARGETED-PERF(_61) | PERF_STRING-Q2 | parquet / none / none | 3.00   | 
1.98| R +52.10%  |   0.97%|   1.25%| 1   | 5 |
| TARGETED-PERF(_61) | PERF_STRING-Q1 | parquet / none / none | 2.86   | 
1.92| R +49.11%  |   0.34%|   2.34%| 1   | 5 |
| TARGETED-PERF(_61) | PERF_STRING-Q3 | parquet / none / none | 3.16   | 
2.15| R +47.04%  |   1.03%|   0.72%| 1   | 5 |
| TARGETED-PERF(_61) | PERF_STRING-Q4 | parquet / none / none | 3.16   | 
2.17| R +45.60%  |   0.14%|   1.11%| 1   | 5 |
| TARGETED-PERF(_61) | PERF_STRING-Q5 | parquet / none / none | 3.51   | 
2.55| R +37.88%  |   0.83%|   0.49%| 1   | 5 |
| TARGETED-PERF(_61) | PERF_AGG-Q5| parquet / none / none | 0.79   | 
0.61| R +30.86%  |   1.54%|   4.10%| 1   | 5 |
| TARGETED-PERF(_61) | primitive_top-n_al | parquet / none / none | 39.45  | 
35.07   |   +12.51%  |   0.29%|   0.29%| 1   | 5 |
| TARGETED-PERF(_61) | PERF_STRING-Q7 | parquet / none / none | 6.78   | 
6.10|   +11.13%  |   0.99%|   0.74%| 1   | 5 |
| TARGETED-PERF(_61) | PERF_STRING-Q6 | parquet / none / none | 8.83   | 
8.14|   +8.52%   |   0.15%|   0.32%| 1   | 5 |
...

Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/scanner-context.h
4 files changed, 66 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/8085/9
--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master

[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-10-18 Thread Thomas Tauber-Marshall (Code Review)
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim 
Armstrong, Mostafa Mokhtar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7793

to look at the new patch set (#7).

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..

IMPALA-4252: Min-max runtime filters for Kudu

This patch implements min-max filters for runtime filters. Each
runtime filter generates a bloom filter or a min-max filter,
depending on if it has HDFS or Kudu targets, respectively.

Min-max filters are generated by the PartitionedHashJoinBuilder. For
now, min-max filters are only applied at the KuduScanner, which passes
them into the Kudu client. Because the Kudu client doesn't provide a
way to specify generic filter exprs, min-max filters are only
generated when the target expr is a bare Kudu column ref.

Future work will address applying min-max filters at HDFS scan nodes
and applying bloom filters at Kudu scan nodes.

Codegen is used to eliminate branching on the type of the min-max
filter.

Functional Testing:
- Added new planner tests and updated the old ones.
- Updated existing runtime filter tests to work with Kudu.
- Added e2e tests for min-max filter specific functionality.

Perf Testing:
- All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu,
  timings are averages of 3 runs.
- Ran a contrived query with a filter that does eliminate any rows
  (full self join of lineitem). The difference in running time was
  negligible - 24.46s with filters on, 24.15s with filters off for
  a ~1% slowdown.
- Ran a contrived query with a filter that elimiates all rows (self
  join on lineitem with a join condition that never matches). The
  filters resulted in a significant speedup - 0.26s with filters on,
  1.46s with filters off for a ~5.6x speedup. This query is added to
  targeted-perf.

Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node-mt.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/service/impala-internal-service.cc
M be/src/util/CMakeLists.txt
A be/src/util/min-max-filter-ir.cc
A be/src/util/min-max-filter.cc
A be/src/util/min-max-filter.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-delete.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 

[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-10-18 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 6:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/7793/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7793/6//COMMIT_MSG@15
PS6, Line 15: them into the Kudu client. Because the Kudu client doesn't 
provide a
> Is there any documentation for Kudu about what ordering Kudu uses for compa
I asked the Kudu team, and its not documented anywhere, but they confirmed they 
use the same ordering as Impala does.


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/exec/filter-context.h@106
PS6, Line 106: bloom_filter
> has_bloom_filter() ? At callsites it reads like this should return a filter
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/exec/filter-context.h@109
PS6, Line 109: min_max_filter
> has_min_max_filter?
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/coordinator.cc@1220
PS6, Line 1220:   MinMaxFilter::Or(src_type(), params.min_max_filter, 
min_max_filter_.get());
> I mentioned this in a later comment, but we should consider what happens if
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/runtime-filter-bank.cc@235
PS6, Line 235:   MinMaxFilter* min_max_filter = MinMaxFilter::Create(type, 
_pool_);
> (nit) - could combine into one line.
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/runtime-filter-ir.cc
File be/src/runtime/runtime-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/runtime/runtime-filter-ir.cc@27
PS6, Line 27:   // to a) the atomicity of / pointer assignments and b) the x86 
TSO memory model.
> Comment not relevant any more since we're using actual atomics?
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter-ir.cc
File be/src/util/min-max-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter-ir.cc@53
PS6, Line 53:   min_str = string(value->ptr, value->len);
> We should think through the behaviour with very large strings. It may make
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.h@83
PS6, Line 83: #define NUMERIC_MIN_MAX_FILTER(NAME, TYPE)
 \
> Yeah, the macro seems like it may be the least of the evils.
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@24
PS6, Line 24: "
> #include "runtime/timestamp-value.inline.h"
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@77
PS6, Line 77:   std::string NAME##MinMaxFilter::DebugString() const {   
 \
> nit: don't need std:: prefix in .cc files.
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@139
PS6, Line 139:   out->min.__set_string_val(std::min(in.min.string_val, 
out->min.string_val));
> I feel like we should use our StringValue::Compare() function instead of re
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@291
PS6, Line 291:   BigIntMinMaxFilter::Or(in, out);
> Shouldn't this be calling the timestamp method?
TColumnData doesn't have a timestamp field, so I convert timestamps with 
UtcToUnixTimeMicros and pass them around in thrift as longs. (noted in a 
comment now)

Unless I'm missing a reason that comparing timestamps with way won't lead to 
the same results as comparing the unconverted TimestampValues.


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@327
PS6, Line 327:   }
> Timestamp?
Same as above.


http://gerrit.cloudera.org:8080/#/c/7793/6/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/7793/6/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@379
PS6, Line 379: if (slotRef == null || slotRef.getDesc().getColumn() == 
null
> Are there planner tests for all these cases?
Done


http://gerrit.cloudera.org:8080/#/c/7793/6/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
File testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test:


[Impala-ASF-CR] Allow the SASL protocol service name to be configurable

2017-10-18 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8230 )

Change subject: Allow the SASL protocol service name to be configurable
..


Patch Set 2: Code-Review+2

Okay, thats fine. We can revisit if these conflicts cause future work.


--
To view, visit http://gerrit.cloudera.org:8080/8230
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e30fe4461893b67527333259579e2304b19af1e
Gerrit-Change-Number: 8230
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 18 Oct 2017 18:42:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-10-18 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..


Patch Set 8: Code-Review+2

(1 comment)

Yes, this is good with me. Could you get Mostafa to sign off on perf 
implications, perhaps recording that in the JIRA if not already 
(issues.apache.org not loading for me)?

http://gerrit.cloudera.org:8080/#/c/8085/8/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8085/8/be/src/exec/parquet-column-readers.cc@983
PS8, Line 983: data
buffers?



--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Oct 2017 18:16:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5976: Remove equivalent class computation in FE

2017-10-18 Thread Tianyi Wang (Code Review)
Tianyi Wang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8317


Change subject: IMPALA-5976: Remove equivalent class computation in FE
..

IMPALA-5976: Remove equivalent class computation in FE

Equivalent class is used to get the equivalencies between slots. It is
ill-defined and the current implementation is inefficient. This patch
removes it and directly uses the information from value the transfer
graph instead.
Value transfer graph is reimplemented using Tarjan's strongly connected
component algorithm and adjacency list based DFS to speed up on both
condensed and sparse graphs.

Testing: A few test cases are modified because the change in equivalence
definition enables some existing optimizations on these cases. It passes
other existing tests. On a query with 1800 union operations, the
equivalence class computation time is reduced from 7m57s to 104ms and
the planning time is reduced from 8m5s to 13s.

Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
---
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BoolLiteral.java
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/IsNotEmptyPredicate.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java
M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
M fe/src/main/java/org/apache/impala/analysis/NullLiteral.java
M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/TimestampArithmeticExpr.java
M fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java
M fe/src/main/java/org/apache/impala/analysis/TupleIsNullPredicate.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
A fe/src/main/java/org/apache/impala/util/Graph.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
30 files changed, 805 insertions(+), 921 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/8317/1
--
To view, visit http://gerrit.cloudera.org:8080/8317
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Gerrit-Change-Number: 8317
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang 


[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-10-18 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..


Patch Set 8:

Dan, does this look ok to you at a high level? I think the code change itself 
is relatively straightforward but it would be good to confirm the direction and 
perf trade-off.


--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Oct 2017 17:54:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

2017-10-18 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@256
PS1, Line 256: if [[ -z ${AWS_ACCESS_KEY_ID-} ]]; then
> We may want to include a check here that "set -x" has not been enabled?
(Bash hackery)

If you want, you can use a subshell:

$(set -x; (set +x; [ $USER == philip ])) && echo yes || echo no
+ set +x
yes
10:46:02 mystery  ~
$(set -x; (set +x; [ $USER == phili ])) && echo yes || echo no
+ set +x
no


Return values survive, and you can just do

if (set +x; [[ -z ${AWS_SECRET_ACCESS_KEY_ID-} ]]); then
  ...
fi

I tested something similar above.



--
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 18 Oct 2017 17:47:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs

2017-10-18 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8294 )

Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@256
PS1, Line 256: if [[ -z ${AWS_ACCESS_KEY_ID-} ]]; then
We may want to include a check here that "set -x" has not been enabled?

# Prevent leaking the AWS keys to the log
set_x=0
if set +o | grep -q "set -o xtrace"; then
  set_x=1
  set +x
fi

DO STUFF

# Restore xtrace flag
if [[ $set_x -eq 1 ]]; then
  set -x
fi



--
To view, visit http://gerrit.cloudera.org:8080/8294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae
Gerrit-Change-Number: 8294
Gerrit-PatchSet: 1
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 18 Oct 2017 17:32:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add 'psmisc' to bootstrap system.sh.

2017-10-18 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8306 )

Change subject: Add 'psmisc' to bootstrap_system.sh.
..


Patch Set 1:

> Would you like me to submit this for you, Phil?

Yes, please. I don't have permission to do it.

Thanks!


--
To view, visit http://gerrit.cloudera.org:8080/8306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I378b6fbf7a7b3f0b42ecf064e98cfe6e29b3
Gerrit-Change-Number: 8306
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 18 Oct 2017 17:10:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] Add 'psmisc' to bootstrap system.sh.

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8306 )

Change subject: Add 'psmisc' to bootstrap_system.sh.
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1346/


--
To view, visit http://gerrit.cloudera.org:8080/8306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I378b6fbf7a7b3f0b42ecf064e98cfe6e29b3
Gerrit-Change-Number: 8306
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 18 Oct 2017 17:11:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] Add 'psmisc' to bootstrap system.sh.

2017-10-18 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8306 )

Change subject: Add 'psmisc' to bootstrap_system.sh.
..


Patch Set 1:

Would you like me to submit this for you, Phil?


--
To view, visit http://gerrit.cloudera.org:8080/8306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I378b6fbf7a7b3f0b42ecf064e98cfe6e29b3
Gerrit-Change-Number: 8306
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Jim Apple 
Gerrit-Comment-Date: Wed, 18 Oct 2017 17:09:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4918: Support getting column comments via HS2

2017-10-18 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8315 )

Change subject: IMPALA-4918: Support getting column comments via HS2
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8315/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8315/2//COMMIT_MSG@9
PS2, Line 9: Fill the 'comments'/'remarks' field during HS2 column metadata 
requests.
Please include in the commit message how you tested this. I think this also 
should include tests for other tables formats (Kudu, HBase). Those could be end 
to end tests. It would also be good to re-visit the existing "ALTER TABLE" 
tests and make sure they test adding comments.


http://gerrit.cloudera.org:8080/#/c/8315/2/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java:

http://gerrit.cloudera.org:8080/#/c/8315/2/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java@189
PS2, Line 189: dummyTable.addColumn(new Column(colDef.getColName(), 
colDef.getType(), colDef.getComment(), i));
Long line.


http://gerrit.cloudera.org:8080/#/c/8315/2/fe/src/test/java/org/apache/impala/service/JdbcTest.java
File fe/src/test/java/org/apache/impala/service/JdbcTest.java:

http://gerrit.cloudera.org:8080/#/c/8315/2/fe/src/test/java/org/apache/impala/service/JdbcTest.java@460
PS2, Line 460: assertEquals("Incorrect table comment", "", 
rs.getString("REMARKS"));
This looks like a bug in getTables() to me, but Alex and Dimitris would know 
better.



--
To view, visit http://gerrit.cloudera.org:8080/8315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1d33dfd031b5344d7136695b623cec76143ada5c
Gerrit-Change-Number: 8315
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Wed, 18 Oct 2017 16:44:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Allow configuration of values passed into kerberos env vars

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8308 )

Change subject: Allow configuration of values passed into kerberos env vars
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1345/


--
To view, visit http://gerrit.cloudera.org:8080/8308
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab4ce72c04ec4056dc89fb4c1c540a6fdaca4404
Gerrit-Change-Number: 8308
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 18 Oct 2017 16:34:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources

2017-10-18 Thread Tim Armstrong (Code Review)
Hello Joe McDonnell,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8303

to look at the new patch set (#9).

Change subject: IMPALA-1575: Part 1: eagerly release query exec resources
..

IMPALA-1575: Part 1: eagerly release query exec resources

Release of backend resources for query execution on the coordinator
daemon should occur eagerly as soon as query execution is finished
or cancelled. Before this patch some resources managed by QueryState,
like scratch files, were only released when the query was closed
and all of the query's control structures were torn down.

These resources are referred to as "ExecResources" in various places to
distinguish them from resources associated with the client request (like
the result cache) that are still required after the query finishes
executing.

This first change does not solve the admission control problem for two
reasons:
* We don't release the "admitted" memory on the coordinator until
  the query is unregistered.
* Admission control still considers the memory reserved until the
  query memtracker is unregistered, which happens only when the
  QueryState is destroyed: see MemTracker::GetPoolMemReserved().

The flow is mostly similar to initial_reservation_refcnt_, except the
coordinator also holds onto a reference count, which is released
when either the final row is returned or cancellation is initiated.
After the coordinator releases its refcount, the resources can be
freed as soon as local fragments also release their refcounts.

Also clean up Coordinator slightly by preventing runtime_state() from
leaking out to ClientRequestState - instead it's possible to log
the query MemTracker by following parent links in the MemTracker.

This patch is partially based on Joe McDonnell's IMPALA-1575 patch.

Testing:
Ran core tests.

Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341
---
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/test-env.cc
M be/src/service/client-request-state.cc
10 files changed, 122 insertions(+), 85 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/8303/9
--
To view, visit http://gerrit.cloudera.org:8080/8303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341
Gerrit-Change-Number: 8303
Gerrit-PatchSet: 9
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources

2017-10-18 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/8303 )

Change subject: IMPALA-1575: Part 1: eagerly release query exec resources
..

IMPALA-1575: Part 1: eagerly release query exec resources

Release of backend resources for query execution on the coordinator
daemon should occur eagerly as soon as query execution is finished
or cancelled. Before this patch some resources managed by QueryState,
like scratch files, were only released when the query was closed
and all of the query's control structures were torn down.

These resources are referred to as "ExecResources" in various places to
distinguish them from resources associated with the client request (like
the result cache) that are still required after the query finishes
executing.

This first change does not solve the admission control problem for two
reasons:
* We don't release the "admitted" memory on the coordinator until
  the query is unregistered.
* Admission control still considers the memory reserved until the
  query memtracker is unregistered, which happens only when the
  QueryState is destroyed: see MemTracker::GetPoolMemReserved().

The flow is mostly similar to initial_reservation_refcnt_, except the
coordinator also holds onto a reference count, which is released
when either the final row is returned or cancellation is initiated.
After the coordinator releases its refcount, the resources can be
freed as soon as local fragments also release their refcounts.

Also clean up Coordinator slightly by preventing runtime_state() from
leaking out to ClientRequestState - instead it's possible to log
the query MemTracker by following parent links in the MemTracker.

This patch is partially based on Joe McDonnell's IMPALA-1575 patch.

Testing:
Ran core tests.

Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341
---
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/test-env.cc
M be/src/service/client-request-state.cc
10 files changed, 122 insertions(+), 85 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/8303/8
--
To view, visit http://gerrit.cloudera.org:8080/8303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341
Gerrit-Change-Number: 8303
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6055: Fix hdfs encryption test far Hadoop 2.8+

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8274 )

Change subject: IMPALA-6055: Fix hdfs encryption test far Hadoop 2.8+
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1344/


--
To view, visit http://gerrit.cloudera.org:8080/8274
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30234aa50fea93f316e75beea2ced002dcea0c24
Gerrit-Change-Number: 8274
Gerrit-PatchSet: 3
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 18 Oct 2017 16:06:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6055: Fix hdfs encryption test far Hadoop 2.8+

2017-10-18 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8274 )

Change subject: IMPALA-6055: Fix hdfs encryption test far Hadoop 2.8+
..


Patch Set 2:

Looks like a transient failure, retrying


--
To view, visit http://gerrit.cloudera.org:8080/8274
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30234aa50fea93f316e75beea2ced002dcea0c24
Gerrit-Change-Number: 8274
Gerrit-PatchSet: 2
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 18 Oct 2017 16:05:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6055: Fix hdfs encryption test far Hadoop 2.8+

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8274 )

Change subject: IMPALA-6055: Fix hdfs encryption test far Hadoop 2.8+
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1343/


--
To view, visit http://gerrit.cloudera.org:8080/8274
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I30234aa50fea93f316e75beea2ced002dcea0c24
Gerrit-Change-Number: 8274
Gerrit-PatchSet: 2
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Wed, 18 Oct 2017 16:04:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala

2017-10-18 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7954 )

Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala
..


Patch Set 15:

> Csaba, are those failing tests specifically require a timestamp
 > that issues such a warning? If not, I suggest we just change the
 > timestamps of those tests in a way that preserves test coverage and
 > avoids the warnings problem.
Yes, those tests work with "non timestamp" strings by design.


--
To view, visit http://gerrit.cloudera.org:8080/7954
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff
Gerrit-Change-Number: 7954
Gerrit-PatchSet: 15
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Wed, 18 Oct 2017 14:28:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4918: Support getting column comments via HS2

2017-10-18 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/8315 )

Change subject: IMPALA-4918: Support getting column comments via HS2
..

IMPALA-4918: Support getting column comments via HS2

Fill the 'comments'/'remarks' field during HS2 column metadata requests.

Change-Id: I1d33dfd031b5344d7136695b623cec76143ada5c
---
M fe/src/main/java/org/apache/impala/service/MetadataOp.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
M fe/src/test/java/org/apache/impala/service/JdbcTest.java
4 files changed, 54 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/8315/2
--
To view, visit http://gerrit.cloudera.org:8080/8315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1d33dfd031b5344d7136695b623cec76143ada5c
Gerrit-Change-Number: 8315
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 


[Impala-ASF-CR] IMPALA-4918: Support getting column comments via HS2

2017-10-18 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8315


Change subject: IMPALA-4918: Support getting column comments via HS2
..

IMPALA-4918: Support getting column comments via HS2

Fill the 'comments'/'remarks' field during HS2 column metadata requests.

Change-Id: I1d33dfd031b5344d7136695b623cec76143ada5c
---
M fe/src/main/java/org/apache/impala/service/MetadataOp.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
M fe/src/test/java/org/apache/impala/service/JdbcTest.java
4 files changed, 55 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/8315/1
--
To view, visit http://gerrit.cloudera.org:8080/8315
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1d33dfd031b5344d7136695b623cec76143ada5c
Gerrit-Change-Number: 8315
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 


[Impala-ASF-CR] Allow configuration of values passed into kerberos env vars

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8308 )

Change subject: Allow configuration of values passed into kerberos env vars
..


Patch Set 1: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1342/


--
To view, visit http://gerrit.cloudera.org:8080/8308
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab4ce72c04ec4056dc89fb4c1c540a6fdaca4404
Gerrit-Change-Number: 8308
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 18 Oct 2017 07:46:14 +
Gerrit-HasComments: No