[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size

2017-06-26 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5500: Reduce catalog update topic size
..


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7268/3/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

Line 331:   status = SerializeAndCompressCatalogObject(catalog_object, 
item);
Thinking we should have a similar separation of serialization and compression 
like we do for the decompression.


Line 366:   compressed_buffer.resize(result_len);
Remove since we're copying into item.value anyway


Line 367:   item.value = string(reinterpret_cast(compressed_buffer_ptr),
use item.value.assign()


http://gerrit.cloudera.org:8080/#/c/7268/3/be/src/service/impala-server.h
File be/src/service/impala-server.h:

Line 557:   std::vector& data_buffer) WARN_UNUSED_RESULT;
we use pointers for out params, i.e. use std::vector*


http://gerrit.cloudera.org:8080/#/c/7268/3/be/src/util/compress.h
File be/src/util/compress.h:

Line 113
revert


http://gerrit.cloudera.org:8080/#/c/7268/3/be/src/util/decompress.h
File be/src/util/decompress.h:

Line 86: 
revert


http://gerrit.cloudera.org:8080/#/c/7268/2/tests/custom_cluster/test_compact_catalog_updates.py
File tests/custom_cluster/test_compact_catalog_updates.py:

Line 45:   assert result.data[0] == "7300"
> Added the result check. Out of curiosity, what's so special about the parqu
Nothing special about Parquet, just another table.


-- 
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5554: sorter DCHECK on null column

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5554: sorter DCHECK on null column
..


IMPALA-5554: sorter DCHECK on null column

The bug was in the DCHECK. The DCHECK is intended to make sure that a
tuple's string data didn't get split across blocks. The logic assumed
that if the second-or-later string column was in the next-block, that
the strings were split between blocks. However, that assumption is
invalid if there are NULL strings, which do not belong in any block.

The fix for the DCHECK (which is still useful) is to count the number
of non-NULL strings and make sure that no non-NULL strings were split
between blocks.

Testing:
Added a test that reproduces the crash.

Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90
Reviewed-on: http://gerrit.cloudera.org:8080/7295
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/runtime/sorter.cc
M 
testdata/workloads/functional-query/queries/QueryTest/single-node-large-sorts.test
2 files changed, 39 insertions(+), 3 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5554: sorter DCHECK on null column

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5554: sorter DCHECK on null column
..


Patch Set 4: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size

2017-06-26 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5500: Reduce catalog update topic size
..


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

Line 330:   if (!status.ok()) {
> take out of if/else since it's common to both cases
Done


Line 367:   uint8_t* compressed_buffer_ptr = &compressed_buffer[0];
> comptessed_buffer.data()
Done


Line 370:   // Base64 encode the compressed catalog object and store it in the 
topic item.
> Is base64-encoding really necessary? Why don't the raw bytes work?
Hm, not really. It works without it, I only used it because the topic items use 
strings for keys and values. Removed it.


http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/catalog/catalog-server.h
File be/src/catalog/catalog-server.h:

Line 200:   /// Serializes 'catalog_object' and compresses it using Snappy 
compression.
> remove second "compression"
Done


http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 1417: Status ImpalaServer::DecompressAndDeserializeCatalogObject(const 
TTopicItem& item,
> Why not separate decompression from deserialization? Seem like separate con
Done


http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/util/compress.h
File be/src/util/compress.h:

Line 103:   SnappyCompressor(MemPool* mem_pool = NULL, bool reuse_buffer = 
false);
> Why these changes? Shouldn't these always be created using CreateCompressor
oops forgot about these. I was playing around with directly creating the 
compressor. reverted.


http://gerrit.cloudera.org:8080/#/c/7268/2/tests/custom_cluster/test_compact_catalog_updates.py
File tests/custom_cluster/test_compact_catalog_updates.py:

Line 45:   "select count(*) from functional.alltypes")
> Check result? Maybe also use another query on a parquet table.
Added the result check. Out of curiosity, what's so special about the parquet 
table in this case?


-- 
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size

2017-06-26 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#3).

Change subject: IMPALA-5500: Reduce catalog update topic size
..

IMPALA-5500: Reduce catalog update topic size

Problem: IMPALA-4029 introduced the use of the flatbuffers serialization
libary for storing file and block metadata. That change reduced the
effectiveness of the Thrift compaction protocol (when
--compact_catalog_topic is used), thereby causing a 2X increase in
catalog update topic size when the compact protocol is used.

Fix: Snappy compress the catalog topic updates before sent to the
statestore when --compact_catalog_topic is set to true.

Results: ~4X reduction in catalog update topic size

Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/compress.h
M be/src/util/decompress.h
A tests/custom_cluster/test_compact_catalog_updates.py
7 files changed, 141 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-5571: Fix authorization tests

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5571: Fix authorization tests
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/793/

-- 
To view, visit http://gerrit.cloudera.org:8080/7299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icad893a10fdeb7a721264e69413014603dec6339
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5571: Fix authorization tests

2017-06-26 Thread Dimitris Tsirogiannis (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7299

to look at the new patch set (#2).

Change subject: IMPALA-5571: Fix authorization tests
..

IMPALA-5571: Fix authorization tests

Issue:
With the fix for IMPALA-5549 the catalog cache of an impalad may end up
with stale authorization metadata in the presence of GRANT/REVOKE
statements that modify multiple privileges.

Fix:
Revert the behavior of GRANT/REVOKE privilege DDL statements.

Change-Id: Icad893a10fdeb7a721264e69413014603dec6339
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
1 file changed, 7 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/7299/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icad893a10fdeb7a721264e69413014603dec6339
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-5571: Fix authorization tests

2017-06-26 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5571: Fix authorization tests
..


Patch Set 2: Code-Review+2

Rebase. Keep Alex's +2

-- 
To view, visit http://gerrit.cloudera.org:8080/7299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icad893a10fdeb7a721264e69413014603dec6339
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5431: Remove redundant path exists checks during table 
load
..


IMPALA-5431: Remove redundant path exists checks during table load

There are multiple places that do an exists() check on a path and then
perform some subsequent action on it. This pattern results in two
RPCs to the NN (one for the exists() check and one for the subsequent
action). We can avoid the exists() check in these cases since most HDFS
methods on paths throw a FileNotFoundException if the path does not
exist. This can save an RPC to NN and improve the metadata loading time.

Testing: Enough tests already cover this code path. This patch
passed core and exhaustive tests.

Metadata benchmark shows decent increase in perf numbers, for ex:

100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV -20.51%
80-PARTITIONS-250K-FILES-S3-03-RECOVER -20.58%
80-PARTITIONS-250K-FILES-11-DROP-PARTITION -22.13%
80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION -22.38%
80-PARTITIONS-250K-FILES-S3-12-DROP -23.69%
100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION -23.91%
100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTITION -26.04%
100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH -26.38%
80-PARTITIONS-250K-FILES-S3-02-CREATE -36.47%
100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS -58.72%
80-PARTITIONS-250K-FILES-S3-01-DROP -95.33%
80-PARTITIONS-250K-FILES-01-DROP -95.93%

Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281
Reviewed-on: http://gerrit.cloudera.org:8080/7095
Reviewed-by: Bharath Vissapragada 
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
2 files changed, 64 insertions(+), 39 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Bharath Vissapragada: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins


[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5431: Remove redundant path exists checks during table 
load
..


Patch Set 7: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5316: Adds last day() function

2017-06-26 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5316: Adds last_day() function
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6991/6/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 279:   void TestLastDayFunction() {
> I'm fixing this.
Filed IMPALA-5585


-- 
To view, visit http://gerrit.cloudera.org:8080/6991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I429c8734bddca3c37a2eedc211a16a4ffcb04370
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Vincent Tran 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5316: Adds last day() function

2017-06-26 Thread Vincent Tran (Code Review)
Vincent Tran has posted comments on this change.

Change subject: IMPALA-5316: Adds last_day() function
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6991/6/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 279:   void TestLastDayFunction() {
> Good catch, I suspect the intention was to call this in the same place Test
I'm fixing this.


-- 
To view, visit http://gerrit.cloudera.org:8080/6991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I429c8734bddca3c37a2eedc211a16a4ffcb04370
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Vincent Tran 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5316: Adds last day() function

2017-06-26 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5316: Adds last_day() function
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6991/6/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 279:   void TestLastDayFunction() {
> Vincent, MJ, I don't really see how this code is executed, are you sure the
Good catch, I suspect the intention was to call this in the same place 
TestNextDayFunction() is called.


-- 
To view, visit http://gerrit.cloudera.org:8080/6991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I429c8734bddca3c37a2eedc211a16a4ffcb04370
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Vincent Tran 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5579: Fix IndexOutOfBoundsException in GetTables 
metadata request
..


IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request

Issue:
GetTables metadata request may result in an IndexOutOfBoundsException
while accessing table comments fields if one or more tables can't be
retrieved from the catalog cache.

Fix:
Only add tables that can be retrieved from the catalog cache (i.e. no
TableLoadingException is thrown) in the result of a GetTables request.

Change-Id: Ic933b9950d58e6f880d4078faf1a69ee7c6adbf1
Reviewed-on: http://gerrit.cloudera.org:8080/7296
Reviewed-by: Dimitris Tsirogiannis 
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/service/MetadataOp.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
2 files changed, 11 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Dimitris Tsirogiannis: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7296
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ic933b9950d58e6f880d4078faf1a69ee7c6adbf1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins


[Impala-ASF-CR] IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5579: Fix IndexOutOfBoundsException in GetTables 
metadata request
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7296
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic933b9950d58e6f880d4078faf1a69ee7c6adbf1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 12: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/789/

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries

2017-06-26 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5483: Automatically disable codegen for small queries
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7153/6/testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test:

Line 6
> As mentioned in the other comment the codegen hint isn't equivalent to the 
I agree that we should be able to distinguish between user-set and planner-set 
query options.

Yes, you can pretty accurately guess that the single-node plan optimization 
kicked in by looking at the plan. But you might not be able to guess the 
effects of runtime_filter_mode=off, and the stats-dependent behavior of 
num_scanner_threads=1 and adjusting the batch_size.

The setting of MT_DOP=4 for COMPUTE STATS is not so obvious and arguably 
belongs in this same list as the disable codegen hint. I'd need to check of 
there are other examples of significant setting of query options.

Personally, I don't think the codegen hint even makes sense as it is today. In 
most cases I'd imagine users want the hint behavior, i.e., try to at least run 
the query. But that option is not directly accessible to users. I'd much more 
prefer a new value of disable_codegen like "force_disable" where queries that 
need codegen will fail. I'd imagine that option would only be used in very rare 
circumstances.

I like your suggestion of "Codegen disabled by planner". Let's do that. We can 
consider extending that to also include the reason (query option and value), 
but I let's not add that in this patch.


http://gerrit.cloudera.org:8080/#/c/7153/3/testdata/workloads/functional-query/queries/QueryTest/codegen.test
File testdata/workloads/functional-query/queries/QueryTest/codegen.test:

Line 2:  QUERY
> I think we have coverage that the query produces the expected results when 
Good point, we do have coverage there.

Let's leave these tests, but please rename to disable-codegen.test for 
consistency with the corresponding planner test.


-- 
To view, visit http://gerrit.cloudera.org:8080/7153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5554: sorter DCHECK on null column

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5554: sorter DCHECK on null column
..


Patch Set 4:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/792/

-- 
To view, visit http://gerrit.cloudera.org:8080/7295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5571: Fix authorization tests

2017-06-26 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5571: Fix authorization tests
..


Patch Set 1:

Oh ok, Thanks for clarifying.

-- 
To view, visit http://gerrit.cloudera.org:8080/7299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icad893a10fdeb7a721264e69413014603dec6339
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5554: sorter DCHECK on null column

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5554: sorter DCHECK on null column
..


Patch Set 4: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/788/

-- 
To view, visit http://gerrit.cloudera.org:8080/7295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5571: Fix authorization tests

2017-06-26 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5571: Fix authorization tests
..


Patch Set 1:

For others reading along, the original code from 
http://gerrit.cloudera.org:8080/7248 had a TODO that explicitly stated that 
updating multiple objects via direct-ddl does not work.

-- 
To view, visit http://gerrit.cloudera.org:8080/7299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icad893a10fdeb7a721264e69413014603dec6339
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' 
change
..


IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change

Change-Id: I65495748984268763b2690c193488a338409e13e
Reviewed-on: http://gerrit.cloudera.org:8080/7236
Reviewed-by: John Russell 
Tested-by: Impala Public Jenkins
---
M docs/topics/impala_known_issues.xml
1 file changed, 26 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  John Russell: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7236
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 


[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' 
change
..


Patch Set 4: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7236
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5571: Fix authorization tests

2017-06-26 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5571: Fix authorization tests
..


Patch Set 1: Code-Review+2

Let's unblock the builds.

-- 
To view, visit http://gerrit.cloudera.org:8080/7299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icad893a10fdeb7a721264e69413014603dec6339
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries

2017-06-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5483: Automatically disable codegen for small queries
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7153/6/testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test:

Line 6: Planner Hints: Disable codegen
> The term "hint" already has a different meaning as in "straight_join" kind 
As mentioned in the other comment the codegen hint isn't equivalent to the 
query option. It seems a bit questionable that with the other query options we 
can't distinguish between user-set options and planner-set options in the 
profile (although you can usually infer for single node plan).

Disabling codegen does seem like a significant enough planner decision that it 
should be somewhat visible in the explain plan. It's a bit less of a problem 
for the single node optimisation since then there's usually a visible change to 
the plan shape. We can test it like testComputeStatsMtDop() but if there's 
nothing in the explain plan it's harder to understand what the planner did.

"Planner Hint" might not be the best term if we do include it. Not sure what 
would be better. Just a plain string like "Codegen disabled by planner"?


http://gerrit.cloudera.org:8080/#/c/7153/3/testdata/workloads/functional-query/queries/QueryTest/codegen.test
File testdata/workloads/functional-query/queries/QueryTest/codegen.test:

Line 2:  QUERY
> The hint is applied to all FE->BE expr evaluations, and the single-node opt
I think we have coverage that the query produces the expected results when the 
hint is set, but we don't have any coverage that codegen was actually disabled.

The distinction between the hint and the query option exists for a reason - 
there are some exprs that can only be evaluated with codegen enabled. In those 
cases the hint is overridden. But if the query option is set explicitly, then 
the option takes precedence and the query fails because the exprs can't be 
evaluated.

b7eeb8bf85ef24b730c9ba2891f5a8075ba9605e added some coverage of that 
interaction - it runs test_udfs with exec_single_node_rows_threshold set to a 
couple of different values, which exercises that case.


-- 
To view, visit http://gerrit.cloudera.org:8080/7153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5316: Adds last day() function

2017-06-26 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5316: Adds last_day() function
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6991/6/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 279:   void TestLastDayFunction() {
Vincent, MJ, I don't really see how this code is executed, are you sure these 
tests are running? I don't see this code called from a GTest test.

You should put these test cases into a GTest like this:

TEST_F(ExprTest, LastDay) {
  // Code goes here
}

The ExprTest class should only contain helper functions and not test cases.


-- 
To view, visit http://gerrit.cloudera.org:8080/6991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I429c8734bddca3c37a2eedc211a16a4ffcb04370
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Vincent Tran 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5571: Fix authorization tests

2017-06-26 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5571: Fix authorization tests
..


Patch Set 1:

It's not clear to me yet. I am just posting this to unblock the builds. I need 
to investigate this more. I agree though, in theory that should work fine.

-- 
To view, visit http://gerrit.cloudera.org:8080/7299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icad893a10fdeb7a721264e69413014603dec6339
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' 
change
..


Patch Set 4:

Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/132/

-- 
To view, visit http://gerrit.cloudera.org:8080/7236
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change

2017-06-26 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' 
change
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7236/3/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

PS3, Line 572: known
 :   as topics
> remove. statestore topics are not only the catalog updates.
Done


PS3, Line 576: topic
> "catalog update topic"
Done


Line 585:   until this issue is resolved.
> You may want to mention that 'false' is the default value.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7236
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change

2017-06-26 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' 
change
..


Patch Set 4: Code-Review+2

Carrying forward +2 from Dimitris.

-- 
To view, visit http://gerrit.cloudera.org:8080/7236
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change

2017-06-26 Thread John Russell (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7236

to look at the new patch set (#4).

Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' 
change
..

IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change

Change-Id: I65495748984268763b2690c193488a338409e13e
---
M docs/topics/impala_known_issues.xml
1 file changed, 26 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/7236/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7236
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 


[Impala-ASF-CR] IMPALA-5571: Fix authorization tests

2017-06-26 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5571: Fix authorization tests
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7299/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 2919: } else if (updatedPrivs.size() > 1) {
This is more of a question than a comment. I know this is the behavior prior to 
your patch(IMPALA-5549) but what exactly prevents the processing/applying of 
multiple privileges in the coordinator/fe? Had a quick look at the code and 
ProcessCatalogUpdateResult() and updateCatalog() calls seems to handle multiple 
updated objects fine? Is something else causing the problem?


-- 
To view, visit http://gerrit.cloudera.org:8080/7299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icad893a10fdeb7a721264e69413014603dec6339
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries

2017-06-26 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5483: Automatically disable codegen for small queries
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7153/6/testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test:

Line 6: Planner Hints: Disable codegen
The term "hint" already has a different meaning as in "straight_join" kind of 
hint, so I think we'd need a different term. 

To move forward with this approach, I think we would need at least a plan to 
extend this to capture all changes the FE makes to query options, and we should 
be clear that we are referring to query options. I understand that codegen is 
unfortunately special with it's "hint" mechanism, but I don't think we should 
expose those details to the user.

Do you think we want to embark on that journey to extend this report any time 
soon? I'm concerned about the incompleteness of only capturing the codegen 
decision. Also, I'm concerned about potential inconsistency with the query 
profile.

An alternative is to check the effective codegen hint in a planner test like we 
do for PlannerTest.testComputeStatsMtDop()


http://gerrit.cloudera.org:8080/#/c/7153/3/testdata/workloads/functional-query/queries/QueryTest/codegen.test
File testdata/workloads/functional-query/queries/QueryTest/codegen.test:

Line 2:  QUERY
> I don't believe so. "git grep 'optimization hints'" doesn't show up any tes
The hint is applied to all FE->BE expr evaluations, and the single-node 
optimization tests it, so I'm not sure we need these tests here specifically.

Otoh, the interaction between the disable_codegen query option and the 
disable_codegen_hint does not seem well tested, so perhaps we should add 
targeted tests for that in a separate patch, or even better get rid of the 
codegen hint altogether.


-- 
To view, visit http://gerrit.cloudera.org:8080/7153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5431: Remove redundant path exists checks during table 
load
..


Patch Set 7:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/791/

-- 
To view, visit http://gerrit.cloudera.org:8080/7095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load

2017-06-26 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5431: Remove redundant path exists checks during table 
load
..


Patch Set 7: Code-Review+2

(3 comments)

Carrying +2.

http://gerrit.cloudera.org:8080/#/c/7095/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS5, Line 302: RemoteIterator fileStatusIter =
> nit: move it below L305. If we're going to return there is no need to get a
Done


PS5, Line 727: DataFile(fileStatus)) continue;
> newFileDescs?
Done. Rewrote a little.


http://gerrit.cloudera.org:8080/#/c/7095/5/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

Line 498:   return fs.listStatus(p);
> I am thinking maybe add a warn in the log that the path doesn't exist.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load

2017-06-26 Thread Bharath Vissapragada (Code Review)
Hello Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7095

to look at the new patch set (#7).

Change subject: IMPALA-5431: Remove redundant path exists checks during table 
load
..

IMPALA-5431: Remove redundant path exists checks during table load

There are multiple places that do an exists() check on a path and then
perform some subsequent action on it. This pattern results in two
RPCs to the NN (one for the exists() check and one for the subsequent
action). We can avoid the exists() check in these cases since most HDFS
methods on paths throw a FileNotFoundException if the path does not
exist. This can save an RPC to NN and improve the metadata loading time.

Testing: Enough tests already cover this code path. This patch
passed core and exhaustive tests.

Metadata benchmark shows decent increase in perf numbers, for ex:

100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV -20.51%
80-PARTITIONS-250K-FILES-S3-03-RECOVER -20.58%
80-PARTITIONS-250K-FILES-11-DROP-PARTITION -22.13%
80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION -22.38%
80-PARTITIONS-250K-FILES-S3-12-DROP -23.69%
100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION -23.91%
100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTITION -26.04%
100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH -26.38%
80-PARTITIONS-250K-FILES-S3-02-CREATE -36.47%
100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS -58.72%
80-PARTITIONS-250K-FILES-S3-01-DROP -95.33%
80-PARTITIONS-250K-FILES-01-DROP -95.93%

Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
2 files changed, 64 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/7095/7
-- 
To view, visit http://gerrit.cloudera.org:8080/7095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change

2017-06-26 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' 
change
..


Patch Set 3: Code-Review+2

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7236/3/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

PS3, Line 572: known
 :   as topics
remove. statestore topics are not only the catalog updates.


PS3, Line 576: topic
"catalog update topic"


Line 585:   until this issue is resolved.
You may want to mention that 'false' is the default value.


-- 
To view, visit http://gerrit.cloudera.org:8080/7236
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries

2017-06-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5483: Automatically disable codegen for small queries
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7153/3/fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
File fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java:

Line 60:   || (missingStats && !scan.hasLimit() && 
scan.getConjuncts().isEmpty())) {
> I'm thinking about the case:
Done


http://gerrit.cloudera.org:8080/#/c/7153/3/testdata/workloads/functional-query/queries/QueryTest/codegen.test
File testdata/workloads/functional-query/queries/QueryTest/codegen.test:

Line 2:  QUERY
> Don't we have other tests that confirm the codegen_hint is honored? Validat
I don't believe so. "git grep 'optimization hints'" doesn't show up any tests. 
I added a planner test to test the planner decision about whether to disable 
codegen. I think this test addresses a valid coverage gap aside from that.


-- 
To view, visit http://gerrit.cloudera.org:8080/7153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries

2017-06-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#7).

Change subject: IMPALA-5483: Automatically disable codegen for small queries
..

IMPALA-5483: Automatically disable codegen for small queries

This is similar to the single-node execution optimisation, but applies
to slightly larger queries that should run in a distributed manner but
won't benefit from codegen.

This adds a new query option disable_codegen_rows_threshold that
defaults to 50,000. If fewer than this number of rows are processed
by a plan node per impalad, the cost of codegen almost certainly
outweighs the benefit.

Using rows processed as a threshold is justified by a simple
model that assumes the cost of codegen and execution per row for
the same operation are proportional. E.g. if x is the complexity
of the operation, n is the number of rows processed, C is a
constant factor giving the cost of codegen and Ec/Ei are constant
factor giving the cost of codegen'd and interpreted execution and
d, then the cost of the codegen'd operator is C * x + Ec * x * n
and the cost of the interpreted operator is Ei * x * n. Rearranging
means that interpretation is cheaper if n < C / (Ei - Ec), i.e. that
(at least with the simplified model) it makes sense to choose
interpretation or codegen based on a constant threshold. The
model also implies that it is somewhat safer to choose codegen
because the additional cost of codegen is O(1) but the additional
cost of interpretation is O(n).

I ran some experiments with TPC-H Q1, varying the input table size, to
determine what the cut-over point where codegen was beneficial was.
The cutover was around 150k rows per node for both text and parquet.
At 50k rows per node disabling codegen was very beneficial - around
0.12s versus 0.24s.  To be somewhat conservative I set the default
threshold to 50k rows. On more complex queries, e.g. TPC-H Q10, the
cutover tends to be higher because there are plan nodes that process
many fewer than the max rows.

Fix a couple of minor issues in the frontend - the numNodes_
calculation could return 0 for Kudu, and the single node optimization
didn't handle the case where for a scan node with conjuncts, a limit
and missing stats correctly (it considered the estimate still valid.)

Testing:
Updated e2e tests that set disable_codegen to set
disable_codegen_rows_threshold to 0, so that those tests run both
with and without codegen still.

Added an e2e test to make sure that the optimisation is applied in
the backend.

Added planner tests for various cases where codegen should and shouldn't
be disabled.

Perf:
Added a targeted perf test for a join+agg over a small input, which
benefits from this change.

Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
---
M be/src/exprs/expr-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
A testdata/workloads/functional-query/queries/QueryTest/codegen.test
A testdata/workloads/targeted-perf/queries/primitive_small_join_1.test
M tests/common/impala_test_suite.py
M tests/common/test_dimensions.py
A tests/query_test/test_codegen.py
M tests/query_test/test_decimal_queries.py
M tests/query_test/test_scanners_fuzz.py
M tests/query_test/test_udfs.py
20 files changed, 455 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/7153/7
-- 
To view, visit http://gerrit.cloudera.org:8080/7153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7254
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails
..


IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails

A recent change moved the initialization of output_tuple_desc_
to the constructor of AggregationNode. This changes the order
in which tuple_pool_ and output_tuple_desc_ are initialized.
The code in AggregationNode::Close() made the assumption that
tuple_pool_ cannot be NULL (although without a DCHECK) if
output_tuple_desc_ is not NULL. This no longer holds in the
new code.

This change makes AggregationNode::Close() checks tuple_pool_
to see if it's NULL before using it.

Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0
Reviewed-on: http://gerrit.cloudera.org:8080/7254
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M be/src/exec/aggregation-node.cc
1 file changed, 3 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7254
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

2017-06-26 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode 
query option
..


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7300/1/docs/impala.ditamap
File docs/impala.ditamap:

Line 179:   
Why mention IMPALA-5583 also?


http://gerrit.cloudera.org:8080/#/c/7300/1/docs/topics/impala_default_join_distribution_mode.xml
File docs/topics/impala_default_join_distribution_mode.xml:

Line 40:   This option determines the join strategy that Impala uses when 
any of the tables
We deliberately did not use "join strategy" in the option name because strategy 
is too generic.


Line 47:   Hive ANALYZE TABLE statement.
Sure you want to keep the ANALYZE TABLE part? In most situations we cannot 
effectively use what Hive produces.


Line 48:   By default, when a table involved in the join query does not 
have statistics,
Accuracy could be improved. What if both tables do not have stats? Clarify that 
one table is going to be broadcast. Might even be worth explicitly listing what 
happens if one table has stats and the other doesn't (the one without stats 
will be broadcast)


Line 58:   might be missing statistics due to the overhead involved in 
calculating them,
I wouldn't suppose a particular reason for not having stats.


Line 61:   of a table involved in a join query and only transmits a portion 
of the table
Not very accurate, both tables are transferred across the network. Not sure if 
we need to explain the differences between broadcast+shuffle here, maybe 
provide a link to their explanation/definition?


Line 67:   recommended when setting up and deploying new clusters. This 
setting is
We should mention why we recommend this. SHUFFLE is generally a safer option 
because the join build will be less prone to spilling and/or OOM.


-- 
To view, visit http://gerrit.cloudera.org:8080/7300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5483: Automatically disable codegen for small queries

2017-06-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#6).

Change subject: IMPALA-5483: Automatically disable codegen for small queries
..

IMPALA-5483: Automatically disable codegen for small queries

This is similar to the single-node execution optimisation, but applies
to slightly larger queries that should run in a distributed manner but
won't benefit from codegen.

This adds a new query option disable_codegen_rows_threshold that
defaults to 50,000. If fewer than this number of rows are processed
by a plan node per impalad, the cost of codegen almost certainly
outweighs the benefit.

Using rows processed as a threshold is justified by a simple
model that assumes the cost of codegen and execution per row for
the same operation are proportional. E.g. if x is the complexity
of the operation, n is the number of rows processed, C is a
constant factor giving the cost of codegen and Ec/Ei are constant
factor giving the cost of codegen'd and interpreted execution and
d, then the cost of the codegen'd operator is C * x + Ec * x * n
and the cost of the interpreted operator is Ei * x * n. Rearranging
means that interpretation is cheaper if n < C / (Ei - Ec), i.e. that
(at least with the simplified model) it makes sense to choose
interpretation or codegen based on a constant threshold. The
model also implies that it is somewhat safer to choose codegen
because the additional cost of codegen is O(1) but the additional
cost of interpretation is O(n).

I ran some experiments with TPC-H Q1, varying the input table size, to
determine what the cut-over point where codegen was beneficial was.
The cutover was around 150k rows per node for both text and parquet.
At 50k rows per node disabling codegen was very beneficial - around
0.12s versus 0.24s.  To be somewhat conservative I set the default
threshold to 50k rows. On more complex queries, e.g. TPC-H Q10, the
cutover tends to be higher because there are plan nodes that process
many fewer than the max rows.

Fix a couple of minor issues in the frontend - the numNodes_
calculation could return 0 for Kudu, and the single node optimization
didn't handle the case where for a scan node with conjuncts, a limit
and missing stats correctly (it considered the estimate still valid.)

Testing:
Updated e2e tests that set disable_codegen to set
disable_codegen_rows_threshold to 0, so that those tests run both
with and without codegen still.

Added an e2e test to make sure that the optimisation is applied in
the backend.

Added planner tests for various cases where codegen should and shouldn't
be disabled.

Perf:
Added a targeted perf test for a join+agg over a small input, which
benefits from this change.

Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
---
M be/src/exprs/expr-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
A testdata/workloads/functional-query/queries/QueryTest/codegen.test
A testdata/workloads/targeted-perf/queries/primitive_small_join_1.test
M tests/common/impala_test_suite.py
M tests/common/test_dimensions.py
A tests/query_test/test_codegen.py
M tests/query_test/test_decimal_queries.py
M tests/query_test/test_scanners_fuzz.py
M tests/query_test/test_udfs.py
20 files changed, 455 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/7153/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I273bcee58641f5b97de52c0b2caab043c914b32e
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change

2017-06-26 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' 
change
..


Patch Set 2:

(1 comment)

Reworded to clarify that size increase is for the topic size, not the metadata 
itself.

http://gerrit.cloudera.org:8080/#/c/7236/2/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

PS2, Line 572: Impala catalog metadata (table files
 :   and data blocks)
> This is not correct. IMPALA-4029 causes an increase in catalog update topic
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7236
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change

2017-06-26 Thread John Russell (Code Review)
John Russell has uploaded a new patch set (#3).

Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' 
change
..

IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change

Change-Id: I65495748984268763b2690c193488a338409e13e
---
M docs/topics/impala_known_issues.xml
1 file changed, 25 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/7236/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7236
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 


[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load

2017-06-26 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5431: Remove redundant path exists checks during table 
load
..


Patch Set 6:

- Updated the commit message with benchmark numbers. 

- Also, looks like CTAS analysis is likely a non-issue (also discussed with 
Alex offline) since the "create" part of CTAS results in creation of base 
directory for table (by HMS) and we always inherit correct ACLs of parent and 
the case that the table directory doesn't exist is not possible. 

- Ran the following test to confirm that.

➜  Impala git:(i5431) ✗ hadoop fs -setfacl -m  user:bharath:rw- /tmp/foo
➜  Impala git:(i5431) ✗ hadoop fs -getfacl /tmp/foo
# file: /tmp/foo
# owner: bharath
# group: supergroup
user::rwx
user:bharath:rw-
group::r-x
mask::rwx
other::r-x

[localhost:21000] > create table bar location '/tmp/foo/bar' as select * from 
tpch_parquet.nation;
Query: create table bar location '/tmp/foo/bar' as select * from 
tpch_parquet.nation
Query submitted at: 2017-06-26 15:55:28 (Coordinator: http://optimus:25000)
Query progress can be monitored at: 
http://optimus:25000/query_plan?query_id=8d4f60928bebcc9c:f7e2c92f
++
| summary|
++
| Inserted 25 row(s) |
++

[localhost:21000] > select count(*) from bar;
Query: select count(*) from bar
Query submitted at: 2017-06-26 15:55:44 (Coordinator: http://optimus:25000)
Query progress can be monitored at: 
http://optimus:25000/query_plan?query_id=8e44ede23419d848:afac5116
+--+
| count(*) |
+--+
| 25   |
+--+

➜  Impala git:(i5431) ✗ hadoop fs -getfacl /tmp/foo/bar
# file: /tmp/foo/bar
# owner: bharath
# group: supergroup
user::rwx
user:bharath:rw-
group::rwx
mask::rwx
other::r-x

-- 
To view, visit http://gerrit.cloudera.org:8080/7095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

2017-06-26 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode 
query option
..


Patch Set 1:

This gerrit is a "stake in the ground" with basic details about the query 
option. If there are other places where the option could be mentioned (under 
joins, compute stats, etc.), I'll handle those in a separate gerrit.

-- 
To view, visit http://gerrit.cloudera.org:8080/7300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load

2017-06-26 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5431: Remove redundant path exists checks during table 
load
..


Patch Set 6: Code-Review+2

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7095/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS5, Line 302: Reference numUnknownDiskIds = new 
Reference(Long.valueOf(0));
nit: move it below L305. If we're going to return there is no need to get a new 
Reference.


PS5, Line 727: new ArrayList()
newFileDescs?


http://gerrit.cloudera.org:8080/#/c/7095/5/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

Line 498: } catch (FileNotFoundException e) {
I am thinking maybe add a warn in the log that the path doesn't exist.


-- 
To view, visit http://gerrit.cloudera.org:8080/7095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3643/IMPALA-5344: Fix FE tests on Java 8

2017-06-26 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3643/IMPALA-5344: Fix FE tests on Java 8
..


Patch Set 7:

(3 comments)

Let's move forward with this change.

http://gerrit.cloudera.org:8080/#/c/7073/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

PS7, Line 206: p
Why is this change needed?


http://gerrit.cloudera.org:8080/#/c/7073/7/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
File fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java:

Line 123: this.label_.equals(vertex.label_);
Remove this to make compareTo() consistent with equals().


Line 173: return ImmutableSortedSet.copyOf(sources_);
Why not return an ordered list? Should be smaller, faster, and generate fewer 
useless intermediate objects.


-- 
To view, visit http://gerrit.cloudera.org:8080/7073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iad8e1dccec3a51293a109c420bd2b88b9d1e0625
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

2017-06-26 Thread John Russell (Code Review)
John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7300

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode 
query option
..

IMPALA-5583: [DOCS] Document default_join_distribution_mode query option

New page for the query option.

Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
---
M docs/impala.ditamap
M docs/impala_keydefs.ditamap
A docs/topics/impala_default_join_distribution_mode.xml
3 files changed, 132 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7300/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 


[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load

2017-06-26 Thread Bharath Vissapragada (Code Review)
Hello Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7095

to look at the new patch set (#6).

Change subject: IMPALA-5431: Remove redundant path exists checks during table 
load
..

IMPALA-5431: Remove redundant path exists checks during table load

There are multiple places that do an exists() check on a path and then
perform some subsequent action on it. This pattern results in two
RPCs to the NN (one for the exists() check and one for the subsequent
action). We can avoid the exists() check in these cases since most HDFS
methods on paths throw a FileNotFoundException if the path does not
exist. This can save an RPC to NN and improve the metadata loading time.

Testing: Enough tests already cover this code path. This patch
passed core and exhaustive tests.

Metadata benchmark shows decent increase in perf numbers, for ex:

100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV -20.51%
80-PARTITIONS-250K-FILES-S3-03-RECOVER -20.58%
80-PARTITIONS-250K-FILES-11-DROP-PARTITION -22.13%
80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION -22.38%
80-PARTITIONS-250K-FILES-S3-12-DROP -23.69%
100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION -23.91%
100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTITION -26.04%
100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH -26.38%
80-PARTITIONS-250K-FILES-S3-02-CREATE -36.47%
100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS -58.72%
80-PARTITIONS-250K-FILES-S3-01-DROP -95.33%
80-PARTITIONS-250K-FILES-01-DROP -95.93%

Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
2 files changed, 51 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/7095/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5579: Fix IndexOutOfBoundsException in GetTables 
metadata request
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/790/

-- 
To view, visit http://gerrit.cloudera.org:8080/7296
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic933b9950d58e6f880d4078faf1a69ee7c6adbf1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request

2017-06-26 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5579: Fix IndexOutOfBoundsException in GetTables 
metadata request
..


Patch Set 2: Code-Review+2

Rebase and keep Alex's +2

-- 
To view, visit http://gerrit.cloudera.org:8080/7296
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic933b9950d58e6f880d4078faf1a69ee7c6adbf1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5571: Fix authorization tests

2017-06-26 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7299

Change subject: IMPALA-5571: Fix authorization tests
..

IMPALA-5571: Fix authorization tests

Issue:
With the fix for IMPALA-5549 the catalog cache of an impalad may end up
with stale authorization metadata in the presence of GRANT/REVOKE
statements that modify multiple privileges.

Fix:
Revert the behavior of GRANT/REVOKE privilege DDL statements.

Change-Id: Icad893a10fdeb7a721264e69413014603dec6339
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
1 file changed, 6 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/7299/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Icad893a10fdeb7a721264e69413014603dec6339
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

2017-06-26 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS1, Line 294:   /// Returns a non-OK status if query execution should stop 
(e.g., the query was
 :   /// cancelled or a mem limit was exceeded). Exec nodes should 
check this periodically so
 :   /// execution doesn't continue if the query terminates 
abnormally. This should not be
 :   /// called after ReleaseResources().
 :   Status CheckQueryState();
> Yes, I wasn't suggesting that we should convert those Allocate() to TryAllo
Completely agree


-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 12: Code-Review+2

carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 12:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/789/

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

2017-06-26 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7275/2/be/src/exec/plan-root-sink.cc
File be/src/exec/plan-root-sink.cc:

Line 101: results_ = nullptr;
> I don't feel strongly. I think the idea was to move these status checks mor
It doesn't seem strictly necessary for this change. Feel free to do it later.


http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS1, Line 294:   /// Returns a non-OK status if query execution should stop 
(e.g., the query was
 :   /// cancelled or a mem limit was exceeded). Exec nodes should 
check this periodically so
 :   /// execution doesn't continue if the query terminates 
abnormally. This should not be
 :   /// called after ReleaseResources().
 :   Status CheckQueryState();
> I think we should do the work of removing Allocate() calls at some point bu
Yes, I wasn't suggesting that we should convert those Allocate() to 
TryAllocate() in this change. That's just something we can do to get rid of 
CheckQueryState() altogether.


-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 12:

Removed the preconditions check - it was too strict for the case when the hash 
join was in a subplan.

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-26 Thread Tim Armstrong (Code Review)
Hello Impala Public Jenkins, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6963

to look at the new patch set (#12).

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..

IMPALA-5160: adjust spill buffer size based on planner estimates

Scale down the buffer size in hash joins and hash aggregations if
estimates indicate that the build side of the join is small.
This greatly reduces minimum memory requirements for joins in some
common cases, e.g. small dimension tables.

Currently this is not plumbed through to the backend and only takes
effect in planner tests.

Testing:
Added targeted planner tests for small/mid/large/unknown memory
requirements for aggregations and joins.

Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
---
M fe/src/main/java/org/apache/impala/common/RuntimeEnv.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
A fe/src/main/java/org/apache/impala/util/BitUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
10 files changed, 984 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/12
-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 11: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/786/

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

2017-06-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7275/2/be/src/exec/plan-root-sink.cc
File be/src/exec/plan-root-sink.cc:

Line 101: results_ = nullptr;
> At first glance this looks reasonable to put here, but is this necessary fo
I don't feel strongly. I think the idea was to move these status checks more 
closely to the expr evaluations that could cause errors to be hit.


http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS1, Line 294:   /// Returns a non-OK status if query execution should stop 
(e.g., the query was
 :   /// cancelled or a mem limit was exceeded). Exec nodes should 
check this periodically so
 :   /// execution doesn't continue if the query terminates 
abnormally. This should not be
 :   /// called after ReleaseResources().
 :   Status CheckQueryState();
> I like CheckAsyncError() for GetQueryStatus(). If we can convert all of the
I think we should do the work of removing Allocate() calls at some point but I 
think we should keep fix as targeted as possible. That might be an argument to 
defer doing the rename as well.


-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5554: sorter DCHECK on null column

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5554: sorter DCHECK on null column
..


Patch Set 4:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/788/

-- 
To view, visit http://gerrit.cloudera.org:8080/7295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5554: sorter DCHECK on null column

2017-06-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5554: sorter DCHECK on null column
..


Patch Set 4: Code-Review+2

Carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/7295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5554: sorter DCHECK on null column

2017-06-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5554: sorter DCHECK on null column
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7295/2/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

Line 1031:   // This must be the first column with var len data for the 
tuple. Var len data
> first column -> first slot
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5554: sorter DCHECK on null column

2017-06-26 Thread Tim Armstrong (Code Review)
Hello Thomas Tauber-Marshall, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7295

to look at the new patch set (#3).

Change subject: IMPALA-5554: sorter DCHECK on null column
..

IMPALA-5554: sorter DCHECK on null column

The bug was in the DCHECK. The DCHECK is intended to make sure that a
tuple's string data didn't get split across blocks. The logic assumed
that if the second-or-later string column was in the next-block, that
the strings were split between blocks. However, that assumption is
invalid if there are NULL strings, which do not belong in any block.

The fix for the DCHECK (which is still useful) is to count the number
of non-NULL strings and make sure that no non-NULL strings were split
between blocks.

Testing:
Added a test that reproduces the crash.

Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90
---
M be/src/runtime/sorter.cc
M 
testdata/workloads/functional-query/queries/QueryTest/single-node-large-sorts.test
2 files changed, 39 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/7295/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4687: Get Impala working against HBase 2.0

2017-06-26 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4687: Get Impala working against HBase 2.0
..


Patch Set 2: Code-Review+1

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7277/1/be/src/exec/hbase-table-scanner.cc
File be/src/exec/hbase-table-scanner.cc:

PS1, Line 197: setStartRow
> It turns out that setStartRow/setStopRow are newly deprecated. The new meth
Ok, thanks. Mind adding a TODO with jira reference?


PS1, Line 403: 
 : Status HBaseTableScanner::HandleResultS
> We need the actual jthrowable so that we can check whether it is an instanc
Yea, I missed that part, thanks


http://gerrit.cloudera.org:8080/#/c/7277/2/be/src/exec/hbase-table-scanner.cc
File be/src/exec/hbase-table-scanner.cc:

PS2, Line 411: error
nit: Don't think error is the right term here. GetJniExceptionMsg() can just 
return Status:OK() as well.


http://gerrit.cloudera.org:8080/#/c/7277/1/be/src/exec/hbase-table-scanner.h
File be/src/exec/hbase-table-scanner.h:

PS1, Line 65: HBASE-17809.
: ///If ScannerTimeout
nit: Same comment as in the .cc file. We don't always throw an error, we can 
just pass if there is no other exception. May be good to clear it up


-- 
To view, visit http://gerrit.cloudera.org:8080/7277
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I87610e25c01b3547ec332c6975b61284b6837d27
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

2017-06-26 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7275/2/be/src/exec/plan-root-sink.cc
File be/src/exec/plan-root-sink.cc:

Line 101: RETURN_IF_ERROR(state->GetQueryStatus());
At first glance this looks reasonable to put here, but is this necessary for 
this change? Looks like the caller is FIS::ExecInternal and the error status 
gets propagated. I'm just asking because we'll start seeing errors sooner, 
which seems like a good thing but may result in some different runtime behavior.


http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS1, Line 294:   /// Returns a non-OK status if query execution should stop 
(e.g., the query was
 :   /// cancelled or a mem limit was exceeded). Exec nodes should 
check this periodically so
 :   /// execution doesn't continue if the query terminates 
abnormally. This should not be
 :   /// called after ReleaseResources().
 :   Status CheckQueryState();
> I wonder if most of the remaining callers can be converted to CheckQuerySta
I like CheckAsyncError() for GetQueryStatus(). If we can convert all of the 
other callsites to CheckQueryStatus(), that'd clearly be ideal, but that seems 
like a bigger change to move Allocate -> TryAllocate. However, maybe we can 
split CheckQueryState() into a fn that just updates the status_ on mem limit 
exceeded, then the places where we know the memory may have been over consumed 
we call that before calling CheckAsyncError()? I guess 
CheckMemLimitOrAsyncError() is at least more clear for now. I'm OK with fixing 
this more localized issue now.

It's funny, CheckQueryStatus() clearly suffers from the QueryMaintenance() 
problem.


-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load

2017-06-26 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5431: Remove redundant path exists checks during table 
load
..


Patch Set 5: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load

2017-06-26 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5431: Remove redundant path exists checks during table 
load
..


Patch Set 4:

(7 comments)

As discussed, I'm checking the following,

- Effect on CTAS when the path doesn't exist.
- Running metadata benchmarks to see if there is some perf gain.

Meanwhile, posting the rebased patch that addresses the comments.

http://gerrit.cloudera.org:8080/#/c/7095/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 267:   // No need to load blocks for empty partitions list.
> Right, but saving an RPC seems worth it.
Done (in PS4).


Line 725: }
> remove while you are here
Done


http://gerrit.cloudera.org:8080/#/c/7095/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 266:   FileSystem fs = dirPath.getFileSystem(CONF);
> Move this closer to where it is used. getFileSystem() tends to be expensive
Done


Line 305:   if (fileStatusIter == null) {
> I think you can make this a single line and remove the comment.
Done


Line 371: if (fileStatusIter == null) {
> I think you can make this a single line and remove the comment.
Done


Line 1709: if (statuses == null) {
> I think you can make this a single line and remove the comment.
Done


http://gerrit.cloudera.org:8080/#/c/7095/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

Line 491:* the path does not exist. This helps simplifies the caller code 
in cases where
> This helps simplify the caller code...
Ouch, done.


-- 
To view, visit http://gerrit.cloudera.org:8080/7095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load

2017-06-26 Thread Bharath Vissapragada (Code Review)
Hello Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7095

to look at the new patch set (#5).

Change subject: IMPALA-5431: Remove redundant path exists checks during table 
load
..

IMPALA-5431: Remove redundant path exists checks during table load

There are multiple places that do an exists() check on a path and then
perform some subsequent action on it. This pattern results in two
RPCs to the NN (one for the exists() check and one for the subsequent
action). We can avoid the exists() check in these cases since most HDFS
methods on paths throw a FileNotFoundException if the path does not
exist. This can save an RPC to NN and improve the metadata loading time.

Testing: Enough tests already cover this code path. This patch
passed core and exhaustive tests.

Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
2 files changed, 51 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/7095/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-5554: sorter DCHECK on null column

2017-06-26 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5554: sorter DCHECK on null column
..


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7295/2/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

Line 1031:   // This must be the first column with var len data for the 
tuple. Var len data
first column -> first slot


-- 
To view, visit http://gerrit.cloudera.org:8080/7295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7a8dee982501008efff5b5abc192cfb5e6544a90
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4687: Get Impala working against HBase 2.0

2017-06-26 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded a new patch set (#2).

Change subject: IMPALA-4687: Get Impala working against HBase 2.0
..

IMPALA-4687: Get Impala working against HBase 2.0

This changes Impala code to tolerate the API differences
between HBase 1.0 and HBase 2.0. It also drops
compatibility code for older HBase versions.

Specific changes:
1. Tolerate return value of Scan for Scan.setCaching()
and Scan.setCacheBlocks(). This has no impact on our code.
2. HBase 2.0 eliminates the ScannerTimeoutException. The
case that previously generated the exception will now
recreate the scanner, so it is not necessary for our code
to recreate the scanner. Short-circuit
HandleResultScannerTimeout on HBase 2.0.
3. HBase 2.0 eliminates the Put.add(), which has been
replaced with Put.addColumn(). This API exists in
HBase 1.0, so it is safe to switch this completely.

This was tested by verifying that an HBase 2.0 cluster
starts up.

Change-Id: I87610e25c01b3547ec332c6975b61284b6837d27
---
M be/src/exec/hbase-table-scanner.cc
M be/src/exec/hbase-table-scanner.h
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
6 files changed, 89 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/7277/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7277
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I87610e25c01b3547ec332c6975b61284b6837d27
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] IMPALA-4687: Get Impala working against HBase 2.0

2017-06-26 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-4687: Get Impala working against HBase 2.0
..


Patch Set 1:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/7277/1//COMMIT_MSG
Commit Message:

PS1, Line 25: This was tested by verifying that an HBase 2.0 cluster
: starts up.
> I think we should run hbase stress test eventually to test the new ScannerT
I think we can run stress tests with existing code. Since HBase stopped 
throwing this exception with HBASE-16266, I think this is already available in 
versions of HBase 1.2, which we already support.


http://gerrit.cloudera.org:8080/#/c/7277/1/be/src/exec/hbase-table-scanner.cc
File be/src/exec/hbase-table-scanner.cc:

PS1, Line 238: 
 : 
> May be retain this part.
Done


PS1, Line 197: setStartRow
> You mentioned on the jira that these are deprecated too (setStartRow/setSto
It turns out that setStartRow/setStopRow are newly deprecated. The new methods 
(with*Row) do not exist in HBase 1.2, so we can't replace it without having 
conditional code to detect the different APIs. I don't think it is worth 
switching yet. I will file a separate JIRA.


PS1, Line 403: jthrowable exc = env->ExceptionOccurred();
 :   if (exc == NULL) return Status::OK();
> Unrelated to your change, but why are we calling this here? Shouldn't GetJn
We need the actual jthrowable so that we can check whether it is an instance of 
ScannerTimeoutException. GetJniExceptionMsg only returns the message.


Line 407:   // necessary). For HBase 2.0, ScannerTimeoutException does not 
exist, so simple
> typo: simply
Done


PS1, Line 407: simple
> simply?
Done


Line 411:   if (scanner_timeout_ex_cl_ == nullptr) return status;
> Is there new exception that signals we need to retry? Is the retry not nece
>From what I understand, HBase does a heartbeat now, and where it previously 
>would throw this exception, it now resets the scanner and retries. HBASE-16266 
>is when this changed.


Line 512:   // Newer versions of HBase re-create the ResultScanner without 
throwing this
> Is "re-create" the right term? I'm not sure how that would work. Do you mea
Reworked this comment.


Line 514:   // code will simply return the error.
> what is "the error"?
Fixed this language.


http://gerrit.cloudera.org:8080/#/c/7277/1/be/src/exec/hbase-table-scanner.h
File be/src/exec/hbase-table-scanner.h:

Line 61: /// 1. Scan.setCaching and Scan.setCacheBlocks tolerate the older void 
return value
> use () for function/methods, e.g. Scan.setCaching()
Done


http://gerrit.cloudera.org:8080/#/c/7277/1/be/src/exec/hbase-table-writer.h
File be/src/exec/hbase-table-writer.h:

PS1, Line 108: add
> addColumn
Done


http://gerrit.cloudera.org:8080/#/c/7277/1/be/src/util/jni-util.cc
File be/src/util/jni-util.cc:

Line 61:   env->GetMethodID(class_ref, method_str, method_signature);
> Reading the documentation for "GetMethodID"
Good point. I haven't looked at the return value, but it does throw 
NoSuchMethodException. Maybe it also returns null. I think either would work, 
but in either case, we would need to clear the exception.


http://gerrit.cloudera.org:8080/#/c/7277/1/be/src/util/jni-util.h
File be/src/util/jni-util.h:

PS1, Line 171: a method
> ..non static method..
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7277
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I87610e25c01b3547ec332c6975b61284b6837d27
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5431: Remove redundant path exists checks during table load

2017-06-26 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5431: Remove redundant path exists checks during table 
load
..


Patch Set 4: Code-Review+1

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7095/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 267:   // No need to load blocks for empty partitions list.
> Okay. I felt this adds a lot of try{} catch{} to the code and hence didn't 
Right, but saving an RPC seems worth it.


Line 725: }
remove while you are here


http://gerrit.cloudera.org:8080/#/c/7095/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 266:   FileSystem fs = dirPath.getFileSystem(CONF);
Move this closer to where it is used. getFileSystem() tends to be expensive, so 
great if we can avoid it!


Line 305:   if (fileStatusIter == null) {
I think you can make this a single line and remove the comment.


Line 371: if (fileStatusIter == null) {
I think you can make this a single line and remove the comment.


Line 1709: if (statuses == null) {
I think you can make this a single line and remove the comment.


http://gerrit.cloudera.org:8080/#/c/7095/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

Line 491:* the path does not exist. This helps simplifies the caller code 
in cases where
This helps simplify the caller code...


-- 
To view, visit http://gerrit.cloudera.org:8080/7095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10ecf64ea2eda2d0f9299c0aa371933eca22281
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4848: Add WIDTH BUCKET() function

2017-06-26 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4848: Add WIDTH_BUCKET() function
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6023/5/be/src/exprs/math-functions-ir.cc
File be/src/exprs/math-functions-ir.cc:

Line 452:   Decimal16Value buckets = Decimal16Value::FromInt(input_precision, 
input_scale,
It would be good to summarize the calculations and comment on the chosen types, 
e.g. why are we using Dec16 here?

In general, please move definitions closer to where they are used. The buckets 
are only used a lot further down where it might become clearer why it should be 
Dec16.

Adding those comments and doing some cleanup will make reviewing much easier. 
I'll still need to think more carefully about the logic.


-- 
To view, visit http://gerrit.cloudera.org:8080/6023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request

2017-06-26 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5579: Fix IndexOutOfBoundsException in GetTables 
metadata request
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7296
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic933b9950d58e6f880d4078faf1a69ee7c6adbf1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size

2017-06-26 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5500: Reduce catalog update topic size
..


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

Line 330:   if (!status.ok()) {
take out of if/else since it's common to both cases


Line 367:   uint8_t* compressed_buffer_ptr = &compressed_buffer[0];
comptessed_buffer.data()


Line 370:   // Base64 encode the compressed catalog object and store it in the 
topic item.
Is base64-encoding really necessary? Why don't the raw bytes work?


http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/catalog/catalog-server.h
File be/src/catalog/catalog-server.h:

Line 200:   /// Serializes 'catalog_object' and compresses it using Snappy 
compression.
remove second "compression"


http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 1417: Status ImpalaServer::DecompressAndDeserializeCatalogObject(const 
TTopicItem& item,
Why not separate decompression from deserialization? Seem like separate 
concerns.
Then the code would be something like:

buffer = raw buffer
if (compaction is on) buffer = decompress();
DeserializeThriftMsg()
...


http://gerrit.cloudera.org:8080/#/c/7268/2/be/src/util/compress.h
File be/src/util/compress.h:

Line 103:   SnappyCompressor(MemPool* mem_pool = NULL, bool reuse_buffer = 
false);
Why these changes? Shouldn't these always be created using CreateCompressor()?


http://gerrit.cloudera.org:8080/#/c/7268/2/tests/custom_cluster/test_compact_catalog_updates.py
File tests/custom_cluster/test_compact_catalog_updates.py:

Line 45:   "select count(*) from functional.alltypes")
Check result? Maybe also use another query on a parquet table.


-- 
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

2017-06-26 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
..


Patch Set 2: Code-Review+1

(1 comment)

MJ may wanna do another pass.

http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS1, Line 294:   /// Returns a non-OK status if query execution should stop 
(e.g., the query was
 :   /// cancelled or a mem limit was exceeded). Exec nodes should 
check this periodically so
 :   /// execution doesn't continue if the query terminates 
abnormally. This should not be
 :   /// called after ReleaseResources().
 :   Status CheckQueryState();
> How about we rename them to be less vague. Maybe something like:
I wonder if most of the remaining callers can be converted to 
CheckQueryStatus() given that we do mostly use TryAllocate() instead of 
Allocate(). Most of the remaining callers are DataSinks. May be if we fix 
hdfs-table-writer and friends to use TryAllocate(), things will be easier ?


-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/787/

-- 
To view, visit http://gerrit.cloudera.org:8080/7254
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

2017-06-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/exec/plan-root-sink.cc
File be/src/exec/plan-root-sink.cc:

Line 102: ScalarExprEvaluator::FreeLocalAllocations(output_expr_evals_);
> Not your change but should we check for query status here too ? I can imagi
Done


PS1, Line 150:  DCHECK(
> DCHECK_GE()
Done


http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS1, Line 294:   /// Returns a non-OK status if query execution should stop 
(e.g., the query was
 :   /// cancelled or a mem limit was exceeded). Exec nodes should 
check this periodically so
 :   /// execution doesn't continue if the query terminates 
abnormally. This should not be
 :   /// called after ReleaseResources().
 :   Status CheckQueryState();
> I think this method is confusing to begin with, which probably led us to mi
How about we rename them to be less vague. Maybe something like:

CheckQueryState() -> CheckMemLimitOrAsyncError()
GetQueryStatus() -> CheckAsyncError()
query_status_ -> async_error_status_


-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

2017-06-26 Thread Tim Armstrong (Code Review)
Hello Impala Public Jenkins, Lars Volker,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7275

to look at the new patch set (#2).

Change subject: IMPALA-5567: race in fragment instance teardown
..

IMPALA-5567: race in fragment instance teardown

The bug is that PlanRootSink::GetNext() calls
RuntimeState::CheckQueryState() was called concurrently with
RuntimeState::ReleaseResources() and got a reference

The other callsites of CheckQueryState() are safe because they are in two
categories:
* ExecNodes or DataSink methods executed by fragment instance execution
  threads, which must terminate before the runtime state resources are
  released.
* In FeSupport where ReleaseResources() called after CheckQueryState()

There is no need to asynchronously check for memory limit
exceeded in PlanRootSink::GetNext() since that method does
not allocate tracked memory that could push the query over
the memory limit.

Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
---
M be/src/exec/plan-root-sink.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
3 files changed, 9 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/7275/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request

2017-06-26 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5579: Fix IndexOutOfBoundsException in GetTables 
metadata request
..


Patch Set 1: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7296
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic933b9950d58e6f880d4078faf1a69ee7c6adbf1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails

2017-06-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7254
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails

2017-06-26 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7254

to look at the new patch set (#2).

Change subject: IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails
..

IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails

A recent change moved the initialization of output_tuple_desc_
to the constructor of AggregationNode. This changes the order
in which tuple_pool_ and output_tuple_desc_ are initialized.
The code in AggregationNode::Close() made the assumption that
tuple_pool_ cannot be NULL (although without a DCHECK) if
output_tuple_desc_ is not NULL. This no longer holds in the
new code.

This change makes AggregationNode::Close() checks tuple_pool_
to see if it's NULL before using it.

Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0
---
M be/src/exec/aggregation-node.cc
1 file changed, 3 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/7254/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7254
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request

2017-06-26 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7296

Change subject: IMPALA-5579: Fix IndexOutOfBoundsException in GetTables 
metadata request
..

IMPALA-5579: Fix IndexOutOfBoundsException in GetTables metadata request

Issue:
GetTables metadata request may result in an IndexOutOfBoundsException
while accessing table comments fields if one or more tables can't be
retrieved from the catalog cache.

Fix:
Only add tables that can be retrieved from the catalog cache (i.e. no
TableLoadingException is thrown) in the result of a GetTables request.

Change-Id: Ic933b9950d58e6f880d4078faf1a69ee7c6adbf1
---
M fe/src/main/java/org/apache/impala/service/MetadataOp.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
2 files changed, 11 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/7296/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7296
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic933b9950d58e6f880d4078faf1a69ee7c6adbf1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-5031: Set DiskIoMgr::ScanRange::is cancelled before read

2017-06-26 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5031: Set DiskIoMgr::ScanRange::is_cancelled_ before read
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7294/2/be/src/runtime/disk-io-mgr.h
File be/src/runtime/disk-io-mgr.h:

PS2, Line 584:  is_cancelled_ = false;
Actually, shouldn't this be done in Reset() instead ?

May also want to fix the constructor to initialize this flag or move the 
initialization of other bool flags here ?


-- 
To view, visit http://gerrit.cloudera.org:8080/7294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I22898ec96ac44c4902f8072f27453cfc58358cae
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change

2017-06-26 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' 
change
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7236/2/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

PS2, Line 572: Impala catalog metadata (table files
 :   and data blocks)
This is not correct. IMPALA-4029 causes an increase in catalog update topic 
size when compact-catalog-topic is used. It doesn't increase catalog metadata 
in general, it actually has the opposite effect.


-- 
To view, visit http://gerrit.cloudera.org:8080/7236
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails

2017-06-26 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5551: Fix AggregationNode::Close() when Prepare() fails
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7254
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0f8cce7cf6e3183d3a5e145c2fcfa50f36c77e0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3909: [DOCS] Add general info about Parquet min/max 
optimization
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7068
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3909: [DOCS] Add general info about Parquet min/max 
optimization
..


IMPALA-3909: [DOCS] Add general info about Parquet min/max optimization

Just putting an initial stake in the ground. If examples,
details of Hive interoperability, or type-by-type details
are needed, I prefer to handle those in followup gerrits.

Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d
Reviewed-on: http://gerrit.cloudera.org:8080/7068
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M docs/topics/impala_parquet.xml
1 file changed, 19 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7068
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2546: Fix handling of invalid timezones

2017-06-26 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has abandoned this change.

Change subject: IMPALA-2546: Fix handling of invalid timezones
..


Abandoned

Abandoning for now - Greg pointed out in the JIRA that the current behavior of 
these functions make sense.

-- 
To view, visit http://gerrit.cloudera.org:8080/7216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I587a16bb8e835a6eea02e1f87fd4033686e84d0f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5240: Allow config of number of disk I/O threads per disk type

2017-06-26 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has uploaded a new patch set (#5).

Change subject: IMPALA-5240: Allow config of number of disk I/O threads per 
disk type
..

IMPALA-5240: Allow config of number of disk I/O threads per disk type

Currently Impala defaults to 8 threads per flash disk and 1 thread per
rotational disk. This can be overridden with --num_threads_per_disk,
but that sets the thread count independent of disk type.

This would allow control of the number of disk I/O threads spawned
independently for solid state disks using
"--num_io_threads_per_solid_state_disk" and for rotational disks using
"--num_io_threads_per_rotational_disk" as startup parameters.

Testing:
Added backend tests that verify if "num_threads_per_disk",
"num_io_threads_per_solid_state_disk" and
"num_io_threads_per_rotational_disk" are used to spawn threads
appropriately

TODO (Request comments on this additional change):
Additionally made some changes to fix a bug, where impala would crash
if num_disks are set more than the number of logical disks on system and
num_threads_per_disk is not set. This bug also lets the user create more
disk queues than the num of logical disks which would eventually never
be used. As a part of this effort, existing test cases in
disk-io-mgr-test were identified that exploited this bug and hence have
been fixed by explicitly setting num_disks=1 for now because the test
code isn't yet aware of how many disks are actually on the test system.
Test code will be updated to use multiple disks as a part of
IMPALA-5574. Moreover, after this fix, if num_disks is set to more than
the number of logical disks then it will default to max available disks
and log a warning.

Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c
---
M be/src/runtime/disk-io-mgr-stress-test.cc
M be/src/runtime/disk-io-mgr-stress.cc
M be/src/runtime/disk-io-mgr-test.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
M be/src/util/thread.cc
M be/src/util/thread.h
M fe/src/test/java/org/apache/impala/testutil/SentryServicePinger.java
8 files changed, 410 insertions(+), 345 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/7232/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7232
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5240: Allow config of number of disk I/O threads per disk type

2017-06-26 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-5240: Allow config of number of disk I/O threads per 
disk type
..


Patch Set 4:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/7232/4//COMMIT_MSG
Commit Message:

Line 34: 
> Comment that we are changing the tests to explicitly set num_disks=1 for no
Done


http://gerrit.cloudera.org:8080/#/c/7232/4/be/src/runtime/disk-io-mgr-stress-test.cc
File be/src/runtime/disk-io-mgr-stress-test.cc:

Line 31: const int NUM_DISKS = 1;
> comment why we are setting this to 1 for now, i.e. that the test code isn't
I have added a TODO here as well as per your other comment in disk-io-mgr-test


http://gerrit.cloudera.org:8080/#/c/7232/3/be/src/runtime/disk-io-mgr-test.cc
File be/src/runtime/disk-io-mgr-test.cc:

Line 1142:   const int num_io_threads_per_rotational_or_ssd = 2;
> const
Done


PS3, Line 1165: 
> weird indentation, just put the operator on the previous line and use 4 spa
Done


http://gerrit.cloudera.org:8080/#/c/7232/4/be/src/runtime/disk-io-mgr-test.cc
File be/src/runtime/disk-io-mgr-test.cc:

Line 212:   const int num_disks = 1;
> leave a TODO with the JIRA where we'll change this, same for the other case
Done


http://gerrit.cloudera.org:8080/#/c/7232/4/be/src/runtime/disk-io-mgr.cc
File be/src/runtime/disk-io-mgr.cc:

PS4, Line 87: // Rotational disks should have 1 thread per disk to minimize 
seeks.  Non-rotational
: // don't have this penalty and benefit from multiple concurrent 
IO requests.
: static const int THREADS_PER_ROTATIONAL_DISK = 1;
: static const int THREADS_PER_FLASH_DISK = 8;
> I don't think there's any reason to keep these separately and then check if
We need the default value of the flags to be zero in order to figure out if 
they were set or not. That is needed to follow the priority order of :
1. FLAG_num_io_threads_per_rotational_disk
2. num_threads_per_disk
3. THREADS_PER_ROTATIONAL_DISK


PS4, Line 263: 
num_io_threads_per_rotational_disk_(FLAGS_num_io_threads_per_rotational_disk),
 : 
num_io_threads_per_solid_state_disk_(FLAGS_num_io_threads_per_solid_state_disk),
> I think we can configure num_io_threads_per_rotational_disk_ and num_io_thr
Done


Line 278: " disks";
> To be consistent we should also warn if the value is negative
Done


PS4, Line 367: } else if (num_io_threads_per_rotational_disk_ != 0 && 
DiskInfo::is_rotational(i)) {
 :   num_threads_per_disk = num_io_threads_per_rotational_disk_;
 : } else if (num_io_threads_per_solid_state_disk_ != 0 && 
!DiskInfo::is_rotational(i)) {
 :   num_threads_per_disk = 
num_io_threads_per_solid_state_disk_;
 : } else if (FLAGS_num_threads_per_disk != 0) {
 :   num_threads_per_disk = FLAGS_num_threads_per_disk;
 : } else if (DiskInfo::is_rotational(i)) {
 :   num_threads_per_disk = THREADS_PER_ROTATIONAL_DISK;
 : } else {
 :   num_threads_per_disk = THREADS_PER_FLASH_DISK;
 : }
> After my suggestion in the constructor this becomes greatly simplified.
Done


http://gerrit.cloudera.org:8080/#/c/7232/3/be/src/runtime/disk-io-mgr.h
File be/src/runtime/disk-io-mgr.h:

PS3, Line 631: ///
> comment this is for testing only
Done


PS3, Line 837: o the disk.
> the naming of this should probably be num_io_threads_per_disk_ to be consis
Went ahead with your latter suggestion and removed it altogether


http://gerrit.cloudera.org:8080/#/c/7232/3/be/src/util/thread.cc
File be/src/util/thread.cc:

PS3, Line 335: const
> nit: space after const
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7232
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I094aff3747104104fe0465d73dcdbef5d9892f7c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5503: [DOCS] Document how to specify coordinator/executor nodes

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5503: [DOCS] Document how to specify 
coordinator/executor nodes
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia20db6af212122b1f87fc6999f8683860beb2bad
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

2017-06-26 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/exec/plan-root-sink.cc
File be/src/exec/plan-root-sink.cc:

Line 102: ScalarExprEvaluator::FreeLocalAllocations(output_expr_evals_);
Not your change but should we check for query status here too ? I can imagine 
expression evaluations above can lead to MemLimitExceeded. Feel free to do it 
later though.


PS1, Line 150:  DCHECK(
DCHECK_GE()


-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5503: [DOCS] Document how to specify coordinator/executor nodes

2017-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5503: [DOCS] Document how to specify 
coordinator/executor nodes
..


IMPALA-5503: [DOCS] Document how to specify coordinator/executor nodes

Cf. IMPALA-3807 and IMPALA-5147.

Change-Id: Ia20db6af212122b1f87fc6999f8683860beb2bad
Reviewed-on: http://gerrit.cloudera.org:8080/7237
Reviewed-by: John Russell 
Tested-by: Impala Public Jenkins
---
M docs/impala_keydefs.ditamap
M docs/topics/impala_components.xml
M docs/topics/impala_new_features.xml
M docs/topics/impala_scalability.xml
4 files changed, 129 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  John Russell: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ia20db6af212122b1f87fc6999f8683860beb2bad
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Mostafa Mokhtar 


[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change

2017-06-26 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' 
change
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7236/1/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

PS1, Line 572: in memory usage for Impala catalog metadata (table files
 :   and data blocks). By default, the memory overhead is 
about 5-7%. If the
 :   compact_catalog_topic flag is used, 
the memory usage is
 :   more substantial, approximately twice as much as in 
previous versions.
> Increase in memory usage is not the correct description. The effect is incr
Done


PS1, Line 578: High
> I'd say Medium.
Done


PS1, Line 583: A fix is in the pipeline
> IMPALA-5500 is the jira for this fix.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7236
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change

2017-06-26 Thread John Russell (Code Review)
John Russell has uploaded a new patch set (#2).

Change subject: IMPALA-4029: [DOCS] Include known issue for 'flat buffers' 
change
..

IMPALA-4029: [DOCS] Include known issue for 'flat buffers' change

Change-Id: I65495748984268763b2690c193488a338409e13e
---
M docs/topics/impala_known_issues.xml
1 file changed, 24 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/7236/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7236
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65495748984268763b2690c193488a338409e13e
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 


[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

2017-06-26 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS1, Line 294:   /// Returns a non-OK status if query execution should stop 
(e.g., the query was
 :   /// cancelled or a mem limit was exceeded). Exec nodes should 
check this periodically so
 :   /// execution doesn't continue if the query terminates 
abnormally. This should not be
 :   /// called after ReleaseResources().
 :   Status CheckQueryState();
I think this method is confusing to begin with, which probably led us to misuse 
it. We have a very overloaded concept of 'query state'.

We don't necessarily have to address this now (e.g. the change in 
plan-fragment-root.cc seems good regardless), but we should probably rethink it.


-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


  1   2   >