[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer

2017-02-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
..


Patch Set 12: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer

2017-02-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has submitted this change and it was merged.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
..


IMPALA-3909: Populate min/max statistics in Parquet writer

Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Reviewed-on: http://gerrit.cloudera.org:8080/5611
Reviewed-by: Lars Volker 
Tested-by: Impala Public Jenkins
Reviewed-by: Tim Armstrong 
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
A be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-common.h
M be/src/runtime/string-value.h
M be/src/util/dict-test.cc
M tests/query_test/test_insert_parquet.py
M tests/util/get_parquet_metadata.py
8 files changed, 792 insertions(+), 119 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Lars Volker: Looks good to me, but someone else must approve
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-4822: Implement dynamic log level changes

2017-02-01 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4822: Implement dynamic log level changes
..


Patch Set 5:

(9 comments)

Since we're accepting user input it would be great to have unit tests. Checking 
that the correct logging level is actually in effect might be a little tricky. 
I'm more thinking along the lines of checking garbage user input, checking that 
settings are reflected and can be reset, etc. Basic validation. I think you 
should be able to do something similar to webserver-test.cc.

http://gerrit.cloudera.org:8080/#/c/5792/5/be/src/util/backend-gflag-util.h
File be/src/util/backend-gflag-util.h:

Line 27: /// debugging, so we save the original value here incase we need to 
restore
in case (two words)


http://gerrit.cloudera.org:8080/#/c/5792/5/be/src/util/logging-support.cc
File be/src/util/logging-support.cc:

Line 77: jclass log4j_logger_class_;
no "_" suffix since not private class members; make these static


Line 196:   if (args.find("get_java_log") != args.end()) {
make these strings constants like:

static const char* GET_JAVA_LOG = "get_java_log";
etc.


Line 212: logging_level == args.end() || logging_level->second == NULL) 
return;
use {} for multi-line ifs


http://gerrit.cloudera.org:8080/#/c/5792/5/be/src/util/logging-support.h
File be/src/util/logging-support.h:

Line 25: namespace rapidjson {
Ok to use the include if you prefer that solution. Either wfm.


Line 43: /// the Java log4j log messages to be forwarded to Glog.
update comment to reflect that it loads JNI classes/methods to support dynamic 
log level changes


http://gerrit.cloudera.org:8080/#/c/5792/5/common/thrift/Logging.thrift
File common/thrift/Logging.thrift:

Line 34: // Helper structs for GetLogLevel(), SetLogLevel() and ResetLogLevel()
update


http://gerrit.cloudera.org:8080/#/c/5792/5/www/log_level.tmpl
File www/log_level.tmpl:

Line 27:   Log: 
Pretty sure org.apache.foo won't be clear to users. I suggest using a full, 
valid class name like org.apache.impala.analysis.Analyzer


Line 41:   Log: 
use a real class name


-- 
To view, visit http://gerrit.cloudera.org:8080/5792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs

2017-02-01 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4792: Fix number of distinct values for a CASE with 
constant outputs
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5768/6/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java:

Line 31:   public void VerifyNdvBasic(String expr, long expectedNdv)
> Just curious: do you mean change it to verifyNdvBasic()? Our test validatio
yeah, i think i messed that up.


-- 
To view, visit http://gerrit.cloudera.org:8080/5768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer

2017-02-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
..


Patch Set 12: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs

2017-02-01 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4792: Fix number of distinct values for a CASE with 
constant outputs
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5768/6/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java:

Line 31:   public void VerifyNdvBasic(String expr, long expectedNdv)
> follow java naming convention
Just curious: do you mean change it to verifyNdvBasic()? Our test validation 
functions usually start uppercase like AnalyzesOk(), ParsesOk(), RewritesOk() 
etc.


-- 
To view, visit http://gerrit.cloudera.org:8080/5768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1427: Improvements to "Unknown disk-ID" warning

2017-02-01 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-1427: Improvements to "Unknown disk-ID" warning
..


Patch Set 3:

(3 comments)

Thanks for doing this, Bharath! Much better. Just a few more requests as final 
polish.

http://gerrit.cloudera.org:8080/#/c/5828/3/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 62
I think it's safer to leave this, but mention in the description that this 
option is deprecated. Also add a TODO to get rid of this.


http://gerrit.cloudera.org:8080/#/c/5828/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 343: if (block.getDiskId(i) == -1) ++numScanRangesNoDiskIds_;
scan ranges is a pretty low granularity, can you also roll this up to the file 
and partition level?

I imagine the explain output looking like this:
missing disk ids: x/y scan ranges, x/y files, x/y partitions

also seems useful to LOG.trace() the file paths (not just the file name) that 
are missing disk ids, so we can get them if needed. Looking forward to your 
logging change also :)


Line 589:   if (numScanRangesNoDiskIds_ > 0) {
let's only add this for >= EXTENDED

it will show up in query profiles, but not in regular explain plans

I think we should add a warning at the top of the explain plan listing the 
tables that have missing disk ids (like we do for missing stats)


-- 
To view, visit http://gerrit.cloudera.org:8080/5828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1427: Improvements to "Unknown disk-ID" warning

2017-02-01 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#3).

Change subject: IMPALA-1427: Improvements to "Unknown disk-ID" warning
..

IMPALA-1427: Improvements to "Unknown disk-ID" warning

This commit,
- Removes the runtime unknown disk ID reporting and instead moves
  it to the explain plan as a counter that prints the number of
  scan ranges missing disk IDs in the corresponding HDFS scan nodes.

- Removes reference to enabling dfs block metadata configuration,
  since it doesn't apply anymore.

- Removes VolumeId terminology from the runtime profile.

Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
4 files changed, 84 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/5828/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 


[Impala-ASF-CR] IMPALA-1427: Improvements to "Unknown disk-ID" warning

2017-02-01 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-1427: Improvements to "Unknown disk-ID" warning
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5828/2/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

Line 97:   ("HDFS_SCAN_NODE_UNKNOWN_DISK", 26, "Encountered unknown disk id for 
table: '$0'."
Forgot to git-add changes for this file. Uploading a new PS.


-- 
To view, visit http://gerrit.cloudera.org:8080/5828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1430: enable codegen for native UDAs

2017-02-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1430: enable codegen for native UDAs
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5161/13/be/src/exprs/scalar-fn-call.h
File be/src/exprs/scalar-fn-call.h:

Line 53:   /// Loads a native or IR UDF or UDA function 'fn' with symbol 
'symbol' from the
> the term 'scalar function' means 'not an aggregate function'. i don't under
I'm fine with moving it to a different class, I mainly wanted to leave as-is 
initially so that reviewer could review the diff of the function implementation.


-- 
To view, visit http://gerrit.cloudera.org:8080/5161
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1708eaa96eb76fb9bec5eeabf209f81c88eec2f
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3406: [DOCS] Remove stale and Cloudera-specific URLs

2017-02-01 Thread John Russell (Code Review)
John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5863

Change subject: IMPALA-3406: [DOCS] Remove stale and Cloudera-specific URLs
..

IMPALA-3406: [DOCS] Remove stale and Cloudera-specific URLs

Remove some stale links, including old URLs and mentions of
CDH 4-era material.

Modify a couple of links to upstream Apache docs to point to
the real Apache HDFS site, not the internal Cloudera archived
copy of the doc pages.

Change-Id: I26a03cc7912dcb398b1ea246e938dc5eaf6df764
---
M docs/impala_keydefs.ditamap
M docs/topics/impala_faq.xml
2 files changed, 7 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/5863/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I26a03cc7912dcb398b1ea246e938dc5eaf6df764
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 


[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs

2017-02-01 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4792: Fix number of distinct values for a CASE with 
constant outputs
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5768/6/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java:

Line 31:   public void VerifyNdvBasic(String expr, long expectedNdv)
follow java naming convention


-- 
To view, visit http://gerrit.cloudera.org:8080/5768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1427: Improvements to "Unknown disk-ID" warning

2017-02-01 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-1427: Improvements to "Unknown disk-ID" warning
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5828/1//COMMIT_MSG
Commit Message:

Line 7: IMPALA-1427: Improvements to "Unknown disk-ID" warning
> IMPALA-1427?
Done


Line 9: This commit,
> This patch seems fine. 
Great suggestion. That makes sense to me. Adding it.


-- 
To view, visit http://gerrit.cloudera.org:8080/5828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1427: Improvements to "Unknown disk-ID" warning

2017-02-01 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#2).

Change subject: IMPALA-1427: Improvements to "Unknown disk-ID" warning
..

IMPALA-1427: Improvements to "Unknown disk-ID" warning

This commit,
- Removes the runtime unknown disk ID reporting and instead moves
  it to the explain plan as a counter that prints the number of
  scan ranges missing disk IDs in the corresponding HDFS scan nodes.

- Removes reference to enabling dfs block metadata configuration,
  since it doesn't apply anymore.

- Removes VolumeId terminology from the runtime profile.

Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
4 files changed, 11 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/5828/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 


[Impala-ASF-CR] IMPALA-1430: enable codegen for native UDAs

2017-02-01 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-1430: enable codegen for native UDAs
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5161/13/be/src/exprs/scalar-fn-call.h
File be/src/exprs/scalar-fn-call.h:

Line 53:   /// Loads a native or IR UDF or UDA function 'fn' with symbol 
'symbol' from the
the term 'scalar function' means 'not an aggregate function'. i don't 
understand how adding this functionality fits into the class abstractions.

let's talk tomorrow.


-- 
To view, visit http://gerrit.cloudera.org:8080/5161
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1708eaa96eb76fb9bec5eeabf209f81c88eec2f
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs

2017-02-01 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4792: Fix number of distinct values for a CASE with 
constant outputs
..


Patch Set 6:

(5 comments)

I'm pretty happy with this change minus the remaining nits.

http://gerrit.cloudera.org:8080/#/c/5768/6/fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
File fe/src/main/java/org/apache/impala/analysis/CaseExpr.java:

Line 385: // sum the number of constants with the maximum distinct value 
for the
... sum the number of distinct constants with the maximum NDV for the 
non-constants.


http://gerrit.cloudera.org:8080/#/c/5768/6/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java:

Line 27:  * Tests computeNumDistinctValues estimates for Exprs
computeNumDistinctValues()


Line 44:   public void VerifyNdv(String stmtStr, long expectedNdv)
can we make this VerifyNdvStmt() or something so that we can use VerifyNdv() 
for the common case of writing tests?


Line 93: // functional.tinytable (tiny) does not
move this into the function comment of VerifyNdvTwoTable()


Line 96: VerifyNdvTwoTable("case when a.id = 1 then 'yes' " + 
remove whitespace


-- 
To view, visit http://gerrit.cloudera.org:8080/5768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3524: Don't process spilled partitions with 0 probe rows

2017-02-01 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3524: Don't process spilled partitions with 0 probe rows
..


Patch Set 9: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5389/9/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

PS9, Line 482: OutputAllBuild
OutputAllBuild()


-- 
To view, visit http://gerrit.cloudera.org:8080/5389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I175b32dd9031e51218b38c37693ac3e31dfab47b
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer

2017-02-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
..


Patch Set 12:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/227/

-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3524: Don't process spilled partitions with 0 probe rows

2017-02-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3524: Don't process spilled partitions with 0 probe rows
..


Patch Set 9:

I added logging just to double-check that the deep copy flag was present - it 
is. It's probably difficult to detect the bug, since it requires us to force 
the memory to be overwritten. I tried a couple of tricks, like putting the hash 
join on the right side of a nested loop join, but wasn't able to get it to 
produce incorrect results.

-- 
To view, visit http://gerrit.cloudera.org:8080/5389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I175b32dd9031e51218b38c37693ac3e31dfab47b
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer

2017-02-01 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
..


Patch Set 12: Code-Review+1

Rebased, replaced NULL with nullptr, removed "// clang-format" control 
statements and TODOs.

-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer

2017-02-01 Thread Lars Volker (Code Review)
Hello Marcel Kornacker, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5611

to look at the new patch set (#12).

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
..

IMPALA-3909: Populate min/max statistics in Parquet writer

Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
A be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-common.h
M be/src/runtime/string-value.h
M be/src/util/dict-test.cc
M tests/query_test/test_insert_parquet.py
M tests/util/get_parquet_metadata.py
8 files changed, 792 insertions(+), 119 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/5611/12
-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through

2017-02-01 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
..


Patch Set 5:

(42 comments)

http://gerrit.cloudera.org:8080/#/c/5816/5//COMMIT_MSG
Commit Message:

Line 14: Testing:
Would be nice to get some idea of the perf improvement. The JIRA has an 
interesting query. A small microbenchmark would also be useful.


http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 102
There was a specific reason why Open() was called here. There is an expectation 
that GetNext() returns quickly after Open() is called. This expectation has to 
do with our client/server interaction and the query state transitions. The 
query goes into the FINISHED state once Open() succeeded on the coordinator 
fragment.

Does test_rows_availability.py succeed?


Line 62:   pass_through_children_ = tnode.union_node.pass_through;
move to initializer list in cosntructor


Line 122:   if (child_eos_ && child_idx_ > 0 && !IsInSubplan()) 
child(child_idx_ - 1)->Close(state);
Needs comment


Line 124:   if (child_idx_ < children_.size() && isPassThrough(child_idx_)) {
High-level comment what is happening here (passthrough).


Line 140: if (child_eos_) {
Add a comment why it's not ok to Close() the child in this passthrough mode 
even if the child is at eos.

In the meterialization case below, we can and do close the child as soon as 
possible.


Line 141:   row_batch->MarkNeedsDeepCopy();
Needs comment


Line 150:   if (child_idx_ < children_.size() ||
Might as well reverse this check and move it up, set eos to true and return OK. 
Easier to see that we're just going to skip the remaining code.


http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/exec/union-node.h
File be/src/exec/union-node.h:

Line 36: /// Node that merges the results of its children by materializing their
update comment


Line 79:   std::vector pass_through_children_;
is_child_passthrough_?

The existing variable name makes it sound like it is a list of children that 
are pass through, but there is actually an entry for all children.

Move this member out of the "Members that need to be reset()" section.


Line 100:   inline bool isPassThrough(int idx) {
idx -> child_idx

const method


Line 101: DCHECK(idx < pass_through_children_.size());
DCHECK_LT


http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

Line 132:   if (this->type() != other_desc.type()) return false;
can get rid of this->


http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

Line 554:   /// Comparison is done by the contents of the tuple descriptors and 
not the ids.
I'd prefer to preserve the meaning of these existing functions (IsPrefixOf() 
and Equals(). We have several interesting DCHECKs that require the ids (and not 
just the physical layout) to be identical.

If you really need these functions we should give them new names kind of like 
we have in the FE for this check.


http://gerrit.cloudera.org:8080/#/c/5816/5/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

Line 433:   // List of booleans that indicates which children can be passed 
through
Remove "List of booleans" since that's redundant


Line 435:   4: required list pass_through
is_child_passthrough (good to keep names consistent)


http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:

Line 222: 
Would be good to keep the name of this function and the BE equivalent the same.


Line 223:   public boolean hasEqualPhysicalLayout(SlotDescriptor other) {
needs brief comment


Line 224: if (!this.getType().matchesType(other.getType())) return false;
shouldn't the types be equal()?


Line 226: if (this.getByteSize() != other.getByteSize()) return false;
can remove 'this'


http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
File fe/src/main/java/org/apache/impala/analysis/UnionStmt.java:

Line 155:   // List of output expressions of the Union node. This should be the 
same as result
List of output expressions produced by the union without the ORDER BY portion 
(if any). Same as resultExprs_ if there is no ORDER BY.


Line 156:   // resultExprs_ if the UnionStmt does not have an Order By. 
Otherwise resultExprs_
Let's avoid referring to specific plan nodes at this stage and instead try to 
describe 'semantically' what these exprs contain.


Line 158:   private List unionNodeResultExprs_ = Lists.newArrayList();
unionResultExprs_


Line 191: for (Expr e: other.unionNodeResultExprs_) 
unionNodeResultExprs_.add(e.clone());
use Expr.cloneList()


Line 501:   for (UnionOperand op: operands_) {
combine with the loop over operands_ in L526


http://ger

[Impala-ASF-CR] IMPALA-4359: qgen: add UPSERT support

2017-02-01 Thread David Knupp (Code Review)
David Knupp has posted comments on this change.

Change subject: IMPALA-4359: qgen: add UPSERT support
..


Patch Set 2: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5795
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6382f6ab22ba29c117e39a5d90592d3637df4b25
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs

2017-02-01 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-4792: Fix number of distinct values for a CASE with 
constant outputs
..


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5768/5/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java:

Line 32:String stmtStr = "select " + expr + " from functional.alltypes";
> remove tabs and trailing spaces
Done


Line 42:   public Expr CalcsNdvOk(String stmtStr, long expectedNdv)
> odd/not immediately intuitive name
Changed to VerifyNdv


Line 48: // Go into the select list and get the expression
> superfluous comment (it simply states what's apparent from the next line), 
Done


Line 51: 
> use blank lines judiciously, to separate logically separate pieces of code
Done


Line 54: return analyzedExpr;
> why return anything?
No reason. I have changed this to void.


Line 71: CalcsNdvOkBasic("case when id = 1 then 'yes' when id = 2 then 
'maybe' else 'no' end", 3);
> long line (and elsewhere)
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs

2017-02-01 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded a new patch set (#6).

Change subject: IMPALA-4792: Fix number of distinct values for a CASE with 
constant outputs
..

IMPALA-4792: Fix number of distinct values for a CASE with constant outputs

If all the return values of a Case expression have a known number of
distinct values (i.e. they are constant or statistics exist), then
the number of distinct values for the Case can be computed using this
information.

In order for the value from Case to be used at higher levels in the tree,
the implementation of computeNumDistinctValues for Expr needed to change.
Previously, Expr calculated the number of distinct values by finding any
SlotRefs in its tree and taking the maximum of the distinct values from
those SlotRefs. This would ignore the value from CaseExpr. To fix this,
Expr now takes the maximum number of distinct values across all of its
children.

-- explaining this statement shows cardinality = 2
explain select distinct case when id = 1 then 'yes' else 'no' end
from functional.alltypes;

-- explaining this statement shows cardinality = 2
explain select distinct char_length(case when id = 1 then 'yes' else 'no' end)
from functional.alltypes;

-- explaining this statement shows cardinality = 7300
explain select distinct case when id = 1 then 0 else id end
from functional.alltypes;

-- explaining this statement shows cardinality = 737 (date_string_col has lower
-- cardinality than id)
explain select distinct case when id = 1 then 'yes' else date_string_col end
from functional.alltypes;

For cases when the number of distinct values is not known for all the outputs,
this will return -1, indicating that the number of distinct values is not
known. The inputs (whens) are not used for calculating the number of distinct
values.

Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
---
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
A fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
3 files changed, 184 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/5768/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 


[Impala-ASF-CR] IMPALA-4359: qgen: add UPSERT support

2017-02-01 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4359: qgen: add UPSERT support
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5795/1/tests/comparison/statement_generator.py
File tests/comparison/statement_generator.py:

Line 70: if dml_table.primary_keys:
> Add a comment that if the table has primary keys then it's a Kudu table and
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5795
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6382f6ab22ba29c117e39a5d90592d3637df4b25
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3524: Don't process spilled partitions with 0 probe rows

2017-02-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3524: Don't process spilled partitions with 0 probe rows
..


Patch Set 9: Code-Review+1

(1 comment)

I'm happy that the issue I mentioned is fixed. I might play around a bit to see 
if I can repro it.

While looking at the code, I noticed that the limit wasn't correctly applied on 
this code path. However, this is a pre-existing bug - IMPALA-4866 - that 
affects most of the code paths in partitioned-hash-join-node.cc so I'm happy to 
defer fixing that.

http://gerrit.cloudera.org:8080/#/c/5389/6/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 677: if (output_unmatched_batch_iter_->AtEnd()) {
> Is there a dcheck we could add to detect the incorrect state?
We could add a DCHECK(!output_unmatched_batch_->needs_deep_copy()) before the 
Reset() but that is trivially true given we just transferred the resources.


-- 
To view, visit http://gerrit.cloudera.org:8080/5389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I175b32dd9031e51218b38c37693ac3e31dfab47b
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4359: qgen: add UPSERT support

2017-02-01 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4359: qgen: add UPSERT support
..


Patch Set 2:

(3 comments)

Thanks for the review, Taras. Please see patch set 2.

http://gerrit.cloudera.org:8080/#/c/5795/1/tests/comparison/model_translator.py
File tests/comparison/model_translator.py:

PS1, Line 534: be
> Why 1 here? Are we expecting there to be several "INSERT" keywords in the q
Done


http://gerrit.cloudera.org:8080/#/c/5795/1/tests/comparison/query.py
File tests/comparison/query.py:

Line 705:   # This enum represents possibilities for different types of 
INSERTs. A user of this
> I see that you explained what these mean below, but it's not to perfectly c
Done


PS1, Line 727:  so th
> inserting into a Kudu table
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5795
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6382f6ab22ba29c117e39a5d90592d3637df4b25
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4359: qgen: add UPSERT support

2017-02-01 Thread Michael Brown (Code Review)
Michael Brown has uploaded a new patch set (#2).

Change subject: IMPALA-4359: qgen: add UPSERT support
..

IMPALA-4359: qgen: add UPSERT support

UPSERTs are very similar to INSERTs, so the UPSERT support is simply
folded into that of INSERT. We do this by adding another "conflict
action", CONFLICT_ACTION_UPDATE. The object responsible for holding the
conflict_action attribute is now the InsertClause. This is needed here
because the SqlWriter now needs to know the conflict_action both when
writing the InsertClause (Impala) and at the tail end of the
InsertStatement (PostgreSQL). We also add a few properties to the
InsertStatement interface so that the PostgresqlSqlWriter can form the
correct "DO UPDATE" conflic action, in which primary key columns and
updatable columns must be known. More information on that here:

https://www.postgresql.org/docs/9.5/static/sql-insert.html

By default, we will tend to generate 3 UPSERTs for every 1 INSERT.

In addition to adding unit tests to make sure UPSERTs are properly
written, I used discrepancy_searcher.py --profile dmlonly, both with and
without --explain-only, do run tests. I made sure we were generating
syntactically valid UPSERT statements, and that the INSERT/UPSERT ratio
was roughly 1/3 after 100 statements.

Change-Id: I6382f6ab22ba29c117e39a5d90592d3637df4b25
---
M tests/comparison/discrepancy_searcher.py
M tests/comparison/model_translator.py
M tests/comparison/query.py
M tests/comparison/query_profile.py
M tests/comparison/statement_generator.py
M tests/comparison/tests/query_object_testdata.py
6 files changed, 296 insertions(+), 70 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/5795/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5795
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6382f6ab22ba29c117e39a5d90592d3637df4b25
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool

2017-02-01 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
..


Patch Set 4:

(32 comments)

First set of comments focusing on the public interface.

http://gerrit.cloudera.org:8080/#/c/5811/4/be/src/runtime/buffered-tuple-stream-v2.h
File be/src/runtime/buffered-tuple-stream-v2.h:

PS4, Line 41:  or scratch disk storage
what does this mean? isn't it always backed by BufferPool (which  may spill to 
disk internally)?


PS4, Line 45: the APIs are called from a single thread
this is saying per BufferTupleStream object, right? maybe clarify that


Line 46: ///
it'd be nice to explain the point of this abstraction a bit more explicitly. 
like when we talk about BTS, we talk in terms of iterators and the iteration 
patterns, but this only alludes to that. and the random access pattern. etc.


PS4, Line 49: pinned or unpinned
I think these terms need to be defined relative to the stream as far as what it 
means to a client and why they would change between states.


PS4, Line 59: page size
wouldn't it also be a function of tuple size (i.e. tuples per page)?


PS4, Line 77: pages
pages' buffers.


PS4, Line 80: location in memory
or you could say "buffer".


PS4, Line 110: in pages_
in the stream


PS4, Line 111: in pages_
same


PS4, Line 114: (if read_write is
 : ///   true, then two pages need to be in memory).
rather than saying this in parenthesis, i think we should pull it up to say 
that there can be both a read and write iterator simultaneously.


PS4, Line 117: caller's reservation is insufficient
does it first try to increase the reservation at this point?


Line 124: /// returned so far from the stream may be freed on the next call to 
GetNext().
do we have a jira we can reference as a TODO here to simplify this?


Line 126: /// Manual construction of rows with AllocateRow():
not for now, but is it possible to unify things so we only have one way to 
write into the stream?

In the mean time, how does this mode relate to the "write" mode above? is it a 
sub-mode for pinned?


Line 138: /// this array as new rows are inserted in the page. If we do so, 
then there will be
is that still something we want to do?


PS4, Line 146: id
what is 'id'? is this index?


PS4, Line 147: With 8MB pages that is 512GB per stream.
where does this limit come into play?  is this something that will cause 
trouble with going to 2MB pages?


PS4, Line 186: page_len: the size of pages to use in the stream
what does that mean for large rows?


PS4, Line 198: of
off


PS4, Line 203: AddRow
is it also used before AllocateRow()?


Line 207:   /// if an error status is returned.
broken line break


Line 212:   /// sets 'status' to an error if an error occurred.
hard to parse that sentence.  but, can we simplify this dance now that we have 
real reservations?
also, does this operation try to increase reservation before failing to append?
is this guaranteed to succeed (other than runtime errors) for unpinned streams 
(maybe after you add support for large rows)?


Line 223:   /// tuples.
same questions. let's think of this interface can be simplified now.


Line 228:   /// been allocated with the stream's row desc.
is the row valid only as long as the stream remains pinned?


PS4, Line 232: begin reading
before the first GetNext()?


Line 236:   /// false if the page could not be pinned and no error was 
encountered.
can we explain that in terms of reservations? and like questions above, does 
this got_buffer==false interface still make sense with reservations?


PS4, Line 240: enough memory
what does that mean in terms of reservations?  and does it still make sense 
with reservations?


Line 253:   };
this seems like a wrinkle to the class comment. can it be succinctly described 
as part of the iterator discussion?


PS4, Line 259: must be copied out by the caller
this seems to contradict the class comment about "needs_deep_copy"


PS4, Line 267: *got_rows is false if the stream could not be pinne
does that still make sense with reservations?  if so, it should be expressed in 
terms of reservation.


PS4, Line 271: date
data?


PS4, Line 271: buffers containing page
are there buffers that don't contain page data?


PS4, Line 286: Returns the byte size of the stream that is currently pinned in 
memory.
Is this trying to say:
Returns the number of bytes currently pinned in memory by the stream?


-- 
To view, visit http://gerrit.cloudera.org:8080/5811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4853: Skip test kudu dml reporting if Kudu is not supported.

2017-02-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4853: Skip test_kudu_dml_reporting if Kudu is not 
supported.
..


IMPALA-4853: Skip test_kudu_dml_reporting if Kudu is not supported.

This test is failing on distros that don't support Kudu, but it
shouldn't even be run.

Tested by setting KUDU_IS_SUPPORTED to false, and then trying to
run the test, confirming that it gets skipped. When the env var
KUDU_IS_SUPPORTED is true, the test runs.

Change-Id: Ia36319228d4e9cac9cb675f3207ef2ba39f24e7e
Reviewed-on: http://gerrit.cloudera.org:8080/5854
Reviewed-by: Michael Brown 
Reviewed-by: Matthew Jacobs 
Tested-by: Impala Public Jenkins
---
M tests/shell/test_shell_commandline.py
1 file changed, 2 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Michael Brown: Looks good to me, but someone else must approve
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5854
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ia36319228d4e9cac9cb675f3207ef2ba39f24e7e
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-4853: Skip test kudu dml reporting if Kudu is not supported.

2017-02-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4853: Skip test_kudu_dml_reporting if Kudu is not 
supported.
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5854
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia36319228d4e9cac9cb675f3207ef2ba39f24e7e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4822: Implement dynamic log level changes

2017-02-01 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#5).

Change subject: IMPALA-4822: Implement dynamic log level changes
..

IMPALA-4822: Implement dynamic log level changes

Very often we have to change the logging levels
of Impalads and Catalog server for debugging purposes.
Currently, there is no way to do that without a restart
of the daemons, which is not a viable option in production
deployments.

This patch addresses this supportability gap by exposing
the ability to set dynamic logging levels using a simple
web endpoint on Impalad and Catalog web pages.

This includes setting VLOG levels (equivalent to --v flag)
as well as setting log4j levels on the Frontend and the
Catalog JVMs.

Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63
---
M be/src/catalog/catalog-server.cc
M be/src/service/impala-http-handler.cc
M be/src/util/backend-gflag-util.cc
M be/src/util/backend-gflag-util.h
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M be/src/util/logging-support.cc
M be/src/util/logging-support.h
M common/thrift/Logging.thrift
M fe/src/main/java/org/apache/impala/util/GlogAppender.java
A www/log_level.tmpl
11 files changed, 343 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/5792/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs

2017-02-01 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4792: Fix number of distinct values for a CASE with 
constant outputs
..


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5768/5/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java:

Line 32:String stmtStr = "select " + expr + " from functional.alltypes";
remove tabs and trailing spaces


Line 42:   public Expr CalcsNdvOk(String stmtStr, long expectedNdv)
odd/not immediately intuitive name


Line 48: // Go into the select list and get the expression
superfluous comment (it simply states what's apparent from the next line), 
remove


Line 51: 
use blank lines judiciously, to separate logically separate pieces of code


Line 54: return analyzedExpr;
why return anything?


Line 71: CalcsNdvOkBasic("case when id = 1 then 'yes' when id = 2 then 
'maybe' else 'no' end", 3);
long line (and elsewhere)


-- 
To view, visit http://gerrit.cloudera.org:8080/5768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4822: Implement dynamic log level changes

2017-02-01 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4822: Implement dynamic log level changes
..


Patch Set 1:

(2 comments)

PS4 implements dynamic logging changes to the backend (equivalent to setting 
--v on the fly) and also the "reset" functionality for both log4j and the 
backend. That resets the logging state as per the user defined values at the 
startup.

Also, I'm looking for some suggestions on cosmetic changes to the log_level web 
end point. As of now it works, but looks pretty basic (like it is designed by 
someone who just learnt HTML :D). So that needs to be improved as well.

http://gerrit.cloudera.org:8080/#/c/5792/1/be/src/util/logging-support.cc
File be/src/util/logging-support.cc:

Line 26: #include "rpc/jni-thrift-util.h"
> names.h basically has a bunch of common "using" directives, so is more appr
Thanks for explaining.


http://gerrit.cloudera.org:8080/#/c/5792/1/common/thrift/Logging.thrift
File common/thrift/Logging.thrift:

Line 35: struct TGetLogLevelParams {
> Agree that there is additional bookeeping. I'm generally of the opinion tha
I agree with your point that having a visual feedback is much more easy to the 
end user. For now I've implemented the "reset" functionality. Still need to 
look into log4j if there is a way to get all the overrides.


-- 
To view, visit http://gerrit.cloudera.org:8080/5792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4822: Implement dynamic log level changes

2017-02-01 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new patch set (#4).

Change subject: IMPALA-4822: Implement dynamic log level changes
..

IMPALA-4822: Implement dynamic log level changes

Very often we have to change the logging levels
of Impalads and Catalog server for debugging purposes.
Currently, there is no way to do that without a restart
of the daemons, which is not a viable option in production
deployments.

This patch addresses this supportability gap by exposing
the ability to set dynamic logging levels using a simple
web endpoint on Impalad and Catalog web pages.

This includes setting VLOG levels (equivalent to --v flag)
as well as setting log4j levels on the Frontend and the
Catalog JVMs.

Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63
---
M be/src/catalog/catalog-server.cc
M be/src/service/impala-http-handler.cc
M be/src/util/backend-gflag-util.cc
M be/src/util/backend-gflag-util.h
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M be/src/util/logging-support.cc
M be/src/util/logging-support.h
M common/thrift/Logging.thrift
M fe/src/main/java/org/apache/impala/util/GlogAppender.java
A www/log_level.tmpl
11 files changed, 345 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/5792/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-4829: Change default Kudu read behavior for "RYW"

2017-02-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4829: Change default Kudu read behavior for "RYW"
..


IMPALA-4829: Change default Kudu read behavior for "RYW"

Currently the default Kudu read mode is set to "READ_LATEST",
which essentially provides no guarantees on reading except
that any read issued will read the latest value that the
target replica happens to have. This is not necessarily a
time after a previous write operation in the same session.
By changing the read mode to the misleadingly named
"READ_AT_SNAPSHOT", we can ensure that Kudu reads will all
be at times at least or greater than the latest "observed"
time (which Impala already sets on the client). Note that
this does not mean all reads are performed at the same
timestamp (i.e. a snapshot read) because that requires
setting a snapshot timestamp, but doing this will require
more work in the future in both Impala and (mostly) Kudu.
The Kudu team calls this "Read Your Writes".

This means that, after this change, values written within a
session will always be visible to subsequent reads. Before
this change, this was usually the case but not guaranteed.

Testing: Private test run, running an exhaustive job now.
This is otherwise difficult to validate in new tests. This
has plenty of time to bake for 2.9 in case we discover
performance or functional issues.

Change-Id: I4011f8277083982aee2c6c2bfca2f4ae2f8cb31e
Reviewed-on: http://gerrit.cloudera.org:8080/5802
Reviewed-by: Thomas Tauber-Marshall 
Reviewed-by: Dan Hecht 
Tested-by: Impala Public Jenkins
---
M be/src/exec/kudu-scanner.cc
1 file changed, 7 insertions(+), 7 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Thomas Tauber-Marshall: Looks good to me, but someone else must approve
  Dan Hecht: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5802
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I4011f8277083982aee2c6c2bfca2f4ae2f8cb31e
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4829: Change default Kudu read behavior for "RYW"

2017-02-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4829: Change default Kudu read behavior for "RYW"
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5802
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4011f8277083982aee2c6c2bfca2f4ae2f8cb31e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4643: [DOCS] Genericize HBase topic, esp links

2017-02-01 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: IMPALA-4643: [DOCS] Genericize HBase topic, esp links
..


Patch Set 1:

Forgot to mention in the commit message: I also removed some Cloudera-specific 
wording and directory names that were shown in the banner messages of hive and 
impala-shell commands in examples.

-- 
To view, visit http://gerrit.cloudera.org:8080/5858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic36ad02766a0f8945759c698fa1c5760dc12d0cb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through

2017-02-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5816/4/testdata/workloads/functional-query/queries/QueryTest/union.test
File testdata/workloads/functional-query/queries/QueryTest/union.test:

Line 1069: # IMPALA-3586: This query caused an issue because the tuple size of 
the children
I think this may be something to do with count(*) being non-nullable. Maybe the 
expr in the union node has a nullable slot in that place?

It would be good to understand the root cause so we're sure we fixed the 
underlying bug.


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4643: [DOCS] Genericize HBase topic, esp links

2017-02-01 Thread John Russell (Code Review)
John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5858

Change subject: IMPALA-4643: [DOCS] Genericize HBase topic, esp links
..

IMPALA-4643: [DOCS] Genericize HBase topic, esp links

The Impala + HBase page included a little Cloudera-specific wording,
i.e. "Cloudera recommends...".

The main Cloudera-specific details were links to HBase documentation
hosted on cloudera.com. I made those links use the keydef/keyref
indirection mechanism and edited the URLs to point to the upstream
Apache HBase docs.

Centralizing these links also makes the Impala + HBase source cleaner,
because the link text doesn't need to be inside an ...
pair on the page itself. The link ends up being just
"See ".

After this change, there's no more 'Cloudera' visible in the text of
the relevant HTML page, including the 'view source' view. (Previously,
the cloudera.com links would have shown up but only findable via
'view source'.

Change-Id: Ic36ad02766a0f8945759c698fa1c5760dc12d0cb
---
M docs/impala_keydefs.ditamap
M docs/topics/impala_hbase.xml
2 files changed, 15 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/5858/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic36ad02766a0f8945759c698fa1c5760dc12d0cb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 


[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through

2017-02-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
..


Patch Set 4:

(15 comments)

I think this is looking pretty good. It turns out to hit a lot of interesting 
corner cases in the backend but I think we just need to make sure we've got 
them covered and tests for them all.

http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 122:   if (child_eos_ && child_idx_ > 0 && !IsInSubplan()) 
child(child_idx_ - 1)->Close(state);
Can you add a comment that this only applies to passthrough? E.g. "The previous 
child may have been left open if passthrough was enabled for it". Otherwise 
it's hard to figure out how it fits in with the rest of it.


Line 133:   row_batch->set_num_rows(limit_ - num_rows_returned_);
There's a corner case that breaks this calculation. The problem is that 
'row_batch' may be non-empty when GetNext() is called. E.g. the subplan node 
does this. I think we just need to save the value of num_rows() before calling 
GetNext() and adjust the calculation accordingly.

I think we're missing a test case where we have a union node with a limit under 
a subplan.


Line 150:   if (child_idx_ < children_.size() ||
Is this just an optimisation? Might be best to remove it and keep the code 
simpler unless we have data showing it's a bottleneck. If I understand it 
correctly it only helps on the last GetNext() call.

If we keep it we should initialise tuple_buf to NULL so if we mess up it's more 
debuggable.


http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/exec/union-node.h
File be/src/exec/union-node.h:

Line 99:   /// Returns true if the child can be passed through.
Nit: "if the child at 'idx'"


http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

Line 541:   /// Return true if the tuple ids of this descriptor match tuple ids 
of other desc.
This comment needs updating to reflect the new behaviour.


http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:

Line 228: if (this.getNullIndicatorByte() != other.getNullIndicatorByte()) 
return false;
Also need to compare the NullIndicatorBit().

There are some subtle differences with how we compute the mem layout for Kudu 
tables can have non-nullable slots, so I think there's an interesting test 
where we union the output of an aggregate function like count() (where slots 
are non-nullable and don't get a null bit) and a Kudu table with a non-nullable 
Kudu column, which gets a null bit regardless.


http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 1536: if (ctx_.hasSubplan()) unionNode.disablePassthrough();
Add a TODO to remove this as part of IMPALA-4179. Otherwise I might forget.


http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:

Line 68:   // If false, no child nodes would be passed through in the backend.
nit: "batches from child nodes" to be a bit clearer


Line 71:   // Indicates which child nodes can be passed through in the backend.
nit: "batches from child nodes" to be a bit clearer


PS4, Line 176: /*
/**


Line 177:* Compute the children for which rows can be forwarded by the 
Union node without being
It's a little unclear what the input is.

Maybe "Compute whether we can pass through rows from a child whether 
'childExprList' is evaluated over a row with 'childTupleIds'"


Line 183: // Pass through is only done for the simple case where the row 
has a single tuple.
What's the motivation for this? Is it because the union output is always a row 
with a single tuple?


Line 266: analyzer, children_.get(i).getTupleIds(), 
resultExprLists_.get(i)));
I *think* in principle we may need to also check the nullable tuple IDs, since 
the output tuple should be non-nullable but the input could be nullable in 
theory. In practice I don't think it's possible for a row with 1 tuple but it 
would be better to be conservative.

Alex probably has more insight.


Line 299: if (!passThrough_.isEmpty()) {
We probably don't want to print this at explain_level MINIMAL.


http://gerrit.cloudera.org:8080/#/c/5816/4/testdata/workloads/functional-planner/queries/PlannerTest/union.test
File testdata/workloads/functional-planner/queries/PlannerTest/union.test:

Line 3103: 
Can you add a couple of tests where there is a table scan on one branch of the 
union and a hash join on the other? I didn't see much test coverage of joins in 
unions. The table scan should be passed through and the hash join shouldn't.

The idea is

[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs

2017-02-01 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded a new patch set (#5).

Change subject: IMPALA-4792: Fix number of distinct values for a CASE with 
constant outputs
..

IMPALA-4792: Fix number of distinct values for a CASE with constant outputs

If all the return values of a Case expression have a known number of
distinct values (i.e. they are constant or statistics exist), then
the number of distinct values for the Case can be computed using this
information.

In order for the value from Case to be used at higher levels in the tree,
the implementation of computeNumDistinctValues for Expr needed to change.
Previously, Expr calculated the number of distinct values by finding any
SlotRefs in its tree and taking the maximum of the distinct values from
those SlotRefs. This would ignore the value from CaseExpr. To fix this,
Expr now takes the maximum number of distinct values across all of its
children.

-- explaining this statement shows cardinality = 2
explain select distinct case when id = 1 then 'yes' else 'no' end
from functional.alltypes;

-- explaining this statement shows cardinality = 2
explain select distinct char_length(case when id = 1 then 'yes' else 'no' end)
from functional.alltypes;

-- explaining this statement shows cardinality = 7300
explain select distinct case when id = 1 then 0 else id end
from functional.alltypes;

-- explaining this statement shows cardinality = 737 (date_string_col has lower
-- cardinality than id)
explain select distinct case when id = 1 then 'yes' else date_string_col end
from functional.alltypes;

For cases when the number of distinct values is not known for all the outputs,
this will return -1, indicating that the number of distinct values is not
known. The inputs (whens) are not used for calculating the number of distinct
values.

Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
---
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
A fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
3 files changed, 183 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/5768/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 


[Impala-ASF-CR] Add nested testdata flattener

2017-02-01 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: Add nested testdata flattener
..


Patch Set 1:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/5787/1//COMMIT_MSG
Commit Message:

Line 22: 
How was this tested?


http://gerrit.cloudera.org:8080/#/c/5787/1/testdata/TableFlattener/README
File testdata/TableFlattener/README:

Line 3: 
It would be useful to explain also how nested data types (array, struct, map) 
get flattened.


Line 11: 
Mention that testdata/bin/generate-load-nested.sh uses this.


PS1, Line 12: There are various options to specify the type of input file but 
the output is always
: parquet/snappy.
You could mention that the tool tries to use file extension to infer the input 
format.

If you don't want to mention the other options here, you could mention how to 
get help:

  mvn exec:java -Dexec.mainClass=org.apache.impala.infra.tableflattener.Main 
-Dexec.arguments="--help"


http://gerrit.cloudera.org:8080/#/c/5787/1/testdata/TableFlattener/pom.xml
File testdata/TableFlattener/pom.xml:

Line 1: 
Would Maven permit this whole tree be moved to 
testdata/src/main/java/org/apache/impala/TableFlattener ? As it is, there seem 
to be competing Java source trees about curating test data.


http://gerrit.cloudera.org:8080/#/c/5787/1/testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/FlattenedSchema.java
File 
testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/FlattenedSchema.java:

PS1, Line 93: "value"
Why not item for array?


PS1, Line 96: "idx"
Why not pos?


PS1, Line 103: "_values";
This will lead to "foo__values" (with 2 underscores). Is that intended?


PS1, Line 120: "_"
Shouldn't this use getNameSeparator() ?


http://gerrit.cloudera.org:8080/#/c/5787/1/testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/SchemaFlattener.java
File 
testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/SchemaFlattener.java:

PS1, Line 51: dataset
It would read a bit clearer to call this dstDataSet.


Line 60: Preconditions.checkState(srcSchema.getType() == Type.RECORD);
It would be nice to have a comment on this method describing what it does and 
the parameters.


PS1, Line 65: field.schema()
Why not fieldSchema?


Line 85: // Ensure that the parent schema has an id field so the child can 
reference the
Same with this method: add brief description of method and parameter meanings.


PS1, Line 93: LinkedList fields = new LinkedList<>();
Is it correct to be instantiating fields this way, or should you be using the 
method written above:

  LinkedList fields = Lists.newLinkedList();


-- 
To view, visit http://gerrit.cloudera.org:8080/5787
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7e7a8e53ada9274759a3e2128b97bec292c129c6
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through

2017-02-01 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#5).

Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
..

IMPALA-3586 (Part 1): Implement Union Pass Through

The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

Testing:
Verified that existing tests cover the case where no/some/all union
children of the union node can be passed through.

Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
22 files changed, 467 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through

2017-02-01 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#5).

Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
..

IMPALA-3586 (Part 1): Implement Union Pass Through

The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

Testing:
Verified that existing tests cover the case where no/some/all union
children of the union node can be passed through.

Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
22 files changed, 467 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3748: Part 1: Clean up resource estimation in planner

2017-02-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#3).

Change subject: IMPALA-3748: Part 1: Clean up resource estimation in planner
..

IMPALA-3748: Part 1: Clean up resource estimation in planner

This is in preparation to use this code to compute the minimum
reservation.

The cleanup restructures the code slightly so that it's clearer whether
resource estimates are valid or invalid. It also removes the unused
VCore estimates.

Fixes a couple of bugs and other issues:
* computeCosts() was not called for unpartitioned fragments, so
  the per-operator memory estimate was not visible.
* Nested loop join was not treated as a blocking join.
* The TODO comment about union was misleading

I left one bug unfixed because it is subtle and will be easier to review
in isolation: IMPALA-4862.

There is some remaining questionable behaviour I left untouched:
* It's unclear why unpartitioned fragments are always excluded from
  total memory consumption.
* Many operators do not have estimates or have questionable heuristics.

Testing:
Re-enabled the explain_level tests, which appear to be the only
coverage for many of these estimates. Removed the complex and
brittle test cases and replaced with a couple of much simpler
end-to-end tests and a number of planner test cases for
memory estimates.

Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37
---
M be/src/service/query-exec-state.cc
M common/thrift/Frontend.thrift
M docs/shared/impala_common.xml
M docs/topics/impala_explain.xml
M docs/topics/impala_explain_level.xml
M docs/topics/impala_explain_plan.xml
M docs/topics/impala_hbase.xml
M docs/topics/impala_known_issues.xml
M docs/topics/impala_optimize_partition_key_scans.xml
M docs/topics/impala_perf_joins.xml
M docs/topics/impala_perf_stats.xml
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/resource-estimates.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
M tests/metadata/test_explain.py
29 files changed, 810 insertions(+), 1,231 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/5847/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4822: Dynamic log level changes to Catalog and Frontend JVMs

2017-02-01 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4822: Dynamic log level changes to Catalog and Frontend 
JVMs
..


Patch Set 1:

(2 comments)

Responding to comments. Will wait for next PS to continue the review.

http://gerrit.cloudera.org:8080/#/c/5792/1/be/src/util/logging-support.cc
File be/src/util/logging-support.cc:

Line 26: #include "rpc/jni-thrift-util.h"
> Moved. For my understanding, whats special about names.h include?
names.h basically has a bunch of common "using" directives, so is more 
appropriate to put into the "using" section and not the "include" section if 
that makes sense


http://gerrit.cloudera.org:8080/#/c/5792/1/common/thrift/Logging.thrift
File common/thrift/Logging.thrift:

Line 35: struct TGetLogLevelParams {
> Per my understanding log4j doesn't provide any easy way to get that list. S
Agree that there is additional bookeeping. I'm generally of the opinion that if 
the user has a way to change a configuration setting, then there should also be 
a way to inspect the current state of the configuration. This serves two 
purposes: 1. When changing a setting, the user gets immediate visual feedback 
that the setting is indeed in effect. 2. The user can easily see if he forgot 
to undo some changes he made in the past. It can avoid making redundant changes 
(forgot which classes he already increased the log level for).

For now, I'm ok with implementing a button that reset the configuration back to 
the original state.

Happy to discuss further.


-- 
To view, visit http://gerrit.cloudera.org:8080/5792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Release note updates for Impala 2.8

2017-02-01 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: Release note updates for Impala 2.8
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5668/7/docs/topics/impala_new_features.xml
File docs/topics/impala_new_features.xml:

Line 65:   
We should definitely also highlight IMPALA-1430 which fixes the slow COMPUTE 
STATS issue with TIMESTAMP. COMPUTE STATS is now codegen'd even if there are 
TIMESTAMP columns. The JIRA is not yet marked as fixed because there is some 
follow-on work, but the COMPUTE STATS issue specifically has been fixed.


Line 130: format TIMESTAMP values, such as the 
result
A naked now() function is a particularly bad example, because now() is already 
constant within a query, so does not benefit from this improvement. A better 
example is something like:

WHERE date_col = to_date(now() - interval 1 day)

the right-hand size expression is somewhat complicated and expensive to 
evaluate for every row. Constant folding makes this significantly faster.


-- 
To view, visit http://gerrit.cloudera.org:8080/5668
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Silvius Rus 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2518: DROP DATABASE CASCADE doesn't remove cache directives of tables

2017-02-01 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-2518: DROP DATABASE CASCADE doesn't remove cache 
directives of tables
..


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5815/1//COMMIT_MSG
Commit Message:

Line 7: IMPALA-2518: DROP DATABASE CASCADE doesn't remove cache directives of
describe the fix not the bug:
DROP DATABASE CASCADE removes caching directives


http://gerrit.cloudera.org:8080/#/c/5815/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 1279: uncacheTable(removedDb.getTable(tableName));
It's a little tricky to figure out, but I think we may have to lock these 
tables. For example, there might be a concurrent request for caching a 
table/partition and there could be a concurrent deepCopy() of the msTbl which 
may not be safe with removing entries from the msTbl properties. Might be worth 
looking into at least.


http://gerrit.cloudera.org:8080/#/c/5815/1/fe/src/main/java/org/apache/impala/util/HdfsCachingUtil.java
File fe/src/main/java/org/apache/impala/util/HdfsCachingUtil.java:

Line 163:   org.apache.hadoop.hive.metastore.api.Partition part) throws 
ImpalaException {
tab


http://gerrit.cloudera.org:8080/#/c/5815/1/tests/query_test/test_hdfs_caching.py
File tests/query_test/test_hdfs_caching.py:

Line 212: """The DROP DATABASE CASCADE should properly drop all impacted 
cache directives
Remove "The"


Line 213:IMPALA-2518"""
use IMPALA-2518 as a prefix to comment (like we typically do)


Line 215: # Creates `cachedb` database with some cached tables and 
partitions
Populates the 'cachedb' database...


-- 
To view, visit http://gerrit.cloudera.org:8080/5815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I83ef5a33e06728c2b3f833a0309d9da64dce7b88
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4853: Skip test kudu dml reporting if Kudu is not supported.

2017-02-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4853: Skip test_kudu_dml_reporting if Kudu is not 
supported.
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/226/

-- 
To view, visit http://gerrit.cloudera.org:8080/5854
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia36319228d4e9cac9cb675f3207ef2ba39f24e7e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4738: STDDEV SAMP should return NULL for single record input

2017-02-01 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4738: STDDEV_SAMP should return NULL for single record 
input
..


Patch Set 2: Code-Review+2

(2 comments)

please let john know we should doc this change

http://gerrit.cloudera.org:8080/#/c/5800/2/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

PS2, Line 26: ,
please add a space


PS2, Line 30: 0,0
so all of these match postgres now?


-- 
To view, visit http://gerrit.cloudera.org:8080/5800
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ide8af752cd8a2e554a2cd5a1ec948967a80de1fe
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4829: Change default Kudu read behavior for "RYW"

2017-02-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4829: Change default Kudu read behavior for "RYW"
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/225/

-- 
To view, visit http://gerrit.cloudera.org:8080/5802
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4011f8277083982aee2c6c2bfca2f4ae2f8cb31e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4853: Skip test kudu dml reporting if Kudu is not supported.

2017-02-01 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4853: Skip test_kudu_dml_reporting if Kudu is not 
supported.
..


Patch Set 1: Code-Review+2

whoops, thanks David!

-- 
To view, visit http://gerrit.cloudera.org:8080/5854
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia36319228d4e9cac9cb675f3207ef2ba39f24e7e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs

2017-02-01 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4792: Fix number of distinct values for a CASE with 
constant outputs
..


Patch Set 4:

looks good, but let's add tests

-- 
To view, visit http://gerrit.cloudera.org:8080/5768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4735: Upgrade pytest in python env to version 2.9.2.

2017-02-01 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4735: Upgrade pytest in python env to version 2.9.2.
..


Patch Set 8: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5640
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40d129e0e63ca5bee126bac6ac923abb3c7e0a67
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs

2017-02-01 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded a new patch set (#4).

Change subject: IMPALA-4792: Fix number of distinct values for a CASE with 
constant outputs
..

IMPALA-4792: Fix number of distinct values for a CASE with constant outputs

If all the return values of a Case expression have a known number of
distinct values (i.e. they are constant or statistics exist), then
the number of distinct values for the Case can be computed using this
information.

In order for the value from Case to be used at higher levels in the tree,
the implementation of computeNumDistinctValues for Expr needed to change.
Previously, Expr calculated the number of distinct values by finding any
SlotRefs in its tree and taking the maximum of the distinct values from
those SlotRefs. This would ignore the value from CaseExpr. To fix this,
Expr now takes the maximum number of distinct values across all of its
children.

-- explaining this statement shows cardinality = 2
explain select distinct case when id = 1 then 'yes' else 'no' end
from functional.alltypes;

-- explaining this statement shows cardinality = 2
explain select distinct char_length(case when id = 1 then 'yes' else 'no' end)
from functional.alltypes;

-- explaining this statement shows cardinality = 7300
explain select distinct case when id = 1 then 0 else id end
from functional.alltypes;

-- explaining this statement shows cardinality = 737 (date_string_col has lower
-- cardinality than id)
explain select distinct case when id = 1 then 'yes' else date_string_col end
from functional.alltypes;

For cases when the number of distinct values is not known for all the outputs,
this will return -1, indicating that the number of distinct values is not
known. The inputs (whens) are not used for calculating the number of distinct
values.

Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
---
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
2 files changed, 73 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/5768/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 


[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs

2017-02-01 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-4792: Fix number of distinct values for a CASE with 
constant outputs
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5768/3/fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
File fe/src/main/java/org/apache/impala/analysis/CaseExpr.java:

Line 393: long numOutputConstants = 0;
> int is fine here, if you overflow that we have other problems.
Done


Line 402:   if ((i - loopStart) % 2 == 1 || (i == children_.size() - 1 && 
hasElseExpr_)) {
> check the negation and continue (to save one indentation level).
Done


Line 409: if (constLiteralSet.add(outputLiteral)) 
++numOutputConstants;
> nice optimization. is there a test for that?
I hand tested it. I was looking through our tests, and it would be easy for me 
to add a SQL to QueryTest/explain-level2.test to test this patch's behavior.


-- 
To view, visit http://gerrit.cloudera.org:8080/5768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4735: Upgrade pytest in python env to version 2.9.2.

2017-02-01 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4735: Upgrade pytest in python env to version 2.9.2.
..


Patch Set 8: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5640
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40d129e0e63ca5bee126bac6ac923abb3c7e0a67
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4853: Skip test kudu dml reporting if Kudu is not supported.

2017-02-01 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4853: Skip test_kudu_dml_reporting if Kudu is not 
supported.
..


Patch Set 1: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5854
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia36319228d4e9cac9cb675f3207ef2ba39f24e7e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4853: Skip test kudu dml reporting if Kudu is not supported.

2017-02-01 Thread David Knupp (Code Review)
David Knupp has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5854

Change subject: IMPALA-4853: Skip test_kudu_dml_reporting if Kudu is not 
supported.
..

IMPALA-4853: Skip test_kudu_dml_reporting if Kudu is not supported.

This test is failing on distros that don't support Kudu, but it
shouldn't even be run.

Tested by setting KUDU_IS_SUPPORTED to false, and then trying to
run the test, confirming that it gets skipped. When the env var
KUDU_IS_SUPPORTED is true, the test runs.

Change-Id: Ia36319228d4e9cac9cb675f3207ef2ba39f24e7e
---
M tests/shell/test_shell_commandline.py
1 file changed, 2 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/5854/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5854
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia36319228d4e9cac9cb675f3207ef2ba39f24e7e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp