[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer .. Patch Set 12: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5611 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer
Tim Armstrong has submitted this change and it was merged. Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer .. IMPALA-3909: Populate min/max statistics in Parquet writer Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619 Reviewed-on: http://gerrit.cloudera.org:8080/5611 Reviewed-by: Lars Volker Tested-by: Impala Public Jenkins Reviewed-by: Tim Armstrong --- M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h A be/src/exec/parquet-column-stats.h M be/src/exec/parquet-common.h M be/src/runtime/string-value.h M be/src/util/dict-test.cc M tests/query_test/test_insert_parquet.py M tests/util/get_parquet_metadata.py 8 files changed, 792 insertions(+), 119 deletions(-) Approvals: Impala Public Jenkins: Verified Lars Volker: Looks good to me, but someone else must approve Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5611 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-4822: Implement dynamic log level changes
Alex Behm has posted comments on this change. Change subject: IMPALA-4822: Implement dynamic log level changes .. Patch Set 5: (9 comments) Since we're accepting user input it would be great to have unit tests. Checking that the correct logging level is actually in effect might be a little tricky. I'm more thinking along the lines of checking garbage user input, checking that settings are reflected and can be reset, etc. Basic validation. I think you should be able to do something similar to webserver-test.cc. http://gerrit.cloudera.org:8080/#/c/5792/5/be/src/util/backend-gflag-util.h File be/src/util/backend-gflag-util.h: Line 27: /// debugging, so we save the original value here incase we need to restore in case (two words) http://gerrit.cloudera.org:8080/#/c/5792/5/be/src/util/logging-support.cc File be/src/util/logging-support.cc: Line 77: jclass log4j_logger_class_; no "_" suffix since not private class members; make these static Line 196: if (args.find("get_java_log") != args.end()) { make these strings constants like: static const char* GET_JAVA_LOG = "get_java_log"; etc. Line 212: logging_level == args.end() || logging_level->second == NULL) return; use {} for multi-line ifs http://gerrit.cloudera.org:8080/#/c/5792/5/be/src/util/logging-support.h File be/src/util/logging-support.h: Line 25: namespace rapidjson { Ok to use the include if you prefer that solution. Either wfm. Line 43: /// the Java log4j log messages to be forwarded to Glog. update comment to reflect that it loads JNI classes/methods to support dynamic log level changes http://gerrit.cloudera.org:8080/#/c/5792/5/common/thrift/Logging.thrift File common/thrift/Logging.thrift: Line 34: // Helper structs for GetLogLevel(), SetLogLevel() and ResetLogLevel() update http://gerrit.cloudera.org:8080/#/c/5792/5/www/log_level.tmpl File www/log_level.tmpl: Line 27: Log: Pretty sure org.apache.foo won't be clear to users. I suggest using a full, valid class name like org.apache.impala.analysis.Analyzer Line 41: Log: use a real class name -- To view, visit http://gerrit.cloudera.org:8080/5792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4792: Fix number of distinct values for a CASE with constant outputs .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/5768/6/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java File fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java: Line 31: public void VerifyNdvBasic(String expr, long expectedNdv) > Just curious: do you mean change it to verifyNdvBasic()? Our test validatio yeah, i think i messed that up. -- To view, visit http://gerrit.cloudera.org:8080/5768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer .. Patch Set 12: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5611 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs
Alex Behm has posted comments on this change. Change subject: IMPALA-4792: Fix number of distinct values for a CASE with constant outputs .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/5768/6/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java File fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java: Line 31: public void VerifyNdvBasic(String expr, long expectedNdv) > follow java naming convention Just curious: do you mean change it to verifyNdvBasic()? Our test validation functions usually start uppercase like AnalyzesOk(), ParsesOk(), RewritesOk() etc. -- To view, visit http://gerrit.cloudera.org:8080/5768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1427: Improvements to "Unknown disk-ID" warning
Alex Behm has posted comments on this change. Change subject: IMPALA-1427: Improvements to "Unknown disk-ID" warning .. Patch Set 3: (3 comments) Thanks for doing this, Bharath! Much better. Just a few more requests as final polish. http://gerrit.cloudera.org:8080/#/c/5828/3/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: Line 62 I think it's safer to leave this, but mention in the description that this option is deprecated. Also add a TODO to get rid of this. http://gerrit.cloudera.org:8080/#/c/5828/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: Line 343: if (block.getDiskId(i) == -1) ++numScanRangesNoDiskIds_; scan ranges is a pretty low granularity, can you also roll this up to the file and partition level? I imagine the explain output looking like this: missing disk ids: x/y scan ranges, x/y files, x/y partitions also seems useful to LOG.trace() the file paths (not just the file name) that are missing disk ids, so we can get them if needed. Looking forward to your logging change also :) Line 589: if (numScanRangesNoDiskIds_ > 0) { let's only add this for >= EXTENDED it will show up in query profiles, but not in regular explain plans I think we should add a warning at the top of the explain plan listing the tables that have missing disk ids (like we do for missing stats) -- To view, visit http://gerrit.cloudera.org:8080/5828 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1427: Improvements to "Unknown disk-ID" warning
Bharath Vissapragada has uploaded a new patch set (#3). Change subject: IMPALA-1427: Improvements to "Unknown disk-ID" warning .. IMPALA-1427: Improvements to "Unknown disk-ID" warning This commit, - Removes the runtime unknown disk ID reporting and instead moves it to the explain plan as a counter that prints the number of scan ranges missing disk IDs in the corresponding HDFS scan nodes. - Removes reference to enabling dfs block metadata configuration, since it doesn't apply anymore. - Removes VolumeId terminology from the runtime profile. Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2 --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java 4 files changed, 84 insertions(+), 95 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/5828/3 -- To view, visit http://gerrit.cloudera.org:8080/5828 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada
[Impala-ASF-CR] IMPALA-1427: Improvements to "Unknown disk-ID" warning
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-1427: Improvements to "Unknown disk-ID" warning .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/5828/2/common/thrift/generate_error_codes.py File common/thrift/generate_error_codes.py: Line 97: ("HDFS_SCAN_NODE_UNKNOWN_DISK", 26, "Encountered unknown disk id for table: '$0'." Forgot to git-add changes for this file. Uploading a new PS. -- To view, visit http://gerrit.cloudera.org:8080/5828 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1430: enable codegen for native UDAs
Tim Armstrong has posted comments on this change. Change subject: IMPALA-1430: enable codegen for native UDAs .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/5161/13/be/src/exprs/scalar-fn-call.h File be/src/exprs/scalar-fn-call.h: Line 53: /// Loads a native or IR UDF or UDA function 'fn' with symbol 'symbol' from the > the term 'scalar function' means 'not an aggregate function'. i don't under I'm fine with moving it to a different class, I mainly wanted to leave as-is initially so that reviewer could review the diff of the function implementation. -- To view, visit http://gerrit.cloudera.org:8080/5161 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id1708eaa96eb76fb9bec5eeabf209f81c88eec2f Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3406: [DOCS] Remove stale and Cloudera-specific URLs
John Russell has uploaded a new change for review. http://gerrit.cloudera.org:8080/5863 Change subject: IMPALA-3406: [DOCS] Remove stale and Cloudera-specific URLs .. IMPALA-3406: [DOCS] Remove stale and Cloudera-specific URLs Remove some stale links, including old URLs and mentions of CDH 4-era material. Modify a couple of links to upstream Apache docs to point to the real Apache HDFS site, not the internal Cloudera archived copy of the doc pages. Change-Id: I26a03cc7912dcb398b1ea246e938dc5eaf6df764 --- M docs/impala_keydefs.ditamap M docs/topics/impala_faq.xml 2 files changed, 7 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/5863/1 -- To view, visit http://gerrit.cloudera.org:8080/5863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I26a03cc7912dcb398b1ea246e938dc5eaf6df764 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell
[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4792: Fix number of distinct values for a CASE with constant outputs .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/5768/6/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java File fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java: Line 31: public void VerifyNdvBasic(String expr, long expectedNdv) follow java naming convention -- To view, visit http://gerrit.cloudera.org:8080/5768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1427: Improvements to "Unknown disk-ID" warning
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-1427: Improvements to "Unknown disk-ID" warning .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/5828/1//COMMIT_MSG Commit Message: Line 7: IMPALA-1427: Improvements to "Unknown disk-ID" warning > IMPALA-1427? Done Line 9: This commit, > This patch seems fine. Great suggestion. That makes sense to me. Adding it. -- To view, visit http://gerrit.cloudera.org:8080/5828 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1427: Improvements to "Unknown disk-ID" warning
Bharath Vissapragada has uploaded a new patch set (#2). Change subject: IMPALA-1427: Improvements to "Unknown disk-ID" warning .. IMPALA-1427: Improvements to "Unknown disk-ID" warning This commit, - Removes the runtime unknown disk ID reporting and instead moves it to the explain plan as a counter that prints the number of scan ranges missing disk IDs in the corresponding HDFS scan nodes. - Removes reference to enabling dfs block metadata configuration, since it doesn't apply anymore. - Removes VolumeId terminology from the runtime profile. Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2 --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java 4 files changed, 11 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/5828/2 -- To view, visit http://gerrit.cloudera.org:8080/5828 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm
[Impala-ASF-CR] IMPALA-1430: enable codegen for native UDAs
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-1430: enable codegen for native UDAs .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/5161/13/be/src/exprs/scalar-fn-call.h File be/src/exprs/scalar-fn-call.h: Line 53: /// Loads a native or IR UDF or UDA function 'fn' with symbol 'symbol' from the the term 'scalar function' means 'not an aggregate function'. i don't understand how adding this functionality fits into the class abstractions. let's talk tomorrow. -- To view, visit http://gerrit.cloudera.org:8080/5161 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id1708eaa96eb76fb9bec5eeabf209f81c88eec2f Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs
Alex Behm has posted comments on this change. Change subject: IMPALA-4792: Fix number of distinct values for a CASE with constant outputs .. Patch Set 6: (5 comments) I'm pretty happy with this change minus the remaining nits. http://gerrit.cloudera.org:8080/#/c/5768/6/fe/src/main/java/org/apache/impala/analysis/CaseExpr.java File fe/src/main/java/org/apache/impala/analysis/CaseExpr.java: Line 385: // sum the number of constants with the maximum distinct value for the ... sum the number of distinct constants with the maximum NDV for the non-constants. http://gerrit.cloudera.org:8080/#/c/5768/6/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java File fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java: Line 27: * Tests computeNumDistinctValues estimates for Exprs computeNumDistinctValues() Line 44: public void VerifyNdv(String stmtStr, long expectedNdv) can we make this VerifyNdvStmt() or something so that we can use VerifyNdv() for the common case of writing tests? Line 93: // functional.tinytable (tiny) does not move this into the function comment of VerifyNdvTwoTable() Line 96: VerifyNdvTwoTable("case when a.id = 1 then 'yes' " + remove whitespace -- To view, visit http://gerrit.cloudera.org:8080/5768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3524: Don't process spilled partitions with 0 probe rows
Dan Hecht has posted comments on this change. Change subject: IMPALA-3524: Don't process spilled partitions with 0 probe rows .. Patch Set 9: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/5389/9/be/src/exec/partitioned-hash-join-node.h File be/src/exec/partitioned-hash-join-node.h: PS9, Line 482: OutputAllBuild OutputAllBuild() -- To view, visit http://gerrit.cloudera.org:8080/5389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I175b32dd9031e51218b38c37693ac3e31dfab47b Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer .. Patch Set 12: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/227/ -- To view, visit http://gerrit.cloudera.org:8080/5611 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3524: Don't process spilled partitions with 0 probe rows
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3524: Don't process spilled partitions with 0 probe rows .. Patch Set 9: I added logging just to double-check that the deep copy flag was present - it is. It's probably difficult to detect the bug, since it requires us to force the memory to be overwritten. I tried a couple of tricks, like putting the hash join on the right side of a nested loop join, but wasn't able to get it to produce incorrect results. -- To view, visit http://gerrit.cloudera.org:8080/5389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I175b32dd9031e51218b38c37693ac3e31dfab47b Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer
Lars Volker has posted comments on this change. Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer .. Patch Set 12: Code-Review+1 Rebased, replaced NULL with nullptr, removed "// clang-format" control statements and TODOs. -- To view, visit http://gerrit.cloudera.org:8080/5611 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer
Hello Marcel Kornacker, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5611 to look at the new patch set (#12). Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer .. IMPALA-3909: Populate min/max statistics in Parquet writer Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619 --- M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h A be/src/exec/parquet-column-stats.h M be/src/exec/parquet-common.h M be/src/runtime/string-value.h M be/src/util/dict-test.cc M tests/query_test/test_insert_parquet.py M tests/util/get_parquet_metadata.py 8 files changed, 792 insertions(+), 119 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/5611/12 -- To view, visit http://gerrit.cloudera.org:8080/5611 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Alex Behm has posted comments on this change. Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through .. Patch Set 5: (42 comments) http://gerrit.cloudera.org:8080/#/c/5816/5//COMMIT_MSG Commit Message: Line 14: Testing: Would be nice to get some idea of the perf improvement. The JIRA has an interesting query. A small microbenchmark would also be useful. http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/exec/union-node.cc File be/src/exec/union-node.cc: Line 102 There was a specific reason why Open() was called here. There is an expectation that GetNext() returns quickly after Open() is called. This expectation has to do with our client/server interaction and the query state transitions. The query goes into the FINISHED state once Open() succeeded on the coordinator fragment. Does test_rows_availability.py succeed? Line 62: pass_through_children_ = tnode.union_node.pass_through; move to initializer list in cosntructor Line 122: if (child_eos_ && child_idx_ > 0 && !IsInSubplan()) child(child_idx_ - 1)->Close(state); Needs comment Line 124: if (child_idx_ < children_.size() && isPassThrough(child_idx_)) { High-level comment what is happening here (passthrough). Line 140: if (child_eos_) { Add a comment why it's not ok to Close() the child in this passthrough mode even if the child is at eos. In the meterialization case below, we can and do close the child as soon as possible. Line 141: row_batch->MarkNeedsDeepCopy(); Needs comment Line 150: if (child_idx_ < children_.size() || Might as well reverse this check and move it up, set eos to true and return OK. Easier to see that we're just going to skip the remaining code. http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/exec/union-node.h File be/src/exec/union-node.h: Line 36: /// Node that merges the results of its children by materializing their update comment Line 79: std::vector pass_through_children_; is_child_passthrough_? The existing variable name makes it sound like it is a list of children that are pass through, but there is actually an entry for all children. Move this member out of the "Members that need to be reset()" section. Line 100: inline bool isPassThrough(int idx) { idx -> child_idx const method Line 101: DCHECK(idx < pass_through_children_.size()); DCHECK_LT http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: Line 132: if (this->type() != other_desc.type()) return false; can get rid of this-> http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: Line 554: /// Comparison is done by the contents of the tuple descriptors and not the ids. I'd prefer to preserve the meaning of these existing functions (IsPrefixOf() and Equals(). We have several interesting DCHECKs that require the ids (and not just the physical layout) to be identical. If you really need these functions we should give them new names kind of like we have in the FE for this check. http://gerrit.cloudera.org:8080/#/c/5816/5/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: Line 433: // List of booleans that indicates which children can be passed through Remove "List of booleans" since that's redundant Line 435: 4: required list pass_through is_child_passthrough (good to keep names consistent) http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java: Line 222: Would be good to keep the name of this function and the BE equivalent the same. Line 223: public boolean hasEqualPhysicalLayout(SlotDescriptor other) { needs brief comment Line 224: if (!this.getType().matchesType(other.getType())) return false; shouldn't the types be equal()? Line 226: if (this.getByteSize() != other.getByteSize()) return false; can remove 'this' http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/analysis/UnionStmt.java File fe/src/main/java/org/apache/impala/analysis/UnionStmt.java: Line 155: // List of output expressions of the Union node. This should be the same as result List of output expressions produced by the union without the ORDER BY portion (if any). Same as resultExprs_ if there is no ORDER BY. Line 156: // resultExprs_ if the UnionStmt does not have an Order By. Otherwise resultExprs_ Let's avoid referring to specific plan nodes at this stage and instead try to describe 'semantically' what these exprs contain. Line 158: private List unionNodeResultExprs_ = Lists.newArrayList(); unionResultExprs_ Line 191: for (Expr e: other.unionNodeResultExprs_) unionNodeResultExprs_.add(e.clone()); use Expr.cloneList() Line 501: for (UnionOperand op: operands_) { combine with the loop over operands_ in L526 http://ger
[Impala-ASF-CR] IMPALA-4359: qgen: add UPSERT support
David Knupp has posted comments on this change. Change subject: IMPALA-4359: qgen: add UPSERT support .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/5795 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6382f6ab22ba29c117e39a5d90592d3637df4b25 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs
Joe McDonnell has posted comments on this change. Change subject: IMPALA-4792: Fix number of distinct values for a CASE with constant outputs .. Patch Set 5: (6 comments) http://gerrit.cloudera.org:8080/#/c/5768/5/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java File fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java: Line 32:String stmtStr = "select " + expr + " from functional.alltypes"; > remove tabs and trailing spaces Done Line 42: public Expr CalcsNdvOk(String stmtStr, long expectedNdv) > odd/not immediately intuitive name Changed to VerifyNdv Line 48: // Go into the select list and get the expression > superfluous comment (it simply states what's apparent from the next line), Done Line 51: > use blank lines judiciously, to separate logically separate pieces of code Done Line 54: return analyzedExpr; > why return anything? No reason. I have changed this to void. Line 71: CalcsNdvOkBasic("case when id = 1 then 'yes' when id = 2 then 'maybe' else 'no' end", 3); > long line (and elsewhere) Done -- To view, visit http://gerrit.cloudera.org:8080/5768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs
Joe McDonnell has uploaded a new patch set (#6). Change subject: IMPALA-4792: Fix number of distinct values for a CASE with constant outputs .. IMPALA-4792: Fix number of distinct values for a CASE with constant outputs If all the return values of a Case expression have a known number of distinct values (i.e. they are constant or statistics exist), then the number of distinct values for the Case can be computed using this information. In order for the value from Case to be used at higher levels in the tree, the implementation of computeNumDistinctValues for Expr needed to change. Previously, Expr calculated the number of distinct values by finding any SlotRefs in its tree and taking the maximum of the distinct values from those SlotRefs. This would ignore the value from CaseExpr. To fix this, Expr now takes the maximum number of distinct values across all of its children. -- explaining this statement shows cardinality = 2 explain select distinct case when id = 1 then 'yes' else 'no' end from functional.alltypes; -- explaining this statement shows cardinality = 2 explain select distinct char_length(case when id = 1 then 'yes' else 'no' end) from functional.alltypes; -- explaining this statement shows cardinality = 7300 explain select distinct case when id = 1 then 0 else id end from functional.alltypes; -- explaining this statement shows cardinality = 737 (date_string_col has lower -- cardinality than id) explain select distinct case when id = 1 then 'yes' else date_string_col end from functional.alltypes; For cases when the number of distinct values is not known for all the outputs, this will return -1, indicating that the number of distinct values is not known. The inputs (whens) are not used for calculating the number of distinct values. Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2 --- M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java M fe/src/main/java/org/apache/impala/analysis/Expr.java A fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java 3 files changed, 184 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/5768/6 -- To view, visit http://gerrit.cloudera.org:8080/5768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-4359: qgen: add UPSERT support
Michael Brown has posted comments on this change. Change subject: IMPALA-4359: qgen: add UPSERT support .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5795/1/tests/comparison/statement_generator.py File tests/comparison/statement_generator.py: Line 70: if dml_table.primary_keys: > Add a comment that if the table has primary keys then it's a Kudu table and Done -- To view, visit http://gerrit.cloudera.org:8080/5795 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6382f6ab22ba29c117e39a5d90592d3637df4b25 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3524: Don't process spilled partitions with 0 probe rows
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3524: Don't process spilled partitions with 0 probe rows .. Patch Set 9: Code-Review+1 (1 comment) I'm happy that the issue I mentioned is fixed. I might play around a bit to see if I can repro it. While looking at the code, I noticed that the limit wasn't correctly applied on this code path. However, this is a pre-existing bug - IMPALA-4866 - that affects most of the code paths in partitioned-hash-join-node.cc so I'm happy to defer fixing that. http://gerrit.cloudera.org:8080/#/c/5389/6/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: Line 677: if (output_unmatched_batch_iter_->AtEnd()) { > Is there a dcheck we could add to detect the incorrect state? We could add a DCHECK(!output_unmatched_batch_->needs_deep_copy()) before the Reset() but that is trivially true given we just transferred the resources. -- To view, visit http://gerrit.cloudera.org:8080/5389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I175b32dd9031e51218b38c37693ac3e31dfab47b Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4359: qgen: add UPSERT support
Michael Brown has posted comments on this change. Change subject: IMPALA-4359: qgen: add UPSERT support .. Patch Set 2: (3 comments) Thanks for the review, Taras. Please see patch set 2. http://gerrit.cloudera.org:8080/#/c/5795/1/tests/comparison/model_translator.py File tests/comparison/model_translator.py: PS1, Line 534: be > Why 1 here? Are we expecting there to be several "INSERT" keywords in the q Done http://gerrit.cloudera.org:8080/#/c/5795/1/tests/comparison/query.py File tests/comparison/query.py: Line 705: # This enum represents possibilities for different types of INSERTs. A user of this > I see that you explained what these mean below, but it's not to perfectly c Done PS1, Line 727: so th > inserting into a Kudu table Done -- To view, visit http://gerrit.cloudera.org:8080/5795 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6382f6ab22ba29c117e39a5d90592d3637df4b25 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4359: qgen: add UPSERT support
Michael Brown has uploaded a new patch set (#2). Change subject: IMPALA-4359: qgen: add UPSERT support .. IMPALA-4359: qgen: add UPSERT support UPSERTs are very similar to INSERTs, so the UPSERT support is simply folded into that of INSERT. We do this by adding another "conflict action", CONFLICT_ACTION_UPDATE. The object responsible for holding the conflict_action attribute is now the InsertClause. This is needed here because the SqlWriter now needs to know the conflict_action both when writing the InsertClause (Impala) and at the tail end of the InsertStatement (PostgreSQL). We also add a few properties to the InsertStatement interface so that the PostgresqlSqlWriter can form the correct "DO UPDATE" conflic action, in which primary key columns and updatable columns must be known. More information on that here: https://www.postgresql.org/docs/9.5/static/sql-insert.html By default, we will tend to generate 3 UPSERTs for every 1 INSERT. In addition to adding unit tests to make sure UPSERTs are properly written, I used discrepancy_searcher.py --profile dmlonly, both with and without --explain-only, do run tests. I made sure we were generating syntactically valid UPSERT statements, and that the INSERT/UPSERT ratio was roughly 1/3 after 100 statements. Change-Id: I6382f6ab22ba29c117e39a5d90592d3637df4b25 --- M tests/comparison/discrepancy_searcher.py M tests/comparison/model_translator.py M tests/comparison/query.py M tests/comparison/query_profile.py M tests/comparison/statement_generator.py M tests/comparison/tests/query_object_testdata.py 6 files changed, 296 insertions(+), 70 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/5795/2 -- To view, visit http://gerrit.cloudera.org:8080/5795 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6382f6ab22ba29c117e39a5d90592d3637df4b25 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
Dan Hecht has posted comments on this change. Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool .. Patch Set 4: (32 comments) First set of comments focusing on the public interface. http://gerrit.cloudera.org:8080/#/c/5811/4/be/src/runtime/buffered-tuple-stream-v2.h File be/src/runtime/buffered-tuple-stream-v2.h: PS4, Line 41: or scratch disk storage what does this mean? isn't it always backed by BufferPool (which may spill to disk internally)? PS4, Line 45: the APIs are called from a single thread this is saying per BufferTupleStream object, right? maybe clarify that Line 46: /// it'd be nice to explain the point of this abstraction a bit more explicitly. like when we talk about BTS, we talk in terms of iterators and the iteration patterns, but this only alludes to that. and the random access pattern. etc. PS4, Line 49: pinned or unpinned I think these terms need to be defined relative to the stream as far as what it means to a client and why they would change between states. PS4, Line 59: page size wouldn't it also be a function of tuple size (i.e. tuples per page)? PS4, Line 77: pages pages' buffers. PS4, Line 80: location in memory or you could say "buffer". PS4, Line 110: in pages_ in the stream PS4, Line 111: in pages_ same PS4, Line 114: (if read_write is : /// true, then two pages need to be in memory). rather than saying this in parenthesis, i think we should pull it up to say that there can be both a read and write iterator simultaneously. PS4, Line 117: caller's reservation is insufficient does it first try to increase the reservation at this point? Line 124: /// returned so far from the stream may be freed on the next call to GetNext(). do we have a jira we can reference as a TODO here to simplify this? Line 126: /// Manual construction of rows with AllocateRow(): not for now, but is it possible to unify things so we only have one way to write into the stream? In the mean time, how does this mode relate to the "write" mode above? is it a sub-mode for pinned? Line 138: /// this array as new rows are inserted in the page. If we do so, then there will be is that still something we want to do? PS4, Line 146: id what is 'id'? is this index? PS4, Line 147: With 8MB pages that is 512GB per stream. where does this limit come into play? is this something that will cause trouble with going to 2MB pages? PS4, Line 186: page_len: the size of pages to use in the stream what does that mean for large rows? PS4, Line 198: of off PS4, Line 203: AddRow is it also used before AllocateRow()? Line 207: /// if an error status is returned. broken line break Line 212: /// sets 'status' to an error if an error occurred. hard to parse that sentence. but, can we simplify this dance now that we have real reservations? also, does this operation try to increase reservation before failing to append? is this guaranteed to succeed (other than runtime errors) for unpinned streams (maybe after you add support for large rows)? Line 223: /// tuples. same questions. let's think of this interface can be simplified now. Line 228: /// been allocated with the stream's row desc. is the row valid only as long as the stream remains pinned? PS4, Line 232: begin reading before the first GetNext()? Line 236: /// false if the page could not be pinned and no error was encountered. can we explain that in terms of reservations? and like questions above, does this got_buffer==false interface still make sense with reservations? PS4, Line 240: enough memory what does that mean in terms of reservations? and does it still make sense with reservations? Line 253: }; this seems like a wrinkle to the class comment. can it be succinctly described as part of the iterator discussion? PS4, Line 259: must be copied out by the caller this seems to contradict the class comment about "needs_deep_copy" PS4, Line 267: *got_rows is false if the stream could not be pinne does that still make sense with reservations? if so, it should be expressed in terms of reservation. PS4, Line 271: date data? PS4, Line 271: buffers containing page are there buffers that don't contain page data? PS4, Line 286: Returns the byte size of the stream that is currently pinned in memory. Is this trying to say: Returns the number of bytes currently pinned in memory by the stream? -- To view, visit http://gerrit.cloudera.org:8080/5811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4853: Skip test kudu dml reporting if Kudu is not supported.
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4853: Skip test_kudu_dml_reporting if Kudu is not supported. .. IMPALA-4853: Skip test_kudu_dml_reporting if Kudu is not supported. This test is failing on distros that don't support Kudu, but it shouldn't even be run. Tested by setting KUDU_IS_SUPPORTED to false, and then trying to run the test, confirming that it gets skipped. When the env var KUDU_IS_SUPPORTED is true, the test runs. Change-Id: Ia36319228d4e9cac9cb675f3207ef2ba39f24e7e Reviewed-on: http://gerrit.cloudera.org:8080/5854 Reviewed-by: Michael Brown Reviewed-by: Matthew Jacobs Tested-by: Impala Public Jenkins --- M tests/shell/test_shell_commandline.py 1 file changed, 2 insertions(+), 0 deletions(-) Approvals: Impala Public Jenkins: Verified Michael Brown: Looks good to me, but someone else must approve Matthew Jacobs: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5854 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ia36319228d4e9cac9cb675f3207ef2ba39f24e7e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-4853: Skip test kudu dml reporting if Kudu is not supported.
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4853: Skip test_kudu_dml_reporting if Kudu is not supported. .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5854 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia36319228d4e9cac9cb675f3207ef2ba39f24e7e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4822: Implement dynamic log level changes
Bharath Vissapragada has uploaded a new patch set (#5). Change subject: IMPALA-4822: Implement dynamic log level changes .. IMPALA-4822: Implement dynamic log level changes Very often we have to change the logging levels of Impalads and Catalog server for debugging purposes. Currently, there is no way to do that without a restart of the daemons, which is not a viable option in production deployments. This patch addresses this supportability gap by exposing the ability to set dynamic logging levels using a simple web endpoint on Impalad and Catalog web pages. This includes setting VLOG levels (equivalent to --v flag) as well as setting log4j levels on the Frontend and the Catalog JVMs. Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63 --- M be/src/catalog/catalog-server.cc M be/src/service/impala-http-handler.cc M be/src/util/backend-gflag-util.cc M be/src/util/backend-gflag-util.h M be/src/util/jni-util.cc M be/src/util/jni-util.h M be/src/util/logging-support.cc M be/src/util/logging-support.h M common/thrift/Logging.thrift M fe/src/main/java/org/apache/impala/util/GlogAppender.java A www/log_level.tmpl 11 files changed, 343 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/5792/5 -- To view, visit http://gerrit.cloudera.org:8080/5792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4792: Fix number of distinct values for a CASE with constant outputs .. Patch Set 5: (6 comments) http://gerrit.cloudera.org:8080/#/c/5768/5/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java File fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java: Line 32:String stmtStr = "select " + expr + " from functional.alltypes"; remove tabs and trailing spaces Line 42: public Expr CalcsNdvOk(String stmtStr, long expectedNdv) odd/not immediately intuitive name Line 48: // Go into the select list and get the expression superfluous comment (it simply states what's apparent from the next line), remove Line 51: use blank lines judiciously, to separate logically separate pieces of code Line 54: return analyzedExpr; why return anything? Line 71: CalcsNdvOkBasic("case when id = 1 then 'yes' when id = 2 then 'maybe' else 'no' end", 3); long line (and elsewhere) -- To view, visit http://gerrit.cloudera.org:8080/5768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4822: Implement dynamic log level changes
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-4822: Implement dynamic log level changes .. Patch Set 1: (2 comments) PS4 implements dynamic logging changes to the backend (equivalent to setting --v on the fly) and also the "reset" functionality for both log4j and the backend. That resets the logging state as per the user defined values at the startup. Also, I'm looking for some suggestions on cosmetic changes to the log_level web end point. As of now it works, but looks pretty basic (like it is designed by someone who just learnt HTML :D). So that needs to be improved as well. http://gerrit.cloudera.org:8080/#/c/5792/1/be/src/util/logging-support.cc File be/src/util/logging-support.cc: Line 26: #include "rpc/jni-thrift-util.h" > names.h basically has a bunch of common "using" directives, so is more appr Thanks for explaining. http://gerrit.cloudera.org:8080/#/c/5792/1/common/thrift/Logging.thrift File common/thrift/Logging.thrift: Line 35: struct TGetLogLevelParams { > Agree that there is additional bookeeping. I'm generally of the opinion tha I agree with your point that having a visual feedback is much more easy to the end user. For now I've implemented the "reset" functionality. Still need to look into log4j if there is a way to get all the overrides. -- To view, visit http://gerrit.cloudera.org:8080/5792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4822: Implement dynamic log level changes
Bharath Vissapragada has uploaded a new patch set (#4). Change subject: IMPALA-4822: Implement dynamic log level changes .. IMPALA-4822: Implement dynamic log level changes Very often we have to change the logging levels of Impalads and Catalog server for debugging purposes. Currently, there is no way to do that without a restart of the daemons, which is not a viable option in production deployments. This patch addresses this supportability gap by exposing the ability to set dynamic logging levels using a simple web endpoint on Impalad and Catalog web pages. This includes setting VLOG levels (equivalent to --v flag) as well as setting log4j levels on the Frontend and the Catalog JVMs. Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63 --- M be/src/catalog/catalog-server.cc M be/src/service/impala-http-handler.cc M be/src/util/backend-gflag-util.cc M be/src/util/backend-gflag-util.h M be/src/util/jni-util.cc M be/src/util/jni-util.h M be/src/util/logging-support.cc M be/src/util/logging-support.h M common/thrift/Logging.thrift M fe/src/main/java/org/apache/impala/util/GlogAppender.java A www/log_level.tmpl 11 files changed, 345 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/5792/4 -- To view, visit http://gerrit.cloudera.org:8080/5792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-4829: Change default Kudu read behavior for "RYW"
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4829: Change default Kudu read behavior for "RYW" .. IMPALA-4829: Change default Kudu read behavior for "RYW" Currently the default Kudu read mode is set to "READ_LATEST", which essentially provides no guarantees on reading except that any read issued will read the latest value that the target replica happens to have. This is not necessarily a time after a previous write operation in the same session. By changing the read mode to the misleadingly named "READ_AT_SNAPSHOT", we can ensure that Kudu reads will all be at times at least or greater than the latest "observed" time (which Impala already sets on the client). Note that this does not mean all reads are performed at the same timestamp (i.e. a snapshot read) because that requires setting a snapshot timestamp, but doing this will require more work in the future in both Impala and (mostly) Kudu. The Kudu team calls this "Read Your Writes". This means that, after this change, values written within a session will always be visible to subsequent reads. Before this change, this was usually the case but not guaranteed. Testing: Private test run, running an exhaustive job now. This is otherwise difficult to validate in new tests. This has plenty of time to bake for 2.9 in case we discover performance or functional issues. Change-Id: I4011f8277083982aee2c6c2bfca2f4ae2f8cb31e Reviewed-on: http://gerrit.cloudera.org:8080/5802 Reviewed-by: Thomas Tauber-Marshall Reviewed-by: Dan Hecht Tested-by: Impala Public Jenkins --- M be/src/exec/kudu-scanner.cc 1 file changed, 7 insertions(+), 7 deletions(-) Approvals: Impala Public Jenkins: Verified Thomas Tauber-Marshall: Looks good to me, but someone else must approve Dan Hecht: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5802 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I4011f8277083982aee2c6c2bfca2f4ae2f8cb31e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4829: Change default Kudu read behavior for "RYW"
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4829: Change default Kudu read behavior for "RYW" .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5802 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4011f8277083982aee2c6c2bfca2f4ae2f8cb31e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4643: [DOCS] Genericize HBase topic, esp links
John Russell has posted comments on this change. Change subject: IMPALA-4643: [DOCS] Genericize HBase topic, esp links .. Patch Set 1: Forgot to mention in the commit message: I also removed some Cloudera-specific wording and directory names that were shown in the banner messages of hive and impala-shell commands in examples. -- To view, visit http://gerrit.cloudera.org:8080/5858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic36ad02766a0f8945759c698fa1c5760dc12d0cb Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/5816/4/testdata/workloads/functional-query/queries/QueryTest/union.test File testdata/workloads/functional-query/queries/QueryTest/union.test: Line 1069: # IMPALA-3586: This query caused an issue because the tuple size of the children I think this may be something to do with count(*) being non-nullable. Maybe the expr in the union node has a nullable slot in that place? It would be good to understand the root cause so we're sure we fixed the underlying bug. -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4643: [DOCS] Genericize HBase topic, esp links
John Russell has uploaded a new change for review. http://gerrit.cloudera.org:8080/5858 Change subject: IMPALA-4643: [DOCS] Genericize HBase topic, esp links .. IMPALA-4643: [DOCS] Genericize HBase topic, esp links The Impala + HBase page included a little Cloudera-specific wording, i.e. "Cloudera recommends...". The main Cloudera-specific details were links to HBase documentation hosted on cloudera.com. I made those links use the keydef/keyref indirection mechanism and edited the URLs to point to the upstream Apache HBase docs. Centralizing these links also makes the Impala + HBase source cleaner, because the link text doesn't need to be inside an ... pair on the page itself. The link ends up being just "See ". After this change, there's no more 'Cloudera' visible in the text of the relevant HTML page, including the 'view source' view. (Previously, the cloudera.com links would have shown up but only findable via 'view source'. Change-Id: Ic36ad02766a0f8945759c698fa1c5760dc12d0cb --- M docs/impala_keydefs.ditamap M docs/topics/impala_hbase.xml 2 files changed, 15 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/5858/1 -- To view, visit http://gerrit.cloudera.org:8080/5858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ic36ad02766a0f8945759c698fa1c5760dc12d0cb Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through .. Patch Set 4: (15 comments) I think this is looking pretty good. It turns out to hit a lot of interesting corner cases in the backend but I think we just need to make sure we've got them covered and tests for them all. http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/exec/union-node.cc File be/src/exec/union-node.cc: Line 122: if (child_eos_ && child_idx_ > 0 && !IsInSubplan()) child(child_idx_ - 1)->Close(state); Can you add a comment that this only applies to passthrough? E.g. "The previous child may have been left open if passthrough was enabled for it". Otherwise it's hard to figure out how it fits in with the rest of it. Line 133: row_batch->set_num_rows(limit_ - num_rows_returned_); There's a corner case that breaks this calculation. The problem is that 'row_batch' may be non-empty when GetNext() is called. E.g. the subplan node does this. I think we just need to save the value of num_rows() before calling GetNext() and adjust the calculation accordingly. I think we're missing a test case where we have a union node with a limit under a subplan. Line 150: if (child_idx_ < children_.size() || Is this just an optimisation? Might be best to remove it and keep the code simpler unless we have data showing it's a bottleneck. If I understand it correctly it only helps on the last GetNext() call. If we keep it we should initialise tuple_buf to NULL so if we mess up it's more debuggable. http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/exec/union-node.h File be/src/exec/union-node.h: Line 99: /// Returns true if the child can be passed through. Nit: "if the child at 'idx'" http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: Line 541: /// Return true if the tuple ids of this descriptor match tuple ids of other desc. This comment needs updating to reflect the new behaviour. http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java: Line 228: if (this.getNullIndicatorByte() != other.getNullIndicatorByte()) return false; Also need to compare the NullIndicatorBit(). There are some subtle differences with how we compute the mem layout for Kudu tables can have non-nullable slots, so I think there's an interesting test where we union the output of an aggregate function like count() (where slots are non-nullable and don't get a null bit) and a Kudu table with a non-nullable Kudu column, which gets a null bit regardless. http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: Line 1536: if (ctx_.hasSubplan()) unionNode.disablePassthrough(); Add a TODO to remove this as part of IMPALA-4179. Otherwise I might forget. http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/planner/UnionNode.java File fe/src/main/java/org/apache/impala/planner/UnionNode.java: Line 68: // If false, no child nodes would be passed through in the backend. nit: "batches from child nodes" to be a bit clearer Line 71: // Indicates which child nodes can be passed through in the backend. nit: "batches from child nodes" to be a bit clearer PS4, Line 176: /* /** Line 177:* Compute the children for which rows can be forwarded by the Union node without being It's a little unclear what the input is. Maybe "Compute whether we can pass through rows from a child whether 'childExprList' is evaluated over a row with 'childTupleIds'" Line 183: // Pass through is only done for the simple case where the row has a single tuple. What's the motivation for this? Is it because the union output is always a row with a single tuple? Line 266: analyzer, children_.get(i).getTupleIds(), resultExprLists_.get(i))); I *think* in principle we may need to also check the nullable tuple IDs, since the output tuple should be non-nullable but the input could be nullable in theory. In practice I don't think it's possible for a row with 1 tuple but it would be better to be conservative. Alex probably has more insight. Line 299: if (!passThrough_.isEmpty()) { We probably don't want to print this at explain_level MINIMAL. http://gerrit.cloudera.org:8080/#/c/5816/4/testdata/workloads/functional-planner/queries/PlannerTest/union.test File testdata/workloads/functional-planner/queries/PlannerTest/union.test: Line 3103: Can you add a couple of tests where there is a table scan on one branch of the union and a hash join on the other? I didn't see much test coverage of joins in unions. The table scan should be passed through and the hash join shouldn't. The idea is
[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs
Joe McDonnell has uploaded a new patch set (#5). Change subject: IMPALA-4792: Fix number of distinct values for a CASE with constant outputs .. IMPALA-4792: Fix number of distinct values for a CASE with constant outputs If all the return values of a Case expression have a known number of distinct values (i.e. they are constant or statistics exist), then the number of distinct values for the Case can be computed using this information. In order for the value from Case to be used at higher levels in the tree, the implementation of computeNumDistinctValues for Expr needed to change. Previously, Expr calculated the number of distinct values by finding any SlotRefs in its tree and taking the maximum of the distinct values from those SlotRefs. This would ignore the value from CaseExpr. To fix this, Expr now takes the maximum number of distinct values across all of its children. -- explaining this statement shows cardinality = 2 explain select distinct case when id = 1 then 'yes' else 'no' end from functional.alltypes; -- explaining this statement shows cardinality = 2 explain select distinct char_length(case when id = 1 then 'yes' else 'no' end) from functional.alltypes; -- explaining this statement shows cardinality = 7300 explain select distinct case when id = 1 then 0 else id end from functional.alltypes; -- explaining this statement shows cardinality = 737 (date_string_col has lower -- cardinality than id) explain select distinct case when id = 1 then 'yes' else date_string_col end from functional.alltypes; For cases when the number of distinct values is not known for all the outputs, this will return -1, indicating that the number of distinct values is not known. The inputs (whens) are not used for calculating the number of distinct values. Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2 --- M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java M fe/src/main/java/org/apache/impala/analysis/Expr.java A fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java 3 files changed, 183 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/5768/5 -- To view, visit http://gerrit.cloudera.org:8080/5768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] Add nested testdata flattener
Michael Brown has posted comments on this change. Change subject: Add nested testdata flattener .. Patch Set 1: (14 comments) http://gerrit.cloudera.org:8080/#/c/5787/1//COMMIT_MSG Commit Message: Line 22: How was this tested? http://gerrit.cloudera.org:8080/#/c/5787/1/testdata/TableFlattener/README File testdata/TableFlattener/README: Line 3: It would be useful to explain also how nested data types (array, struct, map) get flattened. Line 11: Mention that testdata/bin/generate-load-nested.sh uses this. PS1, Line 12: There are various options to specify the type of input file but the output is always : parquet/snappy. You could mention that the tool tries to use file extension to infer the input format. If you don't want to mention the other options here, you could mention how to get help: mvn exec:java -Dexec.mainClass=org.apache.impala.infra.tableflattener.Main -Dexec.arguments="--help" http://gerrit.cloudera.org:8080/#/c/5787/1/testdata/TableFlattener/pom.xml File testdata/TableFlattener/pom.xml: Line 1: Would Maven permit this whole tree be moved to testdata/src/main/java/org/apache/impala/TableFlattener ? As it is, there seem to be competing Java source trees about curating test data. http://gerrit.cloudera.org:8080/#/c/5787/1/testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/FlattenedSchema.java File testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/FlattenedSchema.java: PS1, Line 93: "value" Why not item for array? PS1, Line 96: "idx" Why not pos? PS1, Line 103: "_values"; This will lead to "foo__values" (with 2 underscores). Is that intended? PS1, Line 120: "_" Shouldn't this use getNameSeparator() ? http://gerrit.cloudera.org:8080/#/c/5787/1/testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/SchemaFlattener.java File testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/SchemaFlattener.java: PS1, Line 51: dataset It would read a bit clearer to call this dstDataSet. Line 60: Preconditions.checkState(srcSchema.getType() == Type.RECORD); It would be nice to have a comment on this method describing what it does and the parameters. PS1, Line 65: field.schema() Why not fieldSchema? Line 85: // Ensure that the parent schema has an id field so the child can reference the Same with this method: add brief description of method and parameter meanings. PS1, Line 93: LinkedList fields = new LinkedList<>(); Is it correct to be instantiating fields this way, or should you be using the method written above: LinkedList fields = Lists.newLinkedList(); -- To view, visit http://gerrit.cloudera.org:8080/5787 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7e7a8e53ada9274759a3e2128b97bec292c129c6 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Taras Bobrovytsky has uploaded a new patch set (#5). Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through .. IMPALA-3586 (Part 1): Implement Union Pass Through The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. Testing: Verified that existing tests cover the case where no/some/all union children of the union node can be passed through. Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 --- M be/src/exec/union-node.cc M be/src/exec/union-node.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/empty.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/order.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/topn.test M testdata/workloads/functional-planner/queries/PlannerTest/union.test M testdata/workloads/functional-query/queries/QueryTest/union.test 22 files changed, 467 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/5 -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Taras Bobrovytsky has uploaded a new patch set (#5). Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through .. IMPALA-3586 (Part 1): Implement Union Pass Through The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. Testing: Verified that existing tests cover the case where no/some/all union children of the union node can be passed through. Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 --- M be/src/exec/union-node.cc M be/src/exec/union-node.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/empty.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/order.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/topn.test M testdata/workloads/functional-planner/queries/PlannerTest/union.test M testdata/workloads/functional-query/queries/QueryTest/union.test 22 files changed, 467 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/5 -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3748: Part 1: Clean up resource estimation in planner
Tim Armstrong has uploaded a new patch set (#3). Change subject: IMPALA-3748: Part 1: Clean up resource estimation in planner .. IMPALA-3748: Part 1: Clean up resource estimation in planner This is in preparation to use this code to compute the minimum reservation. The cleanup restructures the code slightly so that it's clearer whether resource estimates are valid or invalid. It also removes the unused VCore estimates. Fixes a couple of bugs and other issues: * computeCosts() was not called for unpartitioned fragments, so the per-operator memory estimate was not visible. * Nested loop join was not treated as a blocking join. * The TODO comment about union was misleading I left one bug unfixed because it is subtle and will be easier to review in isolation: IMPALA-4862. There is some remaining questionable behaviour I left untouched: * It's unclear why unpartitioned fragments are always excluded from total memory consumption. * Many operators do not have estimates or have questionable heuristics. Testing: Re-enabled the explain_level tests, which appear to be the only coverage for many of these estimates. Removed the complex and brittle test cases and replaced with a couple of much simpler end-to-end tests and a number of planner test cases for memory estimates. Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37 --- M be/src/service/query-exec-state.cc M common/thrift/Frontend.thrift M docs/shared/impala_common.xml M docs/topics/impala_explain.xml M docs/topics/impala_explain_level.xml M docs/topics/impala_explain_plan.xml M docs/topics/impala_hbase.xml M docs/topics/impala_known_issues.xml M docs/topics/impala_optimize_partition_key_scans.xml M docs/topics/impala_perf_joins.xml M docs/topics/impala_perf_stats.xml M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java M fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test A testdata/workloads/functional-planner/queries/PlannerTest/resource-estimates.test M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test M tests/metadata/test_explain.py 29 files changed, 810 insertions(+), 1,231 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/5847/3 -- To view, visit http://gerrit.cloudera.org:8080/5847 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-4822: Dynamic log level changes to Catalog and Frontend JVMs
Alex Behm has posted comments on this change. Change subject: IMPALA-4822: Dynamic log level changes to Catalog and Frontend JVMs .. Patch Set 1: (2 comments) Responding to comments. Will wait for next PS to continue the review. http://gerrit.cloudera.org:8080/#/c/5792/1/be/src/util/logging-support.cc File be/src/util/logging-support.cc: Line 26: #include "rpc/jni-thrift-util.h" > Moved. For my understanding, whats special about names.h include? names.h basically has a bunch of common "using" directives, so is more appropriate to put into the "using" section and not the "include" section if that makes sense http://gerrit.cloudera.org:8080/#/c/5792/1/common/thrift/Logging.thrift File common/thrift/Logging.thrift: Line 35: struct TGetLogLevelParams { > Per my understanding log4j doesn't provide any easy way to get that list. S Agree that there is additional bookeeping. I'm generally of the opinion that if the user has a way to change a configuration setting, then there should also be a way to inspect the current state of the configuration. This serves two purposes: 1. When changing a setting, the user gets immediate visual feedback that the setting is indeed in effect. 2. The user can easily see if he forgot to undo some changes he made in the past. It can avoid making redundant changes (forgot which classes he already increased the log level for). For now, I'm ok with implementing a button that reset the configuration back to the original state. Happy to discuss further. -- To view, visit http://gerrit.cloudera.org:8080/5792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I588418e9bcb0b66d33138baf96207a5a35bfbd63 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] Release note updates for Impala 2.8
Alex Behm has posted comments on this change. Change subject: Release note updates for Impala 2.8 .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/5668/7/docs/topics/impala_new_features.xml File docs/topics/impala_new_features.xml: Line 65: We should definitely also highlight IMPALA-1430 which fixes the slow COMPUTE STATS issue with TIMESTAMP. COMPUTE STATS is now codegen'd even if there are TIMESTAMP columns. The JIRA is not yet marked as fixed because there is some follow-on work, but the COMPUTE STATS issue specifically has been fixed. Line 130: format TIMESTAMP values, such as the result A naked now() function is a particularly bad example, because now() is already constant within a query, so does not benefit from this improvement. A better example is something like: WHERE date_col = to_date(now() - interval 1 day) the right-hand size expression is somewhat complicated and expensive to evaluate for every row. Constant folding makes this significantly faster. -- To view, visit http://gerrit.cloudera.org:8080/5668 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Silvius Rus Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2518: DROP DATABASE CASCADE doesn't remove cache directives of tables
Alex Behm has posted comments on this change. Change subject: IMPALA-2518: DROP DATABASE CASCADE doesn't remove cache directives of tables .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/5815/1//COMMIT_MSG Commit Message: Line 7: IMPALA-2518: DROP DATABASE CASCADE doesn't remove cache directives of describe the fix not the bug: DROP DATABASE CASCADE removes caching directives http://gerrit.cloudera.org:8080/#/c/5815/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: Line 1279: uncacheTable(removedDb.getTable(tableName)); It's a little tricky to figure out, but I think we may have to lock these tables. For example, there might be a concurrent request for caching a table/partition and there could be a concurrent deepCopy() of the msTbl which may not be safe with removing entries from the msTbl properties. Might be worth looking into at least. http://gerrit.cloudera.org:8080/#/c/5815/1/fe/src/main/java/org/apache/impala/util/HdfsCachingUtil.java File fe/src/main/java/org/apache/impala/util/HdfsCachingUtil.java: Line 163: org.apache.hadoop.hive.metastore.api.Partition part) throws ImpalaException { tab http://gerrit.cloudera.org:8080/#/c/5815/1/tests/query_test/test_hdfs_caching.py File tests/query_test/test_hdfs_caching.py: Line 212: """The DROP DATABASE CASCADE should properly drop all impacted cache directives Remove "The" Line 213:IMPALA-2518""" use IMPALA-2518 as a prefix to comment (like we typically do) Line 215: # Creates `cachedb` database with some cached tables and partitions Populates the 'cachedb' database... -- To view, visit http://gerrit.cloudera.org:8080/5815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I83ef5a33e06728c2b3f833a0309d9da64dce7b88 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4853: Skip test kudu dml reporting if Kudu is not supported.
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4853: Skip test_kudu_dml_reporting if Kudu is not supported. .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/226/ -- To view, visit http://gerrit.cloudera.org:8080/5854 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia36319228d4e9cac9cb675f3207ef2ba39f24e7e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4738: STDDEV SAMP should return NULL for single record input
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4738: STDDEV_SAMP should return NULL for single record input .. Patch Set 2: Code-Review+2 (2 comments) please let john know we should doc this change http://gerrit.cloudera.org:8080/#/c/5800/2/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: PS2, Line 26: , please add a space PS2, Line 30: 0,0 so all of these match postgres now? -- To view, visit http://gerrit.cloudera.org:8080/5800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ide8af752cd8a2e554a2cd5a1ec948967a80de1fe Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4829: Change default Kudu read behavior for "RYW"
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4829: Change default Kudu read behavior for "RYW" .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/225/ -- To view, visit http://gerrit.cloudera.org:8080/5802 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4011f8277083982aee2c6c2bfca2f4ae2f8cb31e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4853: Skip test kudu dml reporting if Kudu is not supported.
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4853: Skip test_kudu_dml_reporting if Kudu is not supported. .. Patch Set 1: Code-Review+2 whoops, thanks David! -- To view, visit http://gerrit.cloudera.org:8080/5854 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia36319228d4e9cac9cb675f3207ef2ba39f24e7e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4792: Fix number of distinct values for a CASE with constant outputs .. Patch Set 4: looks good, but let's add tests -- To view, visit http://gerrit.cloudera.org:8080/5768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4735: Upgrade pytest in python env to version 2.9.2.
Jim Apple has posted comments on this change. Change subject: IMPALA-4735: Upgrade pytest in python env to version 2.9.2. .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5640 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40d129e0e63ca5bee126bac6ac923abb3c7e0a67 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs
Joe McDonnell has uploaded a new patch set (#4). Change subject: IMPALA-4792: Fix number of distinct values for a CASE with constant outputs .. IMPALA-4792: Fix number of distinct values for a CASE with constant outputs If all the return values of a Case expression have a known number of distinct values (i.e. they are constant or statistics exist), then the number of distinct values for the Case can be computed using this information. In order for the value from Case to be used at higher levels in the tree, the implementation of computeNumDistinctValues for Expr needed to change. Previously, Expr calculated the number of distinct values by finding any SlotRefs in its tree and taking the maximum of the distinct values from those SlotRefs. This would ignore the value from CaseExpr. To fix this, Expr now takes the maximum number of distinct values across all of its children. -- explaining this statement shows cardinality = 2 explain select distinct case when id = 1 then 'yes' else 'no' end from functional.alltypes; -- explaining this statement shows cardinality = 2 explain select distinct char_length(case when id = 1 then 'yes' else 'no' end) from functional.alltypes; -- explaining this statement shows cardinality = 7300 explain select distinct case when id = 1 then 0 else id end from functional.alltypes; -- explaining this statement shows cardinality = 737 (date_string_col has lower -- cardinality than id) explain select distinct case when id = 1 then 'yes' else date_string_col end from functional.alltypes; For cases when the number of distinct values is not known for all the outputs, this will return -1, indicating that the number of distinct values is not known. The inputs (whens) are not used for calculating the number of distinct values. Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2 --- M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java M fe/src/main/java/org/apache/impala/analysis/Expr.java 2 files changed, 73 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/5768/4 -- To view, visit http://gerrit.cloudera.org:8080/5768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-4792: Fix number of distinct values for a CASE with constant outputs
Joe McDonnell has posted comments on this change. Change subject: IMPALA-4792: Fix number of distinct values for a CASE with constant outputs .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/5768/3/fe/src/main/java/org/apache/impala/analysis/CaseExpr.java File fe/src/main/java/org/apache/impala/analysis/CaseExpr.java: Line 393: long numOutputConstants = 0; > int is fine here, if you overflow that we have other problems. Done Line 402: if ((i - loopStart) % 2 == 1 || (i == children_.size() - 1 && hasElseExpr_)) { > check the negation and continue (to save one indentation level). Done Line 409: if (constLiteralSet.add(outputLiteral)) ++numOutputConstants; > nice optimization. is there a test for that? I hand tested it. I was looking through our tests, and it would be easy for me to add a SQL to QueryTest/explain-level2.test to test this patch's behavior. -- To view, visit http://gerrit.cloudera.org:8080/5768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21dbdaad8452b7e58c477612b47847dccd9d98d2 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4735: Upgrade pytest in python env to version 2.9.2.
Michael Brown has posted comments on this change. Change subject: IMPALA-4735: Upgrade pytest in python env to version 2.9.2. .. Patch Set 8: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/5640 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40d129e0e63ca5bee126bac6ac923abb3c7e0a67 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4853: Skip test kudu dml reporting if Kudu is not supported.
Michael Brown has posted comments on this change. Change subject: IMPALA-4853: Skip test_kudu_dml_reporting if Kudu is not supported. .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/5854 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia36319228d4e9cac9cb675f3207ef2ba39f24e7e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4853: Skip test kudu dml reporting if Kudu is not supported.
David Knupp has uploaded a new change for review. http://gerrit.cloudera.org:8080/5854 Change subject: IMPALA-4853: Skip test_kudu_dml_reporting if Kudu is not supported. .. IMPALA-4853: Skip test_kudu_dml_reporting if Kudu is not supported. This test is failing on distros that don't support Kudu, but it shouldn't even be run. Tested by setting KUDU_IS_SUPPORTED to false, and then trying to run the test, confirming that it gets skipped. When the env var KUDU_IS_SUPPORTED is true, the test runs. Change-Id: Ia36319228d4e9cac9cb675f3207ef2ba39f24e7e --- M tests/shell/test_shell_commandline.py 1 file changed, 2 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/5854/1 -- To view, visit http://gerrit.cloudera.org:8080/5854 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ia36319228d4e9cac9cb675f3207ef2ba39f24e7e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp