[Impala-ASF-CR] [WIP] IMPALA-6932: Speed up scans for sequence datasets with many files
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/11517 ) Change subject: [WIP] IMPALA-6932: Speed up scans for sequence datasets with many files .. Patch Set 3: > > This can't be tested on hdfs since there are no "remote" blocks > in > > the minicluster. So all the scan ranges of a file are added to > the > > appropriate local disk queue once the header is processed. > > This came up in a conversation between me and Joe today as well. > Replication in HDFS is per file, so we should be able to "hdfs put" > with appropriate options to induce a remote block, even in the > minicluster. Unfortunately, it doesn't seem to work with the > following sequence: > > $ impala-shell.sh -q 'create table t (x string)' > $ yes | head > /tmp/f > $ hadoop fs -D dfs.replication=1 -put /tmp/f /test-warehouse/t > $ impala-shell.sh -i localhost:21002 -q 'set num_nodes=1; > invalidate metadata t; select * from t limit 2; profile' | grep -i > BytesReadShortCircuit > > Impala seems to be doing short-circuit-read on all the impalad's > (presumably because the datanode somewhat reasonably decides things > are indeed local). > > Anyway--this surprised me so I figured I'd mention it. The scheduler treats all backend *hosts* the same. In particular all reads on the minicluster will be assigned to "localhost". Then we have some special handling for backend hosts that have multiple impalads running: we assign scan ranges round-robin without considering the actual size of each scan range (scheduler.cc:890). This is only used during testing and not supported on production deployments so we don't try to be very sophisticated. On second thought I hoped we might be able to provoke a remote read if we add multiple files that only reside on the first data node and then scan all of them. I tried this and it didn't work. The HDFS file browser shows "localhost" as the location of each file, making me think that it does not make a distinction between each datanode and instead figures out how to perform a short circuit read directly. The scheduler itself makes the right assignments: I1029 21:19:09.685812 24983 scheduler.cc:995] ScanRangeAssignment: server=TNetworkAddress { 01: hostname (string) = "lv-desktop", 02: port (i32) = 22000, } I1029 21:19:09.685825 24983 scheduler.cc:1001] node_id=0 ranges=TScanRangeParams { 01: scan_range (struct) = TScanRange { 01: hdfs_file_split (struct) = THdfsFileSplit { 01: file_name (string) = "f3", 02: offset (i64) = 0, 03: length (i64) = 20, 04: partition_id (i64) = 0, 05: file_length (i64) = 20, 06: file_compression (i32) = 0, 07: mtime (i64) = 1540872321692, }, }, 02: volume_id (i32) = 2, 03: is_cached (bool) = false, 04: is_remote (bool) = false, } I1029 21:19:09.685854 24983 scheduler.cc:995] ScanRangeAssignment: server=TNetworkAddress { 01: hostname (string) = "lv-desktop", 02: port (i32) = 22002, } I1029 21:19:09.685863 24983 scheduler.cc:1001] node_id=0 ranges=TScanRangeParams { 01: scan_range (struct) = TScanRange { 01: hdfs_file_split (struct) = THdfsFileSplit { 01: file_name (string) = "f2", 02: offset (i64) = 0, 03: length (i64) = 20, 04: partition_id (i64) = 0, 05: file_length (i64) = 20, 06: file_compression (i32) = 0, 07: mtime (i64) = 1540872318716, }, }, 02: volume_id (i32) = 1, 03: is_cached (bool) = false, 04: is_remote (bool) = false, } I1029 21:19:09.685868 24983 scheduler.cc:995] ScanRangeAssignment: server=TNetworkAddress { 01: hostname (string) = "lv-desktop", 02: port (i32) = 22001, } I1029 21:19:09.685875 24983 scheduler.cc:1001] node_id=0 ranges=TScanRangeParams { 01: scan_range (struct) = TScanRange { 01: hdfs_file_split (struct) = THdfsFileSplit { 01: file_name (string) = "f", 02: offset (i64) = 0, 03: length (i64) = 20, 04: partition_id (i64) = 0, 05: file_length (i64) = 20, 06: file_compression (i32) = 0, 07: mtime (i64) = 1540872285223, }, }, 02: volume_id (i32) = 0, 03: is_cached (bool) = false, 04: is_remote (bool) = false, } -- To view, visit http://gerrit.cloudera.org:8080/11517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965 Gerrit-Change-Number: 11517 Gerrit-PatchSet: 3 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 30 Oct 2018 04:22:08 + Gerrit-HasComments: No
[Impala-ASF-CR] [WIP] IMPALA-6932: Speed up scans for sequence datasets with many files
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11517 ) Change subject: [WIP] IMPALA-6932: Speed up scans for sequence datasets with many files .. Patch Set 3: > This can't be tested on hdfs since there are no "remote" blocks in > the minicluster. So all the scan ranges of a file are added to the > appropriate local disk queue once the header is processed. This came up in a conversation between me and Joe today as well. Replication in HDFS is per file, so we should be able to "hdfs put" with appropriate options to induce a remote block, even in the minicluster. Unfortunately, it doesn't seem to work with the following sequence: $ impala-shell.sh -q 'create table t (x string)' $ yes | head > /tmp/f $ hadoop fs -D dfs.replication=1 -put /tmp/f /test-warehouse/t $ impala-shell.sh -i localhost:21002 -q 'set num_nodes=1; invalidate metadata t; select * from t limit 2; profile' | grep -i BytesReadShortCircuit Impala seems to be doing short-circuit-read on all the impalad's (presumably because the datanode somewhat reasonably decides things are indeed local). Anyway--this surprised me so I figured I'd mention it. -- To view, visit http://gerrit.cloudera.org:8080/11517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965 Gerrit-Change-Number: 11517 Gerrit-PatchSet: 3 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 30 Oct 2018 03:36:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7735: Expose queuing status in ExecSummary and impala-shell
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11816 ) Change subject: IMPALA-7735: Expose queuing status in ExecSummary and impala-shell .. IMPALA-7735: Expose queuing status in ExecSummary and impala-shell This patch adds the queuing status, that is, whether the query was queued and what was the latest queuing reason, to the ExecSummary. Also added changes to allow impala-shell to expose this status by pulling it out from the ExecSummary when either live_summary or live_progress is set to true. Testing: Added custom cluster tests. Change-Id: Ibde447b01559b9f0f3e970d4fa10f6ee4064bd49 Reviewed-on: http://gerrit.cloudera.org:8080/11816 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/service/impala-server.cc M common/thrift/ExecStats.thrift M shell/impala_shell.py M tests/custom_cluster/test_admission_controller.py A tests/custom_cluster/test_shell_interactive.py 5 files changed, 102 insertions(+), 24 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/11816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ibde447b01559b9f0f3e970d4fa10f6ee4064bd49 Gerrit-Change-Number: 11816 Gerrit-PatchSet: 4 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7735: Expose queuing status in ExecSummary and impala-shell
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11816 ) Change subject: IMPALA-7735: Expose queuing status in ExecSummary and impala-shell .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/11816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibde447b01559b9f0f3e970d4fa10f6ee4064bd49 Gerrit-Change-Number: 11816 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 30 Oct 2018 02:02:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7166: ExecSummary should be a first class object.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11555 ) Change subject: IMPALA-7166: ExecSummary should be a first class object. .. IMPALA-7166: ExecSummary should be a first class object. Impala RuntimeProfile currently contains "ExecSummary" as a string. We should make it a first class thrift object, so that tools can extract these fields (Est rows etc..). Testing: Modified unit test. Change-Id: I4791237a5579f16c9efda8e57876d48980739e13 Reviewed-on: http://gerrit.cloudera.org:8080/11555 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M be/src/service/impala-server.cc M be/src/util/runtime-profile-test.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M common/thrift/RuntimeProfile.thrift 5 files changed, 48 insertions(+), 3 deletions(-) Approvals: Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/11555 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I4791237a5579f16c9efda8e57876d48980739e13 Gerrit-Change-Number: 11555 Gerrit-PatchSet: 5 Gerrit-Owner: Yongjun Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Yongjun Zhang
[Impala-ASF-CR] IMPALA-7166: ExecSummary should be a first class object.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11555 ) Change subject: IMPALA-7166: ExecSummary should be a first class object. .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/11555 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4791237a5579f16c9efda8e57876d48980739e13 Gerrit-Change-Number: 11555 Gerrit-PatchSet: 4 Gerrit-Owner: Yongjun Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Yongjun Zhang Gerrit-Comment-Date: Tue, 30 Oct 2018 01:11:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11760 ) Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1209/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163 Gerrit-Change-Number: 11760 Gerrit-PatchSet: 8 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 30 Oct 2018 00:45:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7687: [DOCS] Support for multiple DISTINCT in a query
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11823 ) Change subject: IMPALA-7687: [DOCS] Support for multiple DISTINCT in a query .. Patch Set 1: Build Started https://jenkins.impala.io/job/gerrit-docs-auto-test/125/ Testing docs change - this change appears to modify docs/ and no code. This is experimental - please report any issues to tarmstr...@cloudera.com or on this JIRA: IMPALA-7317 -- To view, visit http://gerrit.cloudera.org:8080/11823 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3a6e664b016e9408a3ff809f1811253a91764481 Gerrit-Change-Number: 11823 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 30 Oct 2018 00:38:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7687: [DOCS] Support for multiple DISTINCT in a query
Alex Rodoni has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11823 Change subject: IMPALA-7687: [DOCS] Support for multiple DISTINCT in a query .. IMPALA-7687: [DOCS] Support for multiple DISTINCT in a query - Removed notes about the single DISTINCT restriction. - Rewrote the description for the APPX_COUNT_DISTINCT query option. Change-Id: I3a6e664b016e9408a3ff809f1811253a91764481 --- M docs/shared/impala_common.xml M docs/topics/impala_appx_count_distinct.xml M docs/topics/impala_count.xml M docs/topics/impala_distinct.xml M docs/topics/impala_langref_unsupported.xml M docs/topics/impala_select.xml 6 files changed, 33 insertions(+), 111 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/11823/1 -- To view, visit http://gerrit.cloudera.org:8080/11823 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I3a6e664b016e9408a3ff809f1811253a91764481 Gerrit-Change-Number: 11823 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Rodoni
[Impala-ASF-CR] IMPALA-7727: Fix TStatusCode to TErrorCode mapping
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11778 ) Change subject: IMPALA-7727: Fix TStatusCode to TErrorCode mapping .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1208/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11778 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie62527734aa73c1524c731773638590bdac9e789 Gerrit-Change-Number: 11778 Gerrit-PatchSet: 4 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 30 Oct 2018 00:24:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7727: Fix TStatusCode to TErrorCode mapping
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11778 ) Change subject: IMPALA-7727: Fix TStatusCode to TErrorCode mapping .. Patch Set 4: OK. Will do a pass tonight. -- To view, visit http://gerrit.cloudera.org:8080/11778 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie62527734aa73c1524c731773638590bdac9e789 Gerrit-Change-Number: 11778 Gerrit-PatchSet: 4 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 30 Oct 2018 00:23:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
Hello Bharath Vissapragada, Philip Zeyliger, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11760 to look at the new patch set (#8). Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE .. IMPALA-7655: Rewrite if, isnull, coalesce to use CASE See IMPALA-7655 for backgound. Tim found that the current interpreted forms of if, isnull and coalesce are slow compared to the code-generated CASE statement. This patch rewrites the above functions into the equivalent CASE structure. The rewrite engine has many bugs that are beyond the scope of this change to fix. This change codes around those bugs. The result is that the conditional rewrite happens some of the time, and sometimes produces less-than perfect optimizations. Conditionals in the top-level ORDER BY clause are not rewritten (IMPALA-7753), but those one or more levels down are. Some expressions involving NULL are not simplified (IMPALA-7769). Several hacks were used to work around the fact that the rewrite engine ignores unanalyzed expressions, yet the rewrite engine does not, in general, re-analyze the expressions it produces, causing simplifications to be skipped (IMPALA-7754). And so on. As a result, the BE retains the original interpreted forms that are still used in two cases: 1) top-level conditions in the ORDER BY clause, and 2) if the user disables rewrites. Further, code generation does not occur for CASE statements in the SELECT clause when it is in the root fragment (the most common case in simple tests.) This is another known bug (IMPALA-4356). One possible performance regression is that the new form of the code evaluates some expressions twice, where the original interpreted code evaluated the argument once. E.g. coalesce(id, 10) is rewritten to CASE WHEN id IS NULL THEN id ELSE 10 END. Here, id is evaluated twice. If the "id" were replaced by a complex sub-expression, the gain from compilation could be offset by doing work twice. (IMPALA-7737) Still, the fix provides most of what the JIRA ticket requested within the limitations of the existing code. Conditional function rewrites are moved into a new class, RewriteConditionalsRule in order to keep things simple. Most functions use the simplest possible rewrite, relying on the existing rewrite rules for further simplification. The one exception is coalesce(): the existing code relies on the semantics of the function and so was retained and slightly improved. The code was extended to produce a CASE statement directly, retaining existing simplifications. Tests for conditional functions were in one large function along with other rewrite tests. Moved them into a new file, then broke up the tests by function to allow much easier debugging of each function one-by-one. This required moving the common test mechanims into a new common base class. Existing tests focus on one or two rules at a time. The conditional function rewrite, however, relies on the entire set of rules being applied repeatedly. So, added a new FullRewriteTest case to verify this behavior. This class contains several commented-out tests that cannot pass due to existing rewrite bugs noted above. Changing the rewrite cause the PlannerTest to produce different plans than previously. Changed the expected results file to match the new rewrite rules. Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163 --- M be/src/exprs/conditional-functions.h M fe/src/main/java/org/apache/impala/analysis/Analyzer.java A fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java A fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java A fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java A fe/src/test/java/org/apache/impala/analysis/RewriteConditionalFnsRuleTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test 17 files changed, 870 insertions(+), 313 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/11760/8 -- To view, visit http://gerrit.cloudera.org:8080/11760
[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11822 ) Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1207/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idfe275032dc13aee64bfd60d448138b5b77ab96a Gerrit-Change-Number: 11822 Gerrit-PatchSet: 1 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 30 Oct 2018 00:19:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
Paul Rogers has abandoned this change. ( http://gerrit.cloudera.org:8080/11822 ) Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE .. Abandoned Duplicate -- To view, visit http://gerrit.cloudera.org:8080/11822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: Idfe275032dc13aee64bfd60d448138b5b77ab96a Gerrit-Change-Number: 11822 Gerrit-PatchSet: 1 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] [WIP] IMPALA-6932: Speed up scans for sequence datasets with many files
Pooja Nilangekar has posted comments on this change. ( http://gerrit.cloudera.org:8080/11517 ) Change subject: [WIP] IMPALA-6932: Speed up scans for sequence datasets with many files .. Patch Set 3: This can't be tested on hdfs since there are no "remote" blocks in the minicluster. So all the scan ranges of a file are added to the appropriate local disk queue once the header is processed. > Patch Set 3: -Code-Review > > Adding a targeted test that uses some profile counters seems like a great > idea. Maybe it's possible to do this already on HDFS with the right query and > options. E.g. run a query with limit=1 with num_nodes=1 and either > num_scanner_threads=1 or mt_dop=1 and confirm that only one file is opened > from the profile. -- To view, visit http://gerrit.cloudera.org:8080/11517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965 Gerrit-Change-Number: 11517 Gerrit-PatchSet: 3 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 30 Oct 2018 00:09:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6374: fix handling of commas in .test files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11800 ) Change subject: IMPALA-6374: fix handling of commas in .test files .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1206/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I18ddcb0440490ddf8184be66d3681038a1615dd9 Gerrit-Change-Number: 11800 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 30 Oct 2018 00:02:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7727: Fix TStatusCode to TErrorCode mapping
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11778 ) Change subject: IMPALA-7727: Fix TStatusCode to TErrorCode mapping .. Patch Set 4: Code-Review+1 (2 comments) Thanks, Tim for the quick reviews. Carrying +1. http://gerrit.cloudera.org:8080/#/c/11778/3/be/src/util/error-util.cc File be/src/util/error-util.cc: http://gerrit.cloudera.org:8080/#/c/11778/3/be/src/util/error-util.cc@231 PS3, Line 231: DCHECK(false) << "Unexpected hs2Code: " << hs2Code; > clang-tidy failed because this didn't return a value. Fixed it. Missed in the previous rev. http://gerrit.cloudera.org:8080/#/c/11778/2/testdata/workloads/functional-query/queries/QueryTest/compute-stats.test File testdata/workloads/functional-query/queries/QueryTest/compute-stats.test: http://gerrit.cloudera.org:8080/#/c/11778/2/testdata/workloads/functional-query/queries/QueryTest/compute-stats.test@2203 PS2, Line 2203: set mem_limit=1m; > Ah interesting. Maybe consider setting a lower mem_limit just to be sure th Done -- To view, visit http://gerrit.cloudera.org:8080/11778 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie62527734aa73c1524c731773638590bdac9e789 Gerrit-Change-Number: 11778 Gerrit-PatchSet: 4 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 23:48:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/11760 ) Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE .. Patch Set 7: (14 comments) Thanks, Phil, for the code review. Addressed all but one comment. http://gerrit.cloudera.org:8080/#/c/11760/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11760/7//COMMIT_MSG@29 PS7, Line 29: As a result, the BE retains the original interpreted forms that > Let's mention this in the backend code in comments, so that folks know it's Done http://gerrit.cloudera.org:8080/#/c/11760/7//COMMIT_MSG@30 PS7, Line 30: are still used in two cases: 1) top-leel conditions in the > nit: level Done http://gerrit.cloudera.org:8080/#/c/11760/7//COMMIT_MSG@45 PS7, Line 45: Still, the fix provides most of what the JIRA ticket requested > Does the specific "compute stats" query that's underlying the JIRA get fast Haven't gotten that far yet; still trying to make sure that things actually work. Will provide an update once I can do the required performance testing. http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java File fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java: http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@54 PS7, Line 54: > nit: it's weird that there are two spaces there. Done http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@66 PS7, Line 66: // then want to allow CASE to do any simplification (reverting to > nit: The parenthetical here isn't closed. Done http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@70 PS7, Line 70: Expr revised = rewriteConditionalFn((FunctionCallExpr) expr); > nit: should only have one space after = Done http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@72 PS7, Line 72: // Workaround for IMALA-TBD > Is TBD resolved at this point? Done http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@132 PS7, Line 132:* IFNULL(a, x) --> > If we're not using HTML, is the here noise? Done http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@179 PS7, Line 179: if (childExpr.isAggregate()) { sawAgg = true; } > The javadoc for "expr.contains()" suggests that hasAgg has already checked True. This check adjusts the non-null literal case. Ignoring aggregates, we can stop adding arguments (ignoring all remaining arguments) if we hit a non-null literal. (This is the ! hasAgg case.) Or, if we have aggregates, we can stop if we've already added at least one aggregate. Otherwise, we have to keep going to add at least one aggregate -- even though the aggregate will never actually be evaluated. (This is the sawAgg case.) But, that I had to explain this means that using dual flags is the wrong approach. Rewrote to use a state machine. Please let me know if that is clearer. http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java File fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java: http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java@50 PS7, Line 50: * We can't eliminate aggregates as this may change the meaning of the > This is the same as line 70-72. Is that intentional? Done http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java File fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java: http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java@42 PS7, Line 42: System.out.println(stmt); > remove Done http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java@97 PS7, Line 97: @Test : public void testSqlRecovery() { : ParseNode node = analyzeExpr("true and false"); : System.out.println(node.toSql()); : System.out.println(((SelectStmt) node).toSql(true)); : // SELECT TRUE AND FALSE FROM functional.alltypessmall : } > There's no assertion here... Ad-hoc test. Removed. http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java@117 PS7, Line 117: // TODO: IMPALA-7766 > nit: including the subject of the JIRA here may make it obvious why... Done
[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
Paul Rogers has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11822 Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE .. IMPALA-7655: Rewrite if, isnull, coalesce to use CASE See IMPALA-7655 for backgound. Tim found that the current interpreted forms of if, isnull and coalesce are slow compared to the code-generated CASE statement. This patch rewrites the above functions into the equivalent CASE structure. The rewrite engine has many bugs that are beyond the scope of this change to fix. This change codes around those bugs. The result is that the conditional rewrite happens some of the time, and sometimes produces less-than perfect optimizations. Conditionals in the top-level ORDER BY clause are not rewritten (IMPALA-7753), but those one or more levels down are. Some expressions involving NULL are not simplified (IMPALA-7769). Several hacks were used to work around the fact that the rewrite engine ignores unanalyzed expressions, yet the rewrite engine does not, in general, re-analyze the expressions it produces, causing simplifications to be skipped (IMPALA-7754). And so on. As a result, the BE retains the original interpreted forms that are still used in two cases: 1) top-level conditions in the ORDER BY clause, and 2) if the user disables rewrites. Further, code generation does not occur for CASE statements in the SELECT clause when it is in the root fragment (the most common case in simple tests.) This is another known bug (IMPALA-4356). One possible performance regression is that the new form of the code evaluates some expressions twice, where the original interpreted code evaluated the argument once. E.g. coalesce(id, 10) is rewritten to CASE WHEN id IS NULL THEN id ELSE 10 END. Here, id is evaluated twice. If the "id" were replaced by a complex sub-expression, the gain from compilation could be offset by doing work twice. (IMPALA-7737) Still, the fix provides most of what the JIRA ticket requested within the limitations of the existing code. Conditional function rewrites are moved into a new class, RewriteConditionalsRule in order to keep things simple. Most functions use the simplest possible rewrite, relying on the existing rewrite rules for further simplification. The one exception is coalesce(): the existing code relies on the semantics of the function and so was retained and slightly improved. The code was extended to produce a CASE statement directly, retaining existing simplifications. Tests for conditional functions were in one large function along with other rewrite tests. Moved them into a new file, then broke up the tests by function to allow much easier debugging of each function one-by-one. This required moving the common test mechanims into a new common base class. Existing tests focus on one or two rules at a time. The conditional function rewrite, however, relies on the entire set of rules being applied repeatedly. So, added a new FullRewriteTest case to verify this behavior. This class contains several commented-out tests that cannot pass due to existing rewrite bugs noted above. Changing the rewrite cause the PlannerTest to produce different plans than previously. Changed the expected results file to match the new rewrite rules. Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163 Fixes Change-Id: Idfe275032dc13aee64bfd60d448138b5b77ab96a --- M be/src/exprs/conditional-functions.h M fe/src/main/java/org/apache/impala/analysis/Analyzer.java A fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java A fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java A fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java A fe/src/test/java/org/apache/impala/analysis/RewriteConditionalFnsRuleTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test 17 files changed, 870 insertions(+), 313 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/11822/1 -- To view, visit http://gerrit.cloudera.org:8080/11822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF
[Impala-ASF-CR] IMPALA-6374: fix handling of commas in .test files
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11800 ) Change subject: IMPALA-6374: fix handling of commas in .test files .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/11800/3/tests/common/test_result_verifier.py File tests/common/test_result_verifier.py: http://gerrit.cloudera.org:8080/#/c/11800/3/tests/common/test_result_verifier.py@117 PS3, Line 117: i > You must mean "i + 1" here, yes? Oops. Fixed but I'll rerun tests in case more test files need to be updated as a result of this. -- To view, visit http://gerrit.cloudera.org:8080/11800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I18ddcb0440490ddf8184be66d3681038a1615dd9 Gerrit-Change-Number: 11800 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 23:27:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6374: fix handling of commas in .test files
Hello Michael Brown, David Knupp, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11800 to look at the new patch set (#4). Change subject: IMPALA-6374: fix handling of commas in .test files .. IMPALA-6374: fix handling of commas in .test files The .test file parser implemented an unconventional method for parsing single-quoted strings in comma-separated value format. This didn't handle trailing commas in the string correctly. This commit switches to using a conventional method for parsing comma-separated value format: * Commas enclosed by single quotes are not treated as field separators * Single quotes can be escaped within a string by doubling them. I looked into using Python's .csv module for this, but it wouldn't work without modifying the test file format more because it automatically discards the quotes during parsing, which are actually semantically important in .test files. E.g. without the quotes we can't distinguish between the literal string 'regex:...' and the regex regex: Testing: Ran exhaustive tests and fixed .test files that required modifications. Will rerun before merging. Added a couple of tests to exercise edge cases in the test file parser. Change-Id: I18ddcb0440490ddf8184be66d3681038a1615dd9 --- M testdata/workloads/functional-query/queries/QueryTest/functions-ddl.test M testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test M testdata/workloads/functional-query/queries/QueryTest/hbase-inserts.test M testdata/workloads/functional-query/queries/QueryTest/misc.test A testdata/workloads/functional-query/queries/QueryTest/special-strings.test M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test M testdata/workloads/tpcds/queries/tpcds-decimal_v2-q98.test M testdata/workloads/tpcds/queries/tpcds-q98.test M tests/common/test_result_verifier.py M tests/query_test/test_exprs.py 10 files changed, 157 insertions(+), 112 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/11800/4 -- To view, visit http://gerrit.cloudera.org:8080/11800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I18ddcb0440490ddf8184be66d3681038a1615dd9 Gerrit-Change-Number: 11800 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-6374: fix handling of commas in .test files
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/11800 ) Change subject: IMPALA-6374: fix handling of commas in .test files .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/11800/3/tests/common/test_result_verifier.py File tests/common/test_result_verifier.py: http://gerrit.cloudera.org:8080/#/c/11800/3/tests/common/test_result_verifier.py@117 PS3, Line 117: i You must mean "i + 1" here, yes? -- To view, visit http://gerrit.cloudera.org:8080/11800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I18ddcb0440490ddf8184be66d3681038a1615dd9 Gerrit-Change-Number: 11800 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Comment-Date: Mon, 29 Oct 2018 22:54:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] CDH-73537: Skip test default timezone when testing a real cluster.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11820 ) Change subject: CDH-73537: Skip test_default_timezone when testing a real cluster. .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1205/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia4d4c503d2c77136cedd8f3fd830b6ce70d4457f Gerrit-Change-Number: 11820 Gerrit-PatchSet: 2 Gerrit-Owner: David Knupp Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Comment-Date: Mon, 29 Oct 2018 22:53:21 + Gerrit-HasComments: No
[Impala-ASF-CR] CDH-73537: Skip test default timezone when testing a real cluster.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11820 ) Change subject: CDH-73537: Skip test_default_timezone when testing a real cluster. .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1204/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia4d4c503d2c77136cedd8f3fd830b6ce70d4457f Gerrit-Change-Number: 11820 Gerrit-PatchSet: 1 Gerrit-Owner: David Knupp Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Comment-Date: Mon, 29 Oct 2018 22:50:05 + Gerrit-HasComments: No
[Impala-ASF-CR] CDH-73537: Skip test default timezone when testing a real cluster.
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/11820 ) Change subject: CDH-73537: Skip test_default_timezone when testing a real cluster. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/11820/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11820/2//COMMIT_MSG@7 PS2, Line 7: CDH-73537 We need an upstream Jira here, not one for Cloudera. -- To view, visit http://gerrit.cloudera.org:8080/11820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia4d4c503d2c77136cedd8f3fd830b6ce70d4457f Gerrit-Change-Number: 11820 Gerrit-PatchSet: 2 Gerrit-Owner: David Knupp Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Comment-Date: Mon, 29 Oct 2018 22:30:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7761: Add multiple DISTINCT to targeted stress and perf
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/11805 ) Change subject: IMPALA-7761: Add multiple DISTINCT to targeted stress and perf .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/11805/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11805/1//COMMIT_MSG@16 PS1, Line 16: Testing: : - Ran the test file locally. The queries are fine, but what does it mean to run these locally? I'm not too familiar with these workloads (targeted-perf, targeted-stress). Do the queries get run in GVO? Is it expected downstream larger-scale testing will run these? The stress test (concurrent_select.py) won't find these queries, for example. -- To view, visit http://gerrit.cloudera.org:8080/11805 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9 Gerrit-Change-Number: 11805 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 22:29:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7735: Expose queuing status in ExecSummary and impala-shell
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11816 ) Change subject: IMPALA-7735: Expose queuing status in ExecSummary and impala-shell .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1203/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibde447b01559b9f0f3e970d4fa10f6ee4064bd49 Gerrit-Change-Number: 11816 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 22:29:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7614: [DOCS] Document the New Invalidate Options
Alex Rodoni has posted comments on this change. ( http://gerrit.cloudera.org:8080/11809 ) Change subject: IMPALA-7614: [DOCS] Document the New Invalidate Options .. Patch Set 1: Tiany and Adrian, Could you please review for the coming release? Let me know if I should add any other reviewers. Thanks! -- To view, visit http://gerrit.cloudera.org:8080/11809 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I40c552eeaee81ee6528d9f725bd416b51d8ab837 Gerrit-Change-Number: 11809 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Adrian Ng (389) Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Mon, 29 Oct 2018 22:12:09 + Gerrit-HasComments: No
[Impala-ASF-CR] CDH-73537: Skip test default timezone when testing a real cluster.
Hello Michael Brown, Attila Jeges, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11820 to look at the new patch set (#2). Change subject: CDH-73537: Skip test_default_timezone when testing a real cluster. .. CDH-73537: Skip test_default_timezone when testing a real cluster. test_shell_commandline.py::test_default_timezone assumes that the cluster is running on the same platform as the test process, but that's only guaranteed when the testing a local minicluster. When run against a real cluster, the test executor can be a completely different OS. Change-Id: Ia4d4c503d2c77136cedd8f3fd830b6ce70d4457f --- M tests/shell/test_shell_commandline.py 1 file changed, 6 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/11820/2 -- To view, visit http://gerrit.cloudera.org:8080/11820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia4d4c503d2c77136cedd8f3fd830b6ce70d4457f Gerrit-Change-Number: 11820 Gerrit-PatchSet: 2 Gerrit-Owner: David Knupp Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] [WIP] IMPALA-6932: Speed up scans for sequence datasets with many files
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11517 ) Change subject: [WIP] IMPALA-6932: Speed up scans for sequence datasets with many files .. Patch Set 3: -Code-Review Adding a targeted test that uses some profile counters seems like a great idea. Maybe it's possible to do this already on HDFS with the right query and options. E.g. run a query with limit=1 with num_nodes=1 and either num_scanner_threads=1 or mt_dop=1 and confirm that only one file is opened from the profile. -- To view, visit http://gerrit.cloudera.org:8080/11517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965 Gerrit-Change-Number: 11517 Gerrit-PatchSet: 3 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 22:08:10 + Gerrit-HasComments: No
[Impala-ASF-CR] CDH-73537: Skip test default timezone when testing a real cluster.
David Knupp has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11820 Change subject: CDH-73537: Skip test_default_timezone when testing a real cluster. .. CDH-73537: Skip test_default_timezone when testing a real cluster. test_shell_commandline.py::test_default_timezone assumes that the cluster is running on the same platform as the test process, but that's only guaranteed when the testing a local minicluster. When run against a real cluster, the test executor can be a completely different OS. Change-Id: Ia4d4c503d2c77136cedd8f3fd830b6ce70d4457f --- M tests/shell/test_shell_commandline.py 1 file changed, 6 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/11820/1 -- To view, visit http://gerrit.cloudera.org:8080/11820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ia4d4c503d2c77136cedd8f3fd830b6ce70d4457f Gerrit-Change-Number: 11820 Gerrit-PatchSet: 1 Gerrit-Owner: David Knupp
[Impala-ASF-CR] [WIP] IMPALA-6932: Speed up scans for sequence datasets with many files
Pooja Nilangekar has posted comments on this change. ( http://gerrit.cloudera.org:8080/11517 ) Change subject: [WIP] IMPALA-6932: Speed up scans for sequence datasets with many files .. Patch Set 3: I haven't yet tested this on a remote cluster. The plan is to test this by adding an s3 test which queries a table with multiple small files and to ensure that the number of files opened should not exceed 2. Do you think that may not be necessary? -- To view, visit http://gerrit.cloudera.org:8080/11517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965 Gerrit-Change-Number: 11517 Gerrit-PatchSet: 3 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 22:00:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7735: Expose queuing status in ExecSummary and impala-shell
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11816 ) Change subject: IMPALA-7735: Expose queuing status in ExecSummary and impala-shell .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibde447b01559b9f0f3e970d4fa10f6ee4064bd49 Gerrit-Change-Number: 11816 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 21:59:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7735: Expose queuing status in ExecSummary and impala-shell
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11816 ) Change subject: IMPALA-7735: Expose queuing status in ExecSummary and impala-shell .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3376/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/11816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibde447b01559b9f0f3e970d4fa10f6ee4064bd49 Gerrit-Change-Number: 11816 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 21:59:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7735: Expose queuing status in ExecSummary and impala-shell
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/11816 ) Change subject: IMPALA-7735: Expose queuing status in ExecSummary and impala-shell .. Patch Set 2: Code-Review+2 carrying over +2 -- To view, visit http://gerrit.cloudera.org:8080/11816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibde447b01559b9f0f3e970d4fa10f6ee4064bd49 Gerrit-Change-Number: 11816 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 21:58:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7735: Expose queuing status in ExecSummary and impala-shell
Hello Pooja Nilangekar, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11816 to look at the new patch set (#2). Change subject: IMPALA-7735: Expose queuing status in ExecSummary and impala-shell .. IMPALA-7735: Expose queuing status in ExecSummary and impala-shell This patch adds the queuing status, that is, whether the query was queued and what was the latest queuing reason, to the ExecSummary. Also added changes to allow impala-shell to expose this status by pulling it out from the ExecSummary when either live_summary or live_progress is set to true. Testing: Added custom cluster tests. Change-Id: Ibde447b01559b9f0f3e970d4fa10f6ee4064bd49 --- M be/src/service/impala-server.cc M common/thrift/ExecStats.thrift M shell/impala_shell.py M tests/custom_cluster/test_admission_controller.py A tests/custom_cluster/test_shell_interactive.py 5 files changed, 102 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/11816/2 -- To view, visit http://gerrit.cloudera.org:8080/11816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibde447b01559b9f0f3e970d4fa10f6ee4064bd49 Gerrit-Change-Number: 11816 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7735: Expose queuing status in ExecSummary and impala-shell
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/11816 ) Change subject: IMPALA-7735: Expose queuing status in ExecSummary and impala-shell .. Patch Set 1: Code-Review+2 (6 comments) http://gerrit.cloudera.org:8080/#/c/11816/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11816/1//COMMIT_MSG@9 PS1, Line 9: This patch adds the queuing status, that is, whether the query was > Can you mention that this is showed when either live_summary or live_progre yup, thats already mentioned at the end of this paragraph http://gerrit.cloudera.org:8080/#/c/11816/1/common/thrift/ExecStats.thrift File common/thrift/ExecStats.thrift: http://gerrit.cloudera.org:8080/#/c/11816/1/common/thrift/ExecStats.thrift@104 PS1, Line 104: queued > "is currently queued by admission control" to be totally clear Done http://gerrit.cloudera.org:8080/#/c/11816/1/common/thrift/ExecStats.thrift@107 PS1, Line 107: // Contains the latest queuing reason if the query is queued. > Same as above. Done http://gerrit.cloudera.org:8080/#/c/11816/1/tests/custom_cluster/test_shell_interactive.py File tests/custom_cluster/test_shell_interactive.py: http://gerrit.cloudera.org:8080/#/c/11816/1/tests/custom_cluster/test_shell_interactive.py@26 PS1, Line 26: class TestShellInteractive(CustomClusterTestSuite): > flake8: E302 expected 2 blank lines, found 1 Done http://gerrit.cloudera.org:8080/#/c/11816/1/tests/custom_cluster/test_shell_interactive.py@29 PS1, Line 29: > flake8: E251 unexpected spaces around keyword / parameter equals Done http://gerrit.cloudera.org:8080/#/c/11816/1/tests/custom_cluster/test_shell_interactive.py@38 PS1, Line 38: proc = pexpect.spawn(' '.join([SHELL_CMD, "-i localhost:21000"])) > I think it would help connecting to another port (i.e. another impalad) to Good point! but we already have tests that rely on pool stats being propagated across the cluster -- To view, visit http://gerrit.cloudera.org:8080/11816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibde447b01559b9f0f3e970d4fa10f6ee4064bd49 Gerrit-Change-Number: 11816 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 21:58:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11760 ) Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1201/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163 Gerrit-Change-Number: 11760 Gerrit-PatchSet: 7 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 29 Oct 2018 21:48:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6658: improve Parquet RLE for low bit widths
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11582 ) Change subject: IMPALA-6658: improve Parquet RLE for low bit widths .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1202/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11582 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I191a581d3f699b6669e48ac9dc39c76ed77c4a76 Gerrit-Change-Number: 11582 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Mon, 29 Oct 2018 21:48:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11760 ) Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE .. Patch Set 7: (14 comments) Thanks for the updates! http://gerrit.cloudera.org:8080/#/c/11760/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11760/7//COMMIT_MSG@29 PS7, Line 29: As a result, the BE retains the original interpreted forms that Let's mention this in the backend code in comments, so that folks know it's not far away from being dead code. http://gerrit.cloudera.org:8080/#/c/11760/7//COMMIT_MSG@30 PS7, Line 30: are still used in two cases: 1) top-leel conditions in the nit: level http://gerrit.cloudera.org:8080/#/c/11760/7//COMMIT_MSG@45 PS7, Line 45: Still, the fix provides most of what the JIRA ticket requested Does the specific "compute stats" query that's underlying the JIRA get faster in your testing? http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java File fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java: http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@54 PS7, Line 54: nit: it's weird that there are two spaces there. http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@66 PS7, Line 66: // then want to allow CASE to do any simplification (reverting to nit: The parenthetical here isn't closed. http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@70 PS7, Line 70: Expr revised = rewriteConditionalFn((FunctionCallExpr) expr); nit: should only have one space after = http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@72 PS7, Line 72: // Workaround for IMALA-TBD Is TBD resolved at this point? http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@132 PS7, Line 132:* IFNULL(a, x) --> If we're not using HTML, is the here noise? http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@179 PS7, Line 179: if (childExpr.isAggregate()) { sawAgg = true; } The javadoc for "expr.contains()" suggests that hasAgg has already checked all the children, so it shouldn't be possible that hasAgg = false but sawAgg = true. http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java File fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java: http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java@50 PS7, Line 50: * We can't eliminate aggregates as this may change the meaning of the This is the same as line 70-72. Is that intentional? http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java File fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java: http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java@42 PS7, Line 42: System.out.println(stmt); remove http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java@97 PS7, Line 97: @Test : public void testSqlRecovery() { : ParseNode node = analyzeExpr("true and false"); : System.out.println(node.toSql()); : System.out.println(((SelectStmt) node).toSql(true)); : // SELECT TRUE AND FALSE FROM functional.alltypessmall : } There's no assertion here... http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java@117 PS7, Line 117: // TODO: IMPALA-7766 nit: including the subject of the JIRA here may make it obvious why... http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java@128 PS7, Line 128: public void testIf() throws ImpalaException { Should we be testing the boring case: if(a, b, c) => case when a then b else c for columns a, b, c. (As opposed to aggregations or constants.) Update: I think this is covered in the next test file. -- To view, visit http://gerrit.cloudera.org:8080/11760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163 Gerrit-Change-Number: 11760 Gerrit-PatchSet: 7 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11760 ) Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1200/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163 Gerrit-Change-Number: 11760 Gerrit-PatchSet: 6 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 29 Oct 2018 21:36:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7765: [DOCS] Document IMPALA MAX MEM ESTIMATE FOR ADMISSION option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11804 ) Change subject: IMPALA-7765: [DOCS] Document IMPALA_MAX_MEM_ESTIMATE_FOR_ADMISSION option .. Patch Set 3: Verified+1 Build Successful https://jenkins.impala.io/job/gerrit-docs-auto-test/124/ : Doc tests passed. -- To view, visit http://gerrit.cloudera.org:8080/11804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibef89c98530c6974dc791666cc51c1ded52e7910 Gerrit-Change-Number: 11804 Gerrit-PatchSet: 3 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 29 Oct 2018 21:31:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7765: [DOCS] Document IMPALA MAX MEM ESTIMATE FOR ADMISSION option
Alex Rodoni has posted comments on this change. ( http://gerrit.cloudera.org:8080/11804 ) Change subject: IMPALA-7765: [DOCS] Document IMPALA_MAX_MEM_ESTIMATE_FOR_ADMISSION option .. Patch Set 3: One clarification per Tim's comment on the MEM_LIMIT option requirement. -- To view, visit http://gerrit.cloudera.org:8080/11804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibef89c98530c6974dc791666cc51c1ded52e7910 Gerrit-Change-Number: 11804 Gerrit-PatchSet: 3 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 29 Oct 2018 21:30:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7765: [DOCS] Document IMPALA MAX MEM ESTIMATE FOR ADMISSION option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11804 ) Change subject: IMPALA-7765: [DOCS] Document IMPALA_MAX_MEM_ESTIMATE_FOR_ADMISSION option .. Patch Set 3: Build Started https://jenkins.impala.io/job/gerrit-docs-auto-test/124/ Testing docs change - this change appears to modify docs/ and no code. This is experimental - please report any issues to tarmstr...@cloudera.com or on this JIRA: IMPALA-7317 -- To view, visit http://gerrit.cloudera.org:8080/11804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibef89c98530c6974dc791666cc51c1ded52e7910 Gerrit-Change-Number: 11804 Gerrit-PatchSet: 3 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 29 Oct 2018 21:29:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7765: [DOCS] Document IMPALA MAX MEM ESTIMATE FOR ADMISSION option
Hello Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11804 to look at the new patch set (#3). Change subject: IMPALA-7765: [DOCS] Document IMPALA_MAX_MEM_ESTIMATE_FOR_ADMISSION option .. IMPALA-7765: [DOCS] Document IMPALA_MAX_MEM_ESTIMATE_FOR_ADMISSION option Change-Id: Ibef89c98530c6974dc791666cc51c1ded52e7910 --- M docs/impala.ditamap A docs/topics/impala_max_mem_estimate_for_admission.xml 2 files changed, 90 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/11804/3 -- To view, visit http://gerrit.cloudera.org:8080/11804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibef89c98530c6974dc791666cc51c1ded52e7910 Gerrit-Change-Number: 11804 Gerrit-PatchSet: 3 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-7743: [DOCS] A new option to load incremental statistics from catalog
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11790 ) Change subject: IMPALA-7743: [DOCS] A new option to load incremental statistics from catalog .. Patch Set 3: Verified+1 Build Successful https://jenkins.impala.io/job/gerrit-docs-auto-test/123/ : Doc tests passed. -- To view, visit http://gerrit.cloudera.org:8080/11790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fd9b88138350406065df2f39a48043178759949 Gerrit-Change-Number: 11790 Gerrit-PatchSet: 3 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Adrian Ng (389) Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 29 Oct 2018 21:27:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6374: fix handling of commas in .test files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11800 ) Change subject: IMPALA-6374: fix handling of commas in .test files .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1199/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I18ddcb0440490ddf8184be66d3681038a1615dd9 Gerrit-Change-Number: 11800 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Comment-Date: Mon, 29 Oct 2018 21:27:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11760 ) Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1198/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163 Gerrit-Change-Number: 11760 Gerrit-PatchSet: 5 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 29 Oct 2018 21:19:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6658: improve Parquet RLE for low bit widths
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/11582 ) Change subject: IMPALA-6658: improve Parquet RLE for low bit widths .. Patch Set 2: (5 comments) Thanks Thomas and Csaba for reviews so far http://gerrit.cloudera.org:8080/#/c/11582/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11582/2//COMMIT_MSG@20 PS2, Line 20: By default RleEncoder will now use run length encoding for runs of : length 24 for single bit values, and of length 16 for 2 bit wide values. : All other bit widths will use the existing length 8 runs. > >This could be avoided by using smaller I agree that this might be a better way, but I am happy with the simple improvement we get from this change. http://gerrit.cloudera.org:8080/#/c/11582/3/be/src/util/rle-encoding.h File be/src/util/rle-encoding.h: http://gerrit.cloudera.org:8080/#/c/11582/3/be/src/util/rle-encoding.h@266 PS3, Line 266: 1 byte for the repeated value > I prefer to phrase it as "never worse than 8". For alternating long repeate Done http://gerrit.cloudera.org:8080/#/c/11582/3/be/src/util/rle-encoding.h@267 PS3, Line 267: > Maybe "better than 16" would be better. Done http://gerrit.cloudera.org:8080/#/c/11582/2/be/src/util/rle-encoding.h File be/src/util/rle-encoding.h: http://gerrit.cloudera.org:8080/#/c/11582/2/be/src/util/rle-encoding.h@203 PS2, Line 203: the previous run is flushed out > Done Done for real this time http://gerrit.cloudera.org:8080/#/c/11582/3/be/src/util/rle-test.cc File be/src/util/rle-test.cc: http://gerrit.cloudera.org:8080/#/c/11582/3/be/src/util/rle-test.cc@216 PS3, Line 216: c > Oops, I missed this one: Done -- To view, visit http://gerrit.cloudera.org:8080/11582 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I191a581d3f699b6669e48ac9dc39c76ed77c4a76 Gerrit-Change-Number: 11582 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Mon, 29 Oct 2018 21:14:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7761: Add multiple DISTINCT to targeted stress and perf
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11805 ) Change subject: IMPALA-7761: Add multiple DISTINCT to targeted stress and perf .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/11805 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I400aaf6b6620b4001895eafff785956bffb312c9 Gerrit-Change-Number: 11805 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 21:12:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6658: improve Parquet RLE for low bit widths
Andrew Sherman has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/11582 ) Change subject: IMPALA-6658: improve Parquet RLE for low bit widths .. IMPALA-6658: improve Parquet RLE for low bit widths RleEncoder buffers values in its own cache to detect run lengths that can be efficiently encoded. When a run is detected it is written with an indicator byte which encodes the length of the run. So an encoded run always has an overhead of at least one byte. This means that for single bit values, encoding 8 values as a run is inefficient. Change RleEncoder to have the ability to use run lengths other than 8. A new parameter to the constructor (min_run_length) allows test callers (only) to set the minimum run length. By default RleEncoder will now use run length encoding for runs of length 16 for single bit values. All other bit widths will use the existing length 8 runs. Internally RleEncoder must buffer more values so that the longer runs can be detected. The internal buffer “buffered_values_” is larger and is now a circular buffer so that the first 8 bytes of the buffer can be separately flushed to BitWriter. Testing: All end-to-end and unit tests pass The unit test rle-test is enhanced to run all tests against RleEncoders using all possible values of min_run_length. In Addition, rle-test is refactored so that the Rle tests are in a class that inherits from ::testing::Test so that a SetUp() method can be used. The Overflow test is enhanced to be more exhaustive (while still completing in a second or two). Change-Id: I191a581d3f699b6669e48ac9dc39c76ed77c4a76 --- M be/src/util/rle-encoding.h M be/src/util/rle-test.cc 2 files changed, 500 insertions(+), 255 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/11582/4 -- To view, visit http://gerrit.cloudera.org:8080/11582 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I191a581d3f699b6669e48ac9dc39c76ed77c4a76 Gerrit-Change-Number: 11582 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall
[Impala-ASF-CR] IMPALA-7166: ExecSummary should be a first class object.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11555 ) Change subject: IMPALA-7166: ExecSummary should be a first class object. .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3375/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/11555 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4791237a5579f16c9efda8e57876d48980739e13 Gerrit-Change-Number: 11555 Gerrit-PatchSet: 4 Gerrit-Owner: Yongjun Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Yongjun Zhang Gerrit-Comment-Date: Mon, 29 Oct 2018 21:06:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7166: ExecSummary should be a first class object.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11555 ) Change subject: IMPALA-7166: ExecSummary should be a first class object. .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11555 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4791237a5579f16c9efda8e57876d48980739e13 Gerrit-Change-Number: 11555 Gerrit-PatchSet: 3 Gerrit-Owner: Yongjun Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Yongjun Zhang Gerrit-Comment-Date: Mon, 29 Oct 2018 21:06:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7166: ExecSummary should be a first class object.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11555 ) Change subject: IMPALA-7166: ExecSummary should be a first class object. .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11555 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4791237a5579f16c9efda8e57876d48980739e13 Gerrit-Change-Number: 11555 Gerrit-PatchSet: 4 Gerrit-Owner: Yongjun Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Reviewer: Yongjun Zhang Gerrit-Comment-Date: Mon, 29 Oct 2018 21:06:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5050: Add support to read TIMESTAMP MILLIS and TIMESTAMP MICROS from Parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11057 ) Change subject: IMPALA-5050: Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS from Parquet .. Patch Set 18: Hi Csaba, should I look at this now or is Zoltan still completing his review? -- To view, visit http://gerrit.cloudera.org:8080/11057 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c7c01fffa31b3d2ca3480adf6ff851137dadac3 Gerrit-Change-Number: 11057 Gerrit-PatchSet: 18 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 29 Oct 2018 21:05:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7743: [DOCS] A new option to load incremental statistics from catalog
Greg Rahn has posted comments on this change. ( http://gerrit.cloudera.org:8080/11790 ) Change subject: IMPALA-7743: [DOCS] A new option to load incremental statistics from catalog .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/11790/2/docs/shared/impala_common.xml File docs/shared/impala_common.xml: http://gerrit.cloudera.org:8080/#/c/11790/2/docs/shared/impala_common.xml@1425 PS2, Line 1425: 3.0 and lower, approximately : 400 bytes of metadata per column per partition are needed for caching. : Tables with a big number of partitions and many columns can add up to a : significant memory overhead as the metadata must be c > Reworded. LGTM -- To view, visit http://gerrit.cloudera.org:8080/11790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fd9b88138350406065df2f39a48043178759949 Gerrit-Change-Number: 11790 Gerrit-PatchSet: 3 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Adrian Ng (389) Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 29 Oct 2018 21:02:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
Hello Bharath Vissapragada, Philip Zeyliger, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11760 to look at the new patch set (#7). Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE .. IMPALA-7655: Rewrite if, isnull, coalesce to use CASE See IMPALA-7655 for backgound. Tim found that the current interpreted forms of if, isnull and coalesce are slow compared to the code-generated CASE statement. This patch rewrites the above functions into the equivalent CASE structure. The rewrite engine has many bugs that are beyond the scope of this change to fix. This change codes around those bugs. The result is that the conditional rewrite happens some of the time, and sometimes produces less-than perfect optimizations. Conditionals in the top-level ORDER BY clause are not rewritten (IMPALA-7753), but those one or more levels down are. Some expressions involving NULL are not simplified (IMPALA-7769). Several hacks were used to work around the fact that the rewrite engine ignores unanalyzed expressions, yet the rewrite engine does not, in general, re-analyze the expressions it produces, causing simplifications to be skipped (IMPALA-7754). And so on. As a result, the BE retains the original interpreted forms that are still used in two cases: 1) top-leel conditions in the ORDER BY clause, and 2) if the user disables rewrites. Further, code generation does not occur for CASE statements in the SELECT clause when it is in the root fragment (the most common case in simple tests.) This is another known bug (IMPALA-4356). One possible performance regression is that the new form of the code evaluates some expressions twice, where the original interpreted code evaluated the argument once. E.g. coalesce(id, 10) is rewritten to CASE WHEN id IS NULL THEN id ELSE 10 END. Here, id is evaluated twice. If the "id" were replaced by a complex sub-expression, the gain from compilation could be offset by doing work twice. (IMPALA-7737) Still, the fix provides most of what the JIRA ticket requested within the limitations of the existing code. Conditional function rewrites are moved into a new class, RewriteConditionalsRule in order to keep things simple. Most functions use the simplest possible rewrite, relying on the existing rewrite rules for further simplification. The one exception is coalesce(): the existing code relies on the semantics of the function and so was retained and slightly improved. The code was extended to produce a CASE statement directly, retaining existing simplifications. Tests for conditional functions were in one large function along with other rewrite tests. Moved them into a new file, then broke up the tests by function to allow much easier debugging of each function one-by-one. This required moving the common test mechanims into a new common base class. Existing tests focus on one or two rules at a time. The conditional function rewrite, however, relies on the entire set of rules being applied repeatedly. So, added a new FullRewriteTest case to verify this behavior. This class contains several commented-out tests that cannot pass due to existing rewrite bugs noted above. Changing the rewrite cause the PlannerTest to produce different plans than previously. Changed the expected results file to match the new rewrite rules. Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java A fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java A fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java A fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java A fe/src/test/java/org/apache/impala/analysis/RewriteConditionalFnsRuleTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test 16 files changed, 837 insertions(+), 309 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/11760/7 -- To view, visit http://gerrit.cloudera.org:8080/11760 To unsubscribe, visit
[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11760 ) Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/11760/6/fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java: http://gerrit.cloudera.org:8080/#/c/11760/6/fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java@93 PS6, Line 93: return RewritesOkWhereExpr("functional.alltypessmall", exprStr, rule, expectedExprStr); line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/11760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163 Gerrit-Change-Number: 11760 Gerrit-PatchSet: 6 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 29 Oct 2018 20:56:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
Hello Bharath Vissapragada, Philip Zeyliger, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11760 to look at the new patch set (#6). Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE .. IMPALA-7655: Rewrite if, isnull, coalesce to use CASE See IMPALA-7655 for backgound. Tim found that the current interpreted forms of if, isnull and coalesce are slow compared to the code-generated CASE statement. This patch rewrites the above functions into the equivalent CASE structure. The rewrite engine has many bugs that are beyond the scope of this change to fix. This change codes around those bugs. The result is that the conditional rewrite happens some of the time, and sometimes produces less-than perfect optimizations. Conditionals in the top-level ORDER BY clause are not rewritten (IMPALA-7753), but those one or more levels down are. Some expressions involving NULL are not simplified (IMPALA-7769). Several hacks were used to work around the fact that the rewrite engine ignores unanalyzed expressions, yet the rewrite engine does not, in general, re-analyze the expressions it produces, causing simplifications to be skipped (IMPALA-7754). And so on. As a result, the BE retains the original interpreted forms that are still used in two cases: 1) top-leel conditions in the ORDER BY clause, and 2) if the user disables rewrites. Further, code generation does not occur for CASE statements in the SELECT clause when it is in the root fragment (the most common case in simple tests.) This is another known bug (IMPALA-4356). One possible performance regression is that the new form of the code evaluates some expressions twice, where the original interpreted code evaluated the argument once. E.g. coalesce(id, 10) is rewritten to CASE WHEN id IS NULL THEN id ELSE 10 END. Here, id is evaluated twice. If the "id" were replaced by a complex sub-expression, the gain from compilation could be offset by doing work twice. (IMPALA-7737) Still, the fix provides most of what the JIRA ticket requested within the limitations of the existing code. Conditional function rewrites are moved into a new class, RewriteConditionalsRule in order to keep things simple. Most functions use the simplest possible rewrite, relying on the existing rewrite rules for further simplification. The one exception is coalesce(): the existing code relies on the semantics of the function and so was retained and slightly improved. The code was extended to produce a CASE statement directly, retaining existing simplifications. Tests for conditional functions were in one large function along with other rewrite tests. Moved them into a new file, then broke up the tests by function to allow much easier debugging of each function one-by-one. This required moving the common test mechanims into a new common base class. Existing tests focus on one or two rules at a time. The conditional function rewrite, however, relies on the entire set of rules being applied repeatedly. So, added a new FullRewriteTest case to verify this behavior. This class contains several commented-out tests that cannot pass due to existing rewrite bugs noted above. Changing the rewrite cause the PlannerTest to produce different plans than previously. Changed the expected results file to match the new rewrite rules. Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java A fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java A fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java A fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java A fe/src/test/java/org/apache/impala/analysis/RewriteConditionalFnsRuleTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test 16 files changed, 836 insertions(+), 309 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/11760/6 -- To view, visit http://gerrit.cloudera.org:8080/11760 To unsubscribe, visit
[Impala-ASF-CR] IMPALA-7743: [DOCS] A new option to load incremental statistics from catalog
Alex Rodoni has posted comments on this change. ( http://gerrit.cloudera.org:8080/11790 ) Change subject: IMPALA-7743: [DOCS] A new option to load incremental statistics from catalog .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/11790/2/docs/shared/impala_common.xml File docs/shared/impala_common.xml: http://gerrit.cloudera.org:8080/#/c/11790/2/docs/shared/impala_common.xml@1425 PS2, Line 1425: 3.0 and lower, for a table : with a big number of partitions and many columns, approximately 400 : bytes of metadata per column per partition, can add up to a significant : memory overhead as the metadata must be cached on the > Could we make this a bit more clear and flow better? Something like (some Reworded. -- To view, visit http://gerrit.cloudera.org:8080/11790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fd9b88138350406065df2f39a48043178759949 Gerrit-Change-Number: 11790 Gerrit-PatchSet: 2 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Adrian Ng (389) Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 29 Oct 2018 20:54:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7743: [DOCS] A new option to load incremental statistics from catalog
Hello Greg Rahn, Adrian Ng, Impala Public Jenkins, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11790 to look at the new patch set (#3). Change subject: IMPALA-7743: [DOCS] A new option to load incremental statistics from catalog .. IMPALA-7743: [DOCS] A new option to load incremental statistics from catalog --pull_incremental_statistics described in the Incremental Stats section. Change-Id: I8fd9b88138350406065df2f39a48043178759949 --- M docs/shared/impala_common.xml M docs/topics/impala_perf_stats.xml 2 files changed, 74 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/11790/3 -- To view, visit http://gerrit.cloudera.org:8080/11790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8fd9b88138350406065df2f39a48043178759949 Gerrit-Change-Number: 11790 Gerrit-PatchSet: 3 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Adrian Ng (389) Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-6374: fix handling of commas in .test files
Hello Michael Brown, David Knupp, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11800 to look at the new patch set (#3). Change subject: IMPALA-6374: fix handling of commas in .test files .. IMPALA-6374: fix handling of commas in .test files The .test file parser implemented an unconventional method for parsing single-quoted strings in comma-separated value format. This didn't handle trailing commas in the string correctly. This commit switches to using a conventional method for parsing comma-separated value format: * Commas enclosed by single quotes are not treated as field separators * Single quotes can be escaped within a string by doubling them. I looked into using Python's .csv module for this, but it wouldn't work without modifying the test file format more because it automatically discards the quotes during parsing, which are actually semantically important in .test files. E.g. without the quotes we can't distinguish between the literal string 'regex:...' and the regex regex: Testing: Ran exhaustive tests and fixed .test files that required modifications. Will rerun before merging. Added a couple of tests to exercise edge cases in the test file parser. Change-Id: I18ddcb0440490ddf8184be66d3681038a1615dd9 --- M testdata/workloads/functional-query/queries/QueryTest/functions-ddl.test M testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test M testdata/workloads/functional-query/queries/QueryTest/hbase-inserts.test M testdata/workloads/functional-query/queries/QueryTest/misc.test A testdata/workloads/functional-query/queries/QueryTest/special-strings.test M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test M testdata/workloads/tpcds/queries/tpcds-decimal_v2-q98.test M testdata/workloads/tpcds/queries/tpcds-q98.test M tests/common/test_result_verifier.py M tests/query_test/test_exprs.py 10 files changed, 157 insertions(+), 112 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/11800/3 -- To view, visit http://gerrit.cloudera.org:8080/11800 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I18ddcb0440490ddf8184be66d3681038a1615dd9 Gerrit-Change-Number: 11800 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/11760 ) Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE .. Patch Set 5: Thanks everyone for your patient reviews of this change. Per PhilZ, removed the per-function optimizations, relying on existing rewrite rules (bugs and all). Added tests to verify that the "downstream" rules work (when they do.) Per Vuk, reduced the scope to only rewrite the requested functions, and to retain the BE implementation; living with the bugs that prevent the rewrites from being done some of the time. Added more tests, including tests of the full rewrite process to ensure that the conditional functions are, in fact, first rewritten to use CASE, then CASE is further simplified. Tests include commented-out cases to record where we decided to live with existing bugs. The result is the minimal change that accomplishes the request in the JIRA without breaking anything. -- To view, visit http://gerrit.cloudera.org:8080/11760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163 Gerrit-Change-Number: 11760 Gerrit-PatchSet: 5 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 29 Oct 2018 20:51:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5031: Make UBSAN-friendly arithmetic generic
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11810 ) Change subject: IMPALA-5031: Make UBSAN-friendly arithmetic generic .. Patch Set 1: Code-Review+2 (1 comment) Just had a comment about comments - it feels a little cryptic to me (although may be obvious for some people). http://gerrit.cloudera.org:8080/#/c/11810/1/be/src/util/arithmetic-util.h File be/src/util/arithmetic-util.h: http://gerrit.cloudera.org:8080/#/c/11810/1/be/src/util/arithmetic-util.h@116 PS1, Line 116: enum class Ring { INTEGER, FLOAT, NEITHER }; I feel like there's some context missing here. I guess it's to enable generic logic over integral and floating point types, which are both support multiplication and addition but have some subtle differences. -- To view, visit http://gerrit.cloudera.org:8080/11810 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I73bec71e59c5a921003d0ebca52a1d4e49bbef66 Gerrit-Change-Number: 11810 Gerrit-PatchSet: 1 Gerrit-Owner: Jim Apple Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 20:49:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11760 ) Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE .. Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/11760/5/fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java: http://gerrit.cloudera.org:8080/#/c/11760/5/fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java@66 PS5, Line 66: public Expr RewritesOk(String tableName, String exprStr, ExprRewriteRule rule, String expectedExprStr) line too long (104 > 90) http://gerrit.cloudera.org:8080/#/c/11760/5/fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java@71 PS5, Line 71: public Expr RewritesOk(String exprStr, List rules, String expectedExprStr) line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/11760/5/fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java@88 PS5, Line 88: public Expr RewritesOkWhereExpr(String exprStr, ExprRewriteRule rule, String expectedExprStr) line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/11760/5/fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java@90 PS5, Line 90: return RewritesOkWhereExpr("functional.alltypessmall", exprStr, rule, expectedExprStr); line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/11760/5/fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java@93 PS5, Line 93: public Expr RewritesOkWhereExpr(String tableName, String exprStr, ExprRewriteRule rule, String expectedExprStr) line too long (113 > 90) http://gerrit.cloudera.org:8080/#/c/11760/5/fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java@95 PS5, Line 95: return RewritesOkWhereExpr(tableName, exprStr, Lists.newArrayList(rule), expectedExprStr); line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/11760/5/fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java@98 PS5, Line 98: public Expr RewritesOkWhereExpr(String tableName, String exprStr, List rules, line too long (96 > 90) -- To view, visit http://gerrit.cloudera.org:8080/11760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163 Gerrit-Change-Number: 11760 Gerrit-PatchSet: 5 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 29 Oct 2018 20:46:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
Hello Bharath Vissapragada, Philip Zeyliger, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11760 to look at the new patch set (#5). Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE .. IMPALA-7655: Rewrite if, isnull, coalesce to use CASE See IMPALA-7655 for backgound. Tim found that the current interpreted forms of if, isnull and coalesce are slow compared to the code-generated CASE statement. This patch rewrites the above functions into the equivalent CASE structure. The rewrite engine has many bugs that are beyond the scope of this change to fix. This change codes around those bugs. The result is that the conditional rewrite happens some of the time, and sometimes produces less-than perfect optimizations. Conditionals in the top-level ORDER BY clause are not rewritten (IMPALA-7753), but those one or more levels down are. Some expressions involving NULL are not simplified (IMPALA-7769). Several hacks were used to work around the fact that the rewrite engine ignores unanalyzed expressions, yet the rewrite engine does not, in general, re-analyze the expressions it produces, causing simplifications to be skipped (IMPALA-7754). And so on. As a result, the BE retains the original interpreted forms that are still used in two cases: 1) top-leel conditions in the ORDER BY clause, and 2) if the user disables rewrites. Further, code generation does not occur for CASE statements in the SELECT clause when it is in the root fragment (the most common case in simple tests.) This is another known bug (IMPALA-4356). One possible performance regression is that the new form of the code evaluates some expressions twice, where the original interpreted code evaluated the argument once. E.g. coalesce(id, 10) is rewritten to CASE WHEN id IS NULL THEN id ELSE 10 END. Here, id is evaluated twice. If the "id" were replaced by a complex sub-expression, the gain from compilation could be offset by doing work twice. (IMPALA-7737) Still, the fix provides most of what the JIRA ticket requested within the limitations of the existing code. Conditional function rewrites are moved into a new class, RewriteConditionalsRule in order to keep things simple. Most functions use the simplest possible rewrite, relying on the existing rewrite rules for further simplification. The one exception is coalesce(): the existing code relies on the semantics of the function and so was retained and slightly improved. The code was extended to produce a CASE statement directly, retaining existing simplifications. Tests for conditional functions were in one large function along with other rewrite tests. Moved them into a new file, then broke up the tests by function to allow much easier debugging of each function one-by-one. This required moving the common test mechanims into a new common base class. Existing tests focus on one or two rules at a time. The conditional function rewrite, however, relies on the entire set of rules being applied repeatedly. So, added a new FullRewriteTest case to verify this behavior. This class contains several commented-out tests that cannot pass due to existing rewrite bugs noted above. Changing the rewrite cause the PlannerTest to produce different plans than previously. Changed the expected results file to match the new rewrite rules. Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java A fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java A fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java A fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java A fe/src/test/java/org/apache/impala/analysis/RewriteConditionalFnsRuleTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test 16 files changed, 831 insertions(+), 309 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/11760/5 -- To view, visit http://gerrit.cloudera.org:8080/11760 To unsubscribe, visit
[Impala-ASF-CR] IMPALA-7749: Compute AggregationNode's memory estimate using input cardinality
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11806 ) Change subject: IMPALA-7749: Compute AggregationNode's memory estimate using input cardinality .. IMPALA-7749: Compute AggregationNode's memory estimate using input cardinality Prior to this change, the AggregationNode's perInstanceCardinality was influenced by the node's selectivity and limit. This was incorrect because the hash table is constructed over the entire input stream before any row batches are produced. This change ensures that the input cardinality is used to determine the perInstanceCardinality. Testing: Added a planner test which ensures that an AggregationNode with a limit estimates memory based on the input cardinality. Ran front-end and end-to-end tests affected by this change. Change-Id: Ifd95d2ad5b677fca459c9c32b98f6176842161fc Reviewed-on: http://gerrit.cloudera.org:8080/11806 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test 3 files changed, 99 insertions(+), 21 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/11806 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ifd95d2ad5b677fca459c9c32b98f6176842161fc Gerrit-Change-Number: 11806 Gerrit-PatchSet: 4 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7749: Compute AggregationNode's memory estimate using input cardinality
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11806 ) Change subject: IMPALA-7749: Compute AggregationNode's memory estimate using input cardinality .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/11806 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifd95d2ad5b677fca459c9c32b98f6176842161fc Gerrit-Change-Number: 11806 Gerrit-PatchSet: 3 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 20:40:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7735: Expose queuing status in ExecSummary and impala-shell
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11816 ) Change subject: IMPALA-7735: Expose queuing status in ExecSummary and impala-shell .. Patch Set 1: Code-Review+2 (4 comments) http://gerrit.cloudera.org:8080/#/c/11816/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11816/1//COMMIT_MSG@9 PS1, Line 9: This patch adds the queuing status, that is, whether the query was Can you mention that this is showed when either live_summary or live_progress is true. http://gerrit.cloudera.org:8080/#/c/11816/1/common/thrift/ExecStats.thrift File common/thrift/ExecStats.thrift: http://gerrit.cloudera.org:8080/#/c/11816/1/common/thrift/ExecStats.thrift@104 PS1, Line 104: queued "is currently queued by admission control" to be totally clear http://gerrit.cloudera.org:8080/#/c/11816/1/common/thrift/ExecStats.thrift@107 PS1, Line 107: // Contains the latest queuing reason if the query is queued. Same as above. http://gerrit.cloudera.org:8080/#/c/11816/1/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/11816/1/shell/impala_shell.py@969 PS1, Line 969: if summary.is_queued: It's a little subtle that this handles both None and False. I'll leave it up to you whether this is worth documenting. -- To view, visit http://gerrit.cloudera.org:8080/11816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibde447b01559b9f0f3e970d4fa10f6ee4064bd49 Gerrit-Change-Number: 11816 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 20:40:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5004: Switch to sorting node for large TopN queries
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11698 ) Change subject: IMPALA-5004: Switch to sorting node for large TopN queries .. Patch Set 8: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/3373/ -- To view, visit http://gerrit.cloudera.org:8080/11698 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I34c9db33c9302b55e9978f53f9c7061f2806c8a9 Gerrit-Change-Number: 11698 Gerrit-PatchSet: 8 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 29 Oct 2018 20:01:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7727: Fix TStatusCode to TErrorCode mapping
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11778 ) Change subject: IMPALA-7727: Fix TStatusCode to TErrorCode mapping .. Patch Set 3: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/1197/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/11778 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie62527734aa73c1524c731773638590bdac9e789 Gerrit-Change-Number: 11778 Gerrit-PatchSet: 3 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 19:47:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7735: Expose queuing status in ExecSummary and impala-shell
Pooja Nilangekar has posted comments on this change. ( http://gerrit.cloudera.org:8080/11816 ) Change subject: IMPALA-7735: Expose queuing status in ExecSummary and impala-shell .. Patch Set 1: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/11816/1/tests/custom_cluster/test_shell_interactive.py File tests/custom_cluster/test_shell_interactive.py: http://gerrit.cloudera.org:8080/#/c/11816/1/tests/custom_cluster/test_shell_interactive.py@38 PS1, Line 38: proc = pexpect.spawn(' '.join([SHELL_CMD, "-i localhost:21000"])) I think it would help connecting to another port (i.e. another impalad) to ensure that the admission controller status is propagated across impalads. (Not sure if it'd make things flaky though). -- To view, visit http://gerrit.cloudera.org:8080/11816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibde447b01559b9f0f3e970d4fa10f6ee4064bd49 Gerrit-Change-Number: 11816 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 19:24:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7727: Fix TStatusCode to TErrorCode mapping
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11778 ) Change subject: IMPALA-7727: Fix TStatusCode to TErrorCode mapping .. Patch Set 3: Code-Review+1 Actually I should let michael take a final look too if he wants. -- To view, visit http://gerrit.cloudera.org:8080/11778 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie62527734aa73c1524c731773638590bdac9e789 Gerrit-Change-Number: 11778 Gerrit-PatchSet: 3 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 19:17:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7727: Fix TStatusCode to TErrorCode mapping
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11778 ) Change subject: IMPALA-7727: Fix TStatusCode to TErrorCode mapping .. Patch Set 3: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/11778/3/be/src/util/error-util.cc File be/src/util/error-util.cc: http://gerrit.cloudera.org:8080/#/c/11778/3/be/src/util/error-util.cc@231 PS3, Line 231: DCHECK(false) << "Unexpected hs2Code: " << hs2Code; clang-tidy failed because this didn't return a value. http://gerrit.cloudera.org:8080/#/c/11778/2/testdata/workloads/functional-query/queries/QueryTest/compute-stats.test File testdata/workloads/functional-query/queries/QueryTest/compute-stats.test: http://gerrit.cloudera.org:8080/#/c/11778/2/testdata/workloads/functional-query/queries/QueryTest/compute-stats.test@2203 PS2, Line 2203: set mem_limit=10m; > Looks like we don't propagate debug actions to child queries? Ah interesting. Maybe consider setting a lower mem_limit just to be sure that the error is reliable? The minimum memory limit can change depending on the cluster config and as we make changes to the code. -- To view, visit http://gerrit.cloudera.org:8080/11778 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie62527734aa73c1524c731773638590bdac9e789 Gerrit-Change-Number: 11778 Gerrit-PatchSet: 3 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 19:17:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7727: Fix TStatusCode to TErrorCode mapping
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11778 to look at the new patch set (#3). Change subject: IMPALA-7727: Fix TStatusCode to TErrorCode mapping .. IMPALA-7727: Fix TStatusCode to TErrorCode mapping - Uses a "GENERAL" TErrorCode type for all non-OK statuses. - Detailed regression root cause description in the jira IMPALA-7727. - Added a regression test. Change-Id: Ie62527734aa73c1524c731773638590bdac9e789 --- M be/src/common/status.cc M be/src/common/status.h M be/src/service/child-query.cc M be/src/util/error-util.cc M be/src/util/error-util.h M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test 6 files changed, 35 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/11778/3 -- To view, visit http://gerrit.cloudera.org:8080/11778 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie62527734aa73c1524c731773638590bdac9e789 Gerrit-Change-Number: 11778 Gerrit-PatchSet: 3 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7735: Expose queuing status in ExecSummary and impala-shell
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11816 ) Change subject: IMPALA-7735: Expose queuing status in ExecSummary and impala-shell .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1196/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibde447b01559b9f0f3e970d4fa10f6ee4064bd49 Gerrit-Change-Number: 11816 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 19:07:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5031: memcpy cannot take null arguments
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/11812 ) Change subject: IMPALA-5031: memcpy cannot take null arguments .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/11812/1/be/src/runtime/string-buffer.h File be/src/runtime/string-buffer.h: http://gerrit.cloudera.org:8080/#/c/11812/1/be/src/runtime/string-buffer.h@93 PS1, Line 93: memcpy I see new_buffer is guarded here. Is this one not replaced bc buffer_ is assumed to not be null here? Haven't looked too closely, but it looks like buffer_ could be null. -- To view, visit http://gerrit.cloudera.org:8080/11812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9acc8c32409e67253a987eb3d1fd7d921efcb51 Gerrit-Change-Number: 11812 Gerrit-PatchSet: 1 Gerrit-Owner: Jim Apple Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 29 Oct 2018 18:57:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7727: Fix TStatusCode to TErrorCode mapping
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11778 ) Change subject: IMPALA-7727: Fix TStatusCode to TErrorCode mapping .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/11778/2/be/src/common/status.cc File be/src/common/status.cc: http://gerrit.cloudera.org:8080/#/c/11778/2/be/src/common/status.cc@a172 PS2, Line 172: > You missed removing the declaration in the header. oops will do. http://gerrit.cloudera.org:8080/#/c/11778/2/testdata/workloads/functional-query/queries/QueryTest/compute-stats.test File testdata/workloads/functional-query/queries/QueryTest/compute-stats.test: http://gerrit.cloudera.org:8080/#/c/11778/2/testdata/workloads/functional-query/queries/QueryTest/compute-stats.test@2203 PS2, Line 2203: set mem_limit=10m; > Can you use a debug_action to force the error directly? That would be more Looks like we don't propagate debug actions to child queries? void ChildQuery::SetQueryOptions(const TQueryOptions& parent_options, TExecuteStatementReq* exec_stmt_req) { map conf; #define QUERY_OPT_FN(NAME, ENUM, LEVEL)\ if (parent_options.__isset.NAME) {\ stringstream val;\ val << parent_options.NAME;\ conf[#ENUM] = val.str();\ } #define REMOVED_QUERY_OPT_FN(NAME, ENUM) QUERY_OPTS_TABLE #undef QUERY_OPT_FN #undef REMOVED_QUERY_OPT_FN // Ignore debug actions on child queries because they may cause deadlock. map::iterator it = conf.find("DEBUG_ACTION"); if (it != conf.end()) conf.erase(it); exec_stmt_req->__set_confOverlay(conf); } -- To view, visit http://gerrit.cloudera.org:8080/11778 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie62527734aa73c1524c731773638590bdac9e789 Gerrit-Change-Number: 11778 Gerrit-PatchSet: 2 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 18:42:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7735: Expose queuing status in ExecSummary and impala-shell
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11816 ) Change subject: IMPALA-7735: Expose queuing status in ExecSummary and impala-shell .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/11816/1/tests/custom_cluster/test_shell_interactive.py File tests/custom_cluster/test_shell_interactive.py: http://gerrit.cloudera.org:8080/#/c/11816/1/tests/custom_cluster/test_shell_interactive.py@26 PS1, Line 26: class TestShellInteractive(CustomClusterTestSuite): flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/11816/1/tests/custom_cluster/test_shell_interactive.py@29 PS1, Line 29: flake8: E251 unexpected spaces around keyword / parameter equals -- To view, visit http://gerrit.cloudera.org:8080/11816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibde447b01559b9f0f3e970d4fa10f6ee4064bd49 Gerrit-Change-Number: 11816 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 18:33:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7735: Expose queuing status in ExecSummary and impala-shell
Bikramjeet Vig has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11816 Change subject: IMPALA-7735: Expose queuing status in ExecSummary and impala-shell .. IMPALA-7735: Expose queuing status in ExecSummary and impala-shell This patch adds the queuing status, that is, whether the query was queued and what was the latest queuing reason, to the ExecSummary. Also added changes to allow impala-shell to expose this status by pulling it out from the ExecSummary when either live_summary or live_progress is set to true. Testing: Added custom cluster tests. Change-Id: Ibde447b01559b9f0f3e970d4fa10f6ee4064bd49 --- M be/src/service/impala-server.cc M common/thrift/ExecStats.thrift M shell/impala_shell.py M tests/custom_cluster/test_admission_controller.py A tests/custom_cluster/test_shell_interactive.py 5 files changed, 100 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/11816/1 -- To view, visit http://gerrit.cloudera.org:8080/11816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ibde447b01559b9f0f3e970d4fa10f6ee4064bd49 Gerrit-Change-Number: 11816 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig
[Impala-ASF-CR] IMPALA-7765: [DOCS] Document IMPALA MAX MEM ESTIMATE FOR ADMISSION option
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/11804 ) Change subject: IMPALA-7765: [DOCS] Document IMPALA_MAX_MEM_ESTIMATE_FOR_ADMISSION option .. Patch Set 2: Code-Review+2 (1 comment) Looks good. Just another small nit http://gerrit.cloudera.org:8080/#/c/11804/2/docs/topics/impala_max_mem_estimate_for_admission.xml File docs/topics/impala_max_mem_estimate_for_admission.xml: http://gerrit.cloudera.org:8080/#/c/11804/2/docs/topics/impala_max_mem_estimate_for_admission.xml@45 PS2, Line 45: on the memory estimates of a query on the planner's memory estimates for a query -- To view, visit http://gerrit.cloudera.org:8080/11804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibef89c98530c6974dc791666cc51c1ded52e7910 Gerrit-Change-Number: 11804 Gerrit-PatchSet: 2 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 29 Oct 2018 17:29:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7749: Compute AggregationNode's memory estimate using input cardinality
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11806 ) Change subject: IMPALA-7749: Compute AggregationNode's memory estimate using input cardinality .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1195/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11806 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifd95d2ad5b677fca459c9c32b98f6176842161fc Gerrit-Change-Number: 11806 Gerrit-PatchSet: 2 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 17:17:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7743: [DOCS] A new option to load incremental statistics from catalog
Greg Rahn has posted comments on this change. ( http://gerrit.cloudera.org:8080/11790 ) Change subject: IMPALA-7743: [DOCS] A new option to load incremental statistics from catalog .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/11790/2/docs/shared/impala_common.xml File docs/shared/impala_common.xml: http://gerrit.cloudera.org:8080/#/c/11790/2/docs/shared/impala_common.xml@1425 PS2, Line 1425: 3.0 and lower, for a table : with a big number of partitions and many columns, approximately 400 : bytes of metadata per column per partition, can add up to a significant : memory overhead as the metadata must be cached on the Could we make this a bit more clear and flow better? Something like (some parts left out for brevity): in 3.0 and lower approx 400 bytes is needed per column, per, partition, for caching. As a result, tables with a big number of partitions and many columns can add up to... -- To view, visit http://gerrit.cloudera.org:8080/11790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fd9b88138350406065df2f39a48043178759949 Gerrit-Change-Number: 11790 Gerrit-PatchSet: 2 Gerrit-Owner: Alex Rodoni Gerrit-Reviewer: Adrian Ng (389) Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 29 Oct 2018 17:08:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7749: Compute AggregationNode's memory estimate using input cardinality
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11806 ) Change subject: IMPALA-7749: Compute AggregationNode's memory estimate using input cardinality .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11806 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifd95d2ad5b677fca459c9c32b98f6176842161fc Gerrit-Change-Number: 11806 Gerrit-PatchSet: 3 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 16:45:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7749: Compute AggregationNode's memory estimate using input cardinality
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11806 ) Change subject: IMPALA-7749: Compute AggregationNode's memory estimate using input cardinality .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3374/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/11806 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifd95d2ad5b677fca459c9c32b98f6176842161fc Gerrit-Change-Number: 11806 Gerrit-PatchSet: 3 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 16:45:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7749: Compute AggregationNode's memory estimate using input cardinality
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11806 ) Change subject: IMPALA-7749: Compute AggregationNode's memory estimate using input cardinality .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11806 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifd95d2ad5b677fca459c9c32b98f6176842161fc Gerrit-Change-Number: 11806 Gerrit-PatchSet: 2 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 16:44:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7749: Compute AggregationNode's memory estimate using input cardinality
Pooja Nilangekar has posted comments on this change. ( http://gerrit.cloudera.org:8080/11806 ) Change subject: IMPALA-7749: Compute AggregationNode's memory estimate using input cardinality .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/11806/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11806/1//COMMIT_MSG@7 PS1, Line 7: IMPALA-7749: Compute AggregationNode's memory estimate using input cardinality > I think it would be better to file a separate JIRA for part 2. Part 1 is ki Done http://gerrit.cloudera.org:8080/#/c/11806/1/testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test File testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test: http://gerrit.cloudera.org:8080/#/c/11806/1/testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test@5473 PS1, Line 5473: # IMPALA-7749: Aggregation with a limit should compute memory estimates based on input > Probably worth mentioning the JIRA for motivation Done -- To view, visit http://gerrit.cloudera.org:8080/11806 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifd95d2ad5b677fca459c9c32b98f6176842161fc Gerrit-Change-Number: 11806 Gerrit-PatchSet: 2 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 29 Oct 2018 16:42:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7749: Compute AggregationNode's memory estimate using input cardinality
Pooja Nilangekar has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/11806 ) Change subject: IMPALA-7749: Compute AggregationNode's memory estimate using input cardinality .. IMPALA-7749: Compute AggregationNode's memory estimate using input cardinality Prior to this change, the AggregationNode's perInstanceCardinality was influenced by the node's selectivity and limit. This was incorrect because the hash table is constructed over the entire input stream before any row batches are produced. This change ensures that the input cardinality is used to determine the perInstanceCardinality. Testing: Added a planner test which ensures that an AggregationNode with a limit estimates memory based on the input cardinality. Ran front-end and end-to-end tests affected by this change. Change-Id: Ifd95d2ad5b677fca459c9c32b98f6176842161fc --- M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test 3 files changed, 99 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/11806/2 -- To view, visit http://gerrit.cloudera.org:8080/11806 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd95d2ad5b677fca459c9c32b98f6176842161fc Gerrit-Change-Number: 11806 Gerrit-PatchSet: 2 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7760: Privilege version inconsistency causes a hang when running invalidate metadata
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/11794 ) Change subject: IMPALA-7760: Privilege version inconsistency causes a hang when running invalidate metadata .. Patch Set 4: > Patch Set 4: > > > Yes, I created some follow-up JIRAs for this. > > a) https://issues.apache.org/jira/browse/IMPALA-7764 > > This was very generic ("add unit tests.") We have a specific case here that > we need to add to our tests, which involves (if I'm reading things right) > removing the last permission from Sentry. That seems sensible to do as an > end-to-end test (in custom cluster?). > > Anyway--are we working on this right now? > > > b) https://issues.apache.org/jira/browse/IMPALA-7762 > > Thanks. It was a subtle bug and it was difficult to write an end-to-end test to reproduce this because we need to get the timing right. I did some end-to-end test for a similar issue but it was easier to reproduce: https://github.com/apache/impala/blob/master/tests/authorization/test_grant_revoke.py#L373-L393 Anyway, yes, I'm working on beefing up tests related to SentryProxy (which includes unit tests and end-to-end tests). I'll rename the subject in JIRA to make it clear. -- To view, visit http://gerrit.cloudera.org:8080/11794 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1e0db2b1f727476f489c732c4f4e5bc1582429f Gerrit-Change-Number: 11794 Gerrit-PatchSet: 4 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Adam Holley Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 29 Oct 2018 16:22:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7760: Privilege version inconsistency causes a hang when running invalidate metadata
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11794 ) Change subject: IMPALA-7760: Privilege version inconsistency causes a hang when running invalidate metadata .. Patch Set 4: > Yes, I created some follow-up JIRAs for this. > a) https://issues.apache.org/jira/browse/IMPALA-7764 This was very generic ("add unit tests.") We have a specific case here that we need to add to our tests, which involves (if I'm reading things right) removing the last permission from Sentry. That seems sensible to do as an end-to-end test (in custom cluster?). Anyway--are we working on this right now? > b) https://issues.apache.org/jira/browse/IMPALA-7762 Thanks. -- To view, visit http://gerrit.cloudera.org:8080/11794 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1e0db2b1f727476f489c732c4f4e5bc1582429f Gerrit-Change-Number: 11794 Gerrit-PatchSet: 4 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Adam Holley Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 29 Oct 2018 16:16:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7760: Privilege version inconsistency causes a hang when running invalidate metadata
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/11794 ) Change subject: IMPALA-7760: Privilege version inconsistency causes a hang when running invalidate metadata .. Patch Set 4: > Patch Set 4: > > Apologies for the late review: > > a) Can we add a test for this? > > b) Where was the hang? It seems like we should fail queries if they're > waiting for catalog updates past some threshold (5 minutes?) rather than hang > forever, since hanging forever is so much less pleasant to figure out. > > Thanks! Yes, I created some follow-up JIRAs for this. a) https://issues.apache.org/jira/browse/IMPALA-7764 b) https://issues.apache.org/jira/browse/IMPALA-7762 -- To view, visit http://gerrit.cloudera.org:8080/11794 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1e0db2b1f727476f489c732c4f4e5bc1582429f Gerrit-Change-Number: 11794 Gerrit-PatchSet: 4 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Adam Holley Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 29 Oct 2018 16:14:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7760: Privilege version inconsistency causes a hang when running invalidate metadata
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11794 ) Change subject: IMPALA-7760: Privilege version inconsistency causes a hang when running invalidate metadata .. Patch Set 4: Apologies for the late review: a) Can we add a test for this? b) Where was the hang? It seems like we should fail queries if they're waiting for catalog updates past some threshold (5 minutes?) rather than hang forever, since hanging forever is so much less pleasant to figure out. Thanks! -- To view, visit http://gerrit.cloudera.org:8080/11794 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1e0db2b1f727476f489c732c4f4e5bc1582429f Gerrit-Change-Number: 11794 Gerrit-PatchSet: 4 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Adam Holley Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 29 Oct 2018 16:10:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5004: Switch to sorting node for large TopN queries
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11698 ) Change subject: IMPALA-5004: Switch to sorting node for large TopN queries .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11698 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I34c9db33c9302b55e9978f53f9c7061f2806c8a9 Gerrit-Change-Number: 11698 Gerrit-PatchSet: 8 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 29 Oct 2018 16:06:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5004: Switch to sorting node for large TopN queries
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11698 ) Change subject: IMPALA-5004: Switch to sorting node for large TopN queries .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3373/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/11698 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I34c9db33c9302b55e9978f53f9c7061f2806c8a9 Gerrit-Change-Number: 11698 Gerrit-PatchSet: 8 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 29 Oct 2018 16:06:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5004: Switch to sorting node for large TopN queries
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11698 ) Change subject: IMPALA-5004: Switch to sorting node for large TopN queries .. Patch Set 7: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/11698/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/11698/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@313 PS3, Line 313: long limit, long offset, boolean hasLimit, boolean disableTopN) > After discussing this a bit more with Tim, it seems that we generally avoid Sahil covered this, but just wanted to confirm that there was a deliberate decision not to pursue, in general, making planner decisions to fit a query into a memory limit. We do give up some opportunities to make more optimal decisions but it's definitely simpler for now. I don't recall seeing a case in practice where Top-N memory consumption caused a query failure but it seems like something that will happen eventually, and this change helps avoid it or at least gives an escape hatch that doesn't require rewriting the query. -- To view, visit http://gerrit.cloudera.org:8080/11698 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I34c9db33c9302b55e9978f53f9c7061f2806c8a9 Gerrit-Change-Number: 11698 Gerrit-PatchSet: 7 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 29 Oct 2018 16:02:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5004: Switch to sorting node for large TopN queries
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11698 ) Change subject: IMPALA-5004: Switch to sorting node for large TopN queries .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1194/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11698 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I34c9db33c9302b55e9978f53f9c7061f2806c8a9 Gerrit-Change-Number: 11698 Gerrit-PatchSet: 7 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 29 Oct 2018 15:31:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5004: Switch to sorting node for large TopN queries
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/11698 ) Change subject: IMPALA-5004: Switch to sorting node for large TopN queries .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/11698/5/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/11698/5/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@337 PS5, Line 337: limit + offset > Sounds like we just should be failing queries where offset + limit is > THR Filed IMPALA- as a follow up and removed the test changes for handling overflows. -- To view, visit http://gerrit.cloudera.org:8080/11698 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I34c9db33c9302b55e9978f53f9c7061f2806c8a9 Gerrit-Change-Number: 11698 Gerrit-PatchSet: 5 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 29 Oct 2018 14:58:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5004: Switch to sorting node for large TopN queries
Hello Lars Volker, Paul Rogers, Tim Armstrong, Impala Public Jenkins, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11698 to look at the new patch set (#7). Change subject: IMPALA-5004: Switch to sorting node for large TopN queries .. IMPALA-5004: Switch to sorting node for large TopN queries Adds a new query option 'topn_bytes_limit' that places a limit on the number of estimated bytes that a TopN operator can process. If the Impala planner estimates that a TopN operator will process more bytes than this limit, it will replace the TopN operator with a sort operator. Since the TopN operator cannot spill to disk, it has to buffer everything in memory. This can cause frequent OOM issues when running with a large limit + offset. Switching to a sort operator allows Impala to spill to disk. We prefer to use the TopN operator when possible as it has better performance than the sort operator for 'order by limit [offset]' queries. The default limit is set to 512MB and is based on micro-benchmarking the topn vs. sort operator for various limits (see the JIRA for full details). The default is set to an intentionally high value in order to avoid performance regressions. If the planner thinks the TopN operator will end up processing the entire input dataset, it falls back to a Sort operator. Since TopN will just end up sorting the entire dataset, there is no point in using it vs. the Sort operator (which has the advantage that it can spill to disk). Testing: * Added a new planner test to fuctional-planner/ to validate that 'topn_bytes_limit' properly switches between topn and sort operators. * Added a new planner test to functional-planner/ to validate that running an 'order by limit x' where x is the size of the input, triggers a sort instead of a TopN (even if the 'topn_bytes_limit' threshold has not been exceeded) Change-Id: I34c9db33c9302b55e9978f53f9c7061f2806c8a9 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SelectNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/SubplanNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/planner/UnnestNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/topn-bytes-limit-small.test A testdata/workloads/functional-planner/queries/PlannerTest/topn-bytes-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/topn.test 25 files changed, 260 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/11698/7 -- To view, visit http://gerrit.cloudera.org:8080/11698 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34c9db33c9302b55e9978f53f9c7061f2806c8a9 Gerrit-Change-Number: 11698 Gerrit-PatchSet: 7 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vuk Ercegovac