[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 17: (2 comments) http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test File testdata/workloads/functional-query/queries/QueryTest/spilling.test: http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test@450 PS14, Line 450: > See my other comment. In some cases, FastLimitCheckExceededRows is 0, e.g. SET debug_action=-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0; http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test File testdata/workloads/targeted-perf/queries/aggregation.test: http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test@2729 PS14, Line 2729: 1 > I see. What I mean is to assure the value is greater than 0. Done -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 17 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Wed, 22 Sep 2021 04:02:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#16). Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. When both conditions below are true, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Sample queries: SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Testing: Add test targeted-perf/queries/aggregation.test Pass core test Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 19 files changed, 147 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/16 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 16 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 15: (2 comments) http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test File testdata/workloads/functional-query/queries/QueryTest/spilling.test: http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test@450 PS14, Line 450: > nit. 1? I modified the unit test because autotest failed.FastLimitCheckExceededRows can be unidigit, double digits, three digits, so may contain 0 http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test File testdata/workloads/targeted-perf/queries/aggregation.test: http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test@2729 PS14, Line 2729: 0 > nit. Should it be 1-9 instead? Usually, rowbatch contains 1024 rows, stream agg processeed a rowbatch, then check whether the number of rows returned more than limit, so FastLimitCheckExceededRows can be unidigit, double digits, three digits, so may contain 0 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 15 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Sat, 18 Sep 2021 07:04:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#15). Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. When both conditions below are true, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Sample queries: SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Testing: Add test targeted-perf/queries/aggregation.test Pass core test Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 19 files changed, 147 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/15 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 15 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 14: (3 comments) http://gerrit.cloudera.org:8080/#/c/17821/13/be/src/exec/streaming-aggregation-node.cc File be/src/exec/streaming-aggregation-node.cc: http://gerrit.cloudera.org:8080/#/c/17821/13/be/src/exec/streaming-aggregation-node.cc@134 PS13, Line 134: runtime_profile_->AddInfoString("FastLimitCheckExceededRows", : SimpleItoa(aggs_[0]->GetNumKeys() - limit())); : VLOG_QUERY << S > If we can add the info to runtime_profile_, it will be more useful. For exa Done http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/functional-query/queries/QueryTest/spilling.test File testdata/workloads/functional-query/queries/QueryTest/spilling.test: http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/functional-query/queries/QueryTest/spilling.test@446 PS13, Line 446: GINT > Can we also verify that some rows are indeed skipped in spill situation? Done http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/targeted-perf/queries/aggregation.test File testdata/workloads/targeted-perf/queries/aggregation.test: http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/targeted-perf/queries/aggregation.test@2726 PS13, Line 2726: speed up aggregations > Can we verify that most of the rows are indeed skipped fast? Done -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 14 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Tue, 14 Sep 2021 12:08:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#14). Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. When both conditions below are true, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Sample queries: SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Testing: Add test targeted-perf/queries/aggregation.test Pass core test Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 19 files changed, 149 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/14 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 14 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 13: (6 comments) http://gerrit.cloudera.org:8080/#/c/17821/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17821/12//COMMIT_MSG@10 PS12, Line 10: > nit. exceed the max line threshold. Done http://gerrit.cloudera.org:8080/#/c/17821/12/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: http://gerrit.cloudera.org:8080/#/c/17821/12/be/src/exec/aggregation-node.cc@76 PS12, Line 76: VLOG_QUERY << Substitute("the number of rows ($0) returned from the aggregation" : " node has exceeded the limit of $1", aggs_[0]->GetNumKeys(), limit()); > nit. May use Substitute() which is faster. Done http://gerrit.cloudera.org:8080/#/c/17821/12/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/17821/12/common/thrift/PlanNodes.thrift@479 PS12, Line 479: compl > nit. complete Done http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@646 PS12, Line 646: When both conditions below are true, aggregati > nit. I think we should mention the two conditions in the commit message her Done http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@648 PS12, Line 648: on n > nit. May use Complete, as Halt implies stop for some reason. Done http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test: http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934 PS11, Line 2934: limit: 2 > Since we do not push down id from the outer side to the inner, I would thin where id = subquery,If this subQuery returns 2 rows, we can sure that it is not meet the semantic requirement, we should report a semantic error.We don't need to get all the results. -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 13 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Mon, 13 Sep 2021 06:40:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#13). Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. When both conditions below are true, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Sample queries: SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Testing: Add test targeted-perf/queries/aggregation.test Pass core test Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 18 files changed, 142 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/13 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 13 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17821 ) Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. Patch Set 12: (8 comments) http://gerrit.cloudera.org:8080/#/c/17821/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17821/11//COMMIT_MSG@10 PS11, Line 10: When both con > nit. When both conditions below are true, Done http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.h File be/src/exec/aggregation-node-base.h: http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.h@71 PS11, Line 71: bool fast_limit_check_ = false; > nit. May init to false here. Done http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.cc File be/src/exec/aggregation-node-base.cc: http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.cc@93 PS11, Line 93: : Status AggregationNodeBase::Prepare(RuntimeState* state) { : SCOPED_TIMER(runtime_profile_->total_time_counter()); : RETURN_IF_ERROR(ExecNode::Prepare(state)); : for (auto& agg : aggs_) { : agg->SetDebugOptions > I also wonder if fast_limit_check_ can be moved to the plan node for Aggreg Done Only GroupingAggregator has limit, maybe limit should not be added to TAggregator http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node.cc@75 PS11, Line 75: os = true; : VLOG_QUERY << "the > nit. May make the message more specific by plugging in the numbers. For exa Done http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/streaming-aggregation-node.cc File be/src/exec/streaming-aggregation-node.cc: http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/streaming-aggregation-node.cc@129 PS11, Line 129: DCHECK(limit() > -1); > nit. We probably should DCHECK(limit() > -1) here before the IF. Done http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/streaming-aggregation-node.cc@133 PS11, Line 133: child_batch_->Reset(); : VLOG_QUERY << "the > nit. same as the other one to make the message more specific. Done http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test: http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934 PS11, Line 2934: limit: 2 > nit. I wonder if this correct. On paper, we need to look at all distinct va This is an SQL optimization. Look at SubqueryRewriter# mergeExpr. This is a scalar subquery, the number of rows returned by this subquery cannot be greater than 1, so set limit = 2, and then check cardinality of result of subquery http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/targeted-perf/queries/aggregation.test File testdata/workloads/targeted-perf/queries/aggregation.test: http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/targeted-perf/queries/aggregation.test@2731 PS11, Line 2731: > This violates condition 1? Done -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 12 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Fri, 10 Sep 2021 09:39:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#12). Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. When both conditions below are true, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Sample queries: SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Testing: Add test targeted-perf/queries/aggregation.test Pass core test Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 18 files changed, 140 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/12 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 12 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#11). Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. In some cases, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Sample queries: SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Testing: Add test targeted-perf/queries/aggregation.test Pass core test Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 16 files changed, 129 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/11 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 11 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#10). Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. In some cases, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Queries like SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 16 files changed, 129 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/10 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 10 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#9). Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some aggregations .. WIP IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. In some cases, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Queries like SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 16 files changed, 129 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/9 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 9 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#8). Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some aggregations .. WIP IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. In some cases, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Queries like SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 16 files changed, 127 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/8 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 8 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#7). Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some aggregations .. WIP IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. In some cases, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Queries like SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 16 files changed, 127 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/7 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 7 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#6). Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some aggregations .. WIP IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. In some cases, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Queries like SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 16 files changed, 127 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/6 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 6 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#5). Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some aggregations .. WIP IMPALA-2581: LIMIT can be propagated down into some aggregations This patch contains 2 parts: 1. In some cases, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Queries like SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 17 files changed, 128 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/5 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 5 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] [WIP]IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#4). Change subject: [WIP]IMPALA-2581: LIMIT can be propagated down into some aggregations .. [WIP]IMPALA-2581: LIMIT can be propagated down into some aggregations This patch does two things: 1. In some cases, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Queries like SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 17 files changed, 128 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/4 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 4 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17821 to look at the new patch set (#3). Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch does two things: 1. In some cases, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Queries like SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 17 files changed, 128 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/3 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 3 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations
liuyao has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17821 Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations .. IMPALA-2581: LIMIT can be propagated down into some aggregations This patch does two things: 1. In some cases, push down limit to pre-aggregation a) aggregation node has no aggregate function b) aggregation node has no predicate 2. finish aggregation when number of unique keys of hash table has exceeded the limit. Queries like SELECT DISTINCT f FROM t LIMIT n Can pass the LIMIT all the way down to the pre-aggregation, which leads to a nearly unbounded speedup on these queries in large tables when n is low. Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 --- M be/src/exec/aggregation-node-base.cc M be/src/exec/aggregation-node-base.h M be/src/exec/aggregation-node.cc M be/src/exec/aggregator.h M be/src/exec/grouping-aggregator.cc M be/src/exec/grouping-aggregator.h M be/src/exec/non-grouping-aggregator.h M be/src/exec/streaming-aggregation-node.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test M testdata/workloads/targeted-perf/queries/aggregation.test 17 files changed, 127 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/2 -- To view, visit http://gerrit.cloudera.org:8080/17821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995 Gerrit-Change-Number: 17821 Gerrit-PatchSet: 2 Gerrit-Owner: liuyao
[Impala-ASF-CR] IMPALA-5476: Fix catalogd restart brings stale metadata
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17645 to look at the new patch set (#26). Change subject: IMPALA-5476: Fix catalogd restart brings stale metadata .. IMPALA-5476: Fix catalogd restart brings stale metadata ImpaladCatalog#updateCatalog() doesn't trigger a full topic update request when detecting catalogServiceId changes. It just updates the local catalogServiceId and throws an exception to abort applying the DDL/DML results. This causes a problem when catalogd is restarted and the DDL/DML is executed on the restarted instance. In this case, only the local catalogServiceId is updated to the latest. The local catalog remains stale. Then when dealing with the following updates from statestore, the catalogServiceId always matches, so updates will be applied without exceptions. However, the catalog objects usually won't be updated since they have higher versions (from the old catalogd instance) than those in the update. This brings the local catalog out of sync until the catalog version of the new catalogd grows larger enough. Note that in dealing with the catalog updates from statestore, if the catalogServiceId unmatches, impalad will request a full topic update. See more in ImpalaServer::CatalogUpdateCallback(). This patch fixes this issue by checking the catalogServiceId before invoking UpdateCatalogCache() of FE. If catalogServiceId doesn't match the one in the DDL/DML result, wait until it changes. The following update from statestore will change it and unblocks the DDL/DML thread. Testing add several tests in tests/custom_cluster/test_restart_services.py Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc M tests/custom_cluster/test_restart_services.py 2 files changed, 150 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/26 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 26 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-5476: Fix catalogd restart brings stale metadata
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17645 to look at the new patch set (#25). Change subject: IMPALA-5476: Fix catalogd restart brings stale metadata .. IMPALA-5476: Fix catalogd restart brings stale metadata ImpaladCatalog#updateCatalog() doesn't trigger a full topic update request when detecting catalogServiceId changes. It just updates the local catalogServiceId and throws an exception to abort applying the DDL/DML results. This causes a problem when catalogd is restarted and the DDL/DML is executed on the restarted instance. In this case, only the local catalogServiceId is updated to the latest. The local catalog remains stale. Then when dealing with the following updates from statestore, the catalogServiceId always matches, so updates will be applied without exceptions. However, the catalog objects usually won't be updated since they have higher versions (from the old catalogd instance) than those in the update. This brings the local catalog out of sync until the catalog version of the new catalogd grows larger enough. Note that in dealing with the catalog updates from statestore, if the catalogServiceId unmatches, impalad will request a full topic update. See more in ImpalaServer::CatalogUpdateCallback(). This patch fixes this issue by checking the catalogServiceId before invoking UpdateCatalogCache() of FE. If catalogServiceId doesn't match the one in the DDL/DML result, wait until it changes. The following update from statestore will change it and unblocks the DDL/DML thread. Testing add several tests in tests/custom_cluster/test_restart_services.py Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc M tests/custom_cluster/test_restart_services.py 2 files changed, 150 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/25 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 25 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17645 ) Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. Patch Set 24: (7 comments) http://gerrit.cloudera.org:8080/#/c/17645/23//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17645/23//COMMIT_MSG@9 PS23, Line 9: The change of catalogServiceId did not trigger a full topic update : request. : : ImpaladCatalog#updateCatalog() doesn't trigger a full topic update : request when detecting catalogServiceId changes. It just updates the : local catalogServiceId and throws an exception to abort applying the : DDL/DML results. This causes a problem when catalogd is restarted and : the DDL/DML is executed on the restarted instance. In this case, only : the local catalogServiceId is updated to the latest. The local catalog : remains stale. Then when dealing with the following > nit: ImpaladCatalog#updateCatalog() doesn't trigger a full topic update req Done http://gerrit.cloudera.org:8080/#/c/17645/23//COMMIT_MSG@20 PS23, Line 20: applied without exceptions. However, the catalog objects usually won't : be updated since they have higher versions (from the old catalogd : instance) than those in the update. This brings the local catalog out : of sync until the catalog version of the new catalogd grows larger : enough. > nit: Note that in dealing with the catalog updates from statestore, if the Done http://gerrit.cloudera.org:8080/#/c/17645/23//COMMIT_MSG@25 PS23, Line 25: > Let's add some description about how we fix the bug: Done http://gerrit.cloudera.org:8080/#/c/17645/23//COMMIT_MSG@28 PS23, Line 28: more in ImpalaServer::CatalogUpda > nit: several tests Done http://gerrit.cloudera.org:8080/#/c/17645/23//COMMIT_MSG@29 PS23, Line 29: > nit: tests Done http://gerrit.cloudera.org:8080/#/c/17645/23/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/17645/23/be/src/service/impala-server.cc@2146 PS23, Line 2146: // TODO Restart catalogd twice. When restarting for the second time, ddl executes : // successfully, but we may receive the service id of the first restart, which will : // cause the local cache to not be updated in time > nit: could you move these into the else-block at line 2164 and change them Done http://gerrit.cloudera.org:8080/#/c/17645/23/tests/custom_cluster/test_restart_services.py File tests/custom_cluster/test_restart_services.py: http://gerrit.cloudera.org:8080/#/c/17645/23/tests/custom_cluster/test_restart_services.py@230 PS23, Line 230: execute_query(query > Currently execute_query() and execute_query_async() are the same for DDLs. Done -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 24 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Fri, 20 Aug 2021 08:50:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17645 to look at the new patch set (#24). Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync The change of catalogServiceId did not trigger a full topic update request. ImpaladCatalog#updateCatalog() doesn't trigger a full topic update request when detecting catalogServiceId changes. It just updates the local catalogServiceId and throws an exception to abort applying the DDL/DML results. This causes a problem when catalogd is restarted and the DDL/DML is executed on the restarted instance. In this case, only the local catalogServiceId is updated to the latest. The local catalog remains stale. Then when dealing with the following updates from statestore, the catalogServiceId always matches, so updates will be applied without exceptions. However, the catalog objects usually won't be updated since they have higher versions (from the old catalogd instance) than those in the update. This brings the local catalog out of sync until the catalog version of the new catalogd grows larger enough. Note that in dealing with the catalog updates from statestore, if the catalogServiceId unmatches, impalad will request a full topic update. See more in ImpalaServer::CatalogUpdateCallback(). This patch fixes this issue by checking the catalogServiceId before invoking UpdateCatalogCache() of FE. If catalogServiceId doesn't match the one in the DDL/DML result, wait until it changes. The following update from statestore will change it and unblocks the DDL/DML thread. Testing add several tests in tests/custom_cluster/test_restart_services.py Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc M tests/custom_cluster/test_restart_services.py 2 files changed, 152 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/24 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 24 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17645 to look at the new patch set (#23). Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync The change of catalogServiceId did not trigger a full topic update request. During the execution of DDL or DML, restarting Catalogd will cause catalogServiceId to change. The DDL will execute successfully, and then directly update the local cache (not through Statestored), modify the local catalog with the new catalogServiceId, trigger an exception, print the exception information, but did not submit full topic update request. The version of Catalogd is lower than Impalad, so the metadata information synchronized from Catalogd will be lost. Under normal circumstances, if the catalogServiceId changes, Impalad will submit full topic update request, Impalad will receive TUpdateCatalogCacheRequest, and is_delta is false, that is full update. Impalad will reset the local catalog cache, and then update, and the version of CatalogD is greater than Impalad. Testing add TestRestart#test_restart_catalogd in ests/custom_cluster/test_restart_services.py Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc M tests/custom_cluster/test_restart_services.py 2 files changed, 144 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/23 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 23 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17645 to look at the new patch set (#22). Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync The change of catalogServiceId did not trigger a full topic update request. During the execution of DDL or DML, restarting Catalogd will cause catalogServiceId to change. The DDL will execute successfully, and then directly update the local cache (not through Statestored), modify the local catalog with the new catalogServiceId, trigger an exception, print the exception information, but did not submit full topic update request. The version of Catalogd is lower than Impalad, so the metadata information synchronized from Catalogd will be lost. Under normal circumstances, if the catalogServiceId changes, Impalad will submit full topic update request, Impalad will receive TUpdateCatalogCacheRequest, and is_delta is false, that is full update. Impalad will reset the local catalog cache, and then update, and the version of CatalogD is greater than Impalad. Testing add TestRestart#test_restart_catalogd in ests/custom_cluster/test_restart_services.py Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc M tests/custom_cluster/test_restart_services.py 2 files changed, 144 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/22 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 22 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17645 to look at the new patch set (#21). Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync The change of catalogServiceId did not trigger a full topic update request. During the execution of DDL or DML, restarting Catalogd will cause catalogServiceId to change. The DDL will execute successfully, and then directly update the local cache (not through Statestored), modify the local catalog with the new catalogServiceId, trigger an exception, print the exception information, but did not submit full topic update request. The version of Catalogd is lower than Impalad, so the metadata information synchronized from Catalogd will be lost. Under normal circumstances, if the catalogServiceId changes, Impalad will submit full topic update request, Impalad will receive TUpdateCatalogCacheRequest, and is_delta is false, that is full update. Impalad will reset the local catalog cache, and then update, and the version of CatalogD is greater than Impalad. Testing add TestRestart#test_restart_catalogd in ests/custom_cluster/test_restart_services.py Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc M tests/custom_cluster/test_restart_services.py 2 files changed, 143 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/21 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 21 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17645 ) Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. Patch Set 20: (1 comment) http://gerrit.cloudera.org:8080/#/c/17645/16/tests/custom_cluster/test_restart_services.py File tests/custom_cluster/test_restart_services.py: http://gerrit.cloudera.org:8080/#/c/17645/16/tests/custom_cluster/test_restart_services.py@239 PS16, Line 239: sleep(self.UPDATE_FREQUENCY_S * 2) > While thinking about comments for this sleep, I realize this actually revea Done -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 20 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Thu, 19 Aug 2021 09:23:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17645 to look at the new patch set (#20). Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync The change of catalogServiceId did not trigger a full topic update request. During the execution of DDL or DML, restarting Catalogd will cause catalogServiceId to change. The DDL will execute successfully, and then directly update the local cache (not through Statestored), modify the local catalog with the new catalogServiceId, trigger an exception, print the exception information, but did not submit full topic update request. The version of Catalogd is lower than Impalad, so the metadata information synchronized from Catalogd will be lost. Under normal circumstances, if the catalogServiceId changes, Impalad will submit full topic update request, Impalad will receive TUpdateCatalogCacheRequest, and is_delta is false, that is full update. Impalad will reset the local catalog cache, and then update, and the version of CatalogD is greater than Impalad. Testing add TestRestart#test_restart_catalogd in ests/custom_cluster/test_restart_services.py Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc M tests/custom_cluster/test_restart_services.py 2 files changed, 186 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/20 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 20 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17645 to look at the new patch set (#19). Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync The change of catalogServiceId did not trigger a full topic update request. During the execution of DDL or DML, restarting Catalogd will cause catalogServiceId to change. The DDL will execute successfully, and then directly update the local cache (not through Statestored), modify the local catalog with the new catalogServiceId, trigger an exception, print the exception information, but did not submit full topic update request. The version of Catalogd is lower than Impalad, so the metadata information synchronized from Catalogd will be lost. Under normal circumstances, if the catalogServiceId changes, Impalad will submit full topic update request, Impalad will receive TUpdateCatalogCacheRequest, and is_delta is false, that is full update. Impalad will reset the local catalog cache, and then update, and the version of CatalogD is greater than Impalad. Testing add TestRestart#test_restart_catalogd in ests/custom_cluster/test_restart_services.py Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc M tests/custom_cluster/test_restart_services.py 2 files changed, 186 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/19 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 19 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17645 ) Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. Patch Set 17: (3 comments) http://gerrit.cloudera.org:8080/#/c/17645/16/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/17645/16/be/src/service/impala-server.cc@2129 PS16, Line 2129: TUniqueId cur_service_id; : { : unique_lock ver_lock(catalog_version_lock_); : cur_service_id = catalog_update_info_.catalog_service_id; : if (catalog_update_info_.catalog_service_id != catalog_service_id) { : LOG(INFO) << "Catalog service ID mismatch. Current ID: " : << PrintId(cur_service_id) << ". ID in response: " : << PrintId(catalog_service_id) << ". Catalogd may be restarted. Waitin > nit: Let's move these into the if-clause at line 2155. Done http://gerrit.cloudera.org:8080/#/c/17645/16/tests/custom_cluster/test_restart_services.py File tests/custom_cluster/test_restart_services.py: http://gerrit.cloudera.org:8080/#/c/17645/16/tests/custom_cluster/test_restart_services.py@217 PS16, Line 217: 10 > nit: Can we lower this down to 5? Done. down to 10 http://gerrit.cloudera.org:8080/#/c/17645/16/tests/custom_cluster/test_restart_services.py@239 PS16, Line 239: sleep(self.UPDATE_FREQUENCY_S * 2) > While thinking about comments for this sleep, I realize this actually revea If the service id is monotonically increasing, then this problem is better solved. But the service id is not, maybe we can treat it as a todo -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 17 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Mon, 16 Aug 2021 03:19:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17645 to look at the new patch set (#17). Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync The change of catalogServiceId did not trigger a full topic update request. During the execution of DDL or DML, restarting Catalogd will cause catalogServiceId to change. The DDL will execute successfully, and then directly update the local cache (not through Statestored), modify the local catalog with the new catalogServiceId, trigger an exception, print the exception information, but did not submit full topic update request. The version of Catalogd is lower than Impalad, so the metadata information synchronized from Catalogd will be lost. Under normal circumstances, if the catalogServiceId changes, Impalad will submit full topic update request, Impalad will receive TUpdateCatalogCacheRequest, and is_delta is false, that is full update. Impalad will reset the local catalog cache, and then update, and the version of CatalogD is greater than Impalad. Testing add TestRestart#test_restart_catalogd in ests/custom_cluster/test_restart_services.py Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc M tests/custom_cluster/test_restart_services.py 2 files changed, 146 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/17 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 17 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17645 ) Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. Patch Set 16: (1 comment) http://gerrit.cloudera.org:8080/#/c/17645/10/tests/custom_cluster/test_restart_services.py File tests/custom_cluster/test_restart_services.py: http://gerrit.cloudera.org:8080/#/c/17645/10/tests/custom_cluster/test_restart_services.py@208 PS10, Line 208: LOG.info(str(e)) : if i == 5: : self.cluster.catalogd.restart() : > statestored will send topic update request to catalogd once it's registered Done -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 16 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Fri, 06 Aug 2021 10:56:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17645 to look at the new patch set (#16). Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync The change of catalogServiceId did not trigger a full topic update request. During the execution of DDL or DML, restarting Catalogd will cause catalogServiceId to change. The DDL will execute successfully, and then directly update the local cache (not through Statestored), modify the local catalog with the new catalogServiceId, trigger an exception, print the exception information, but did not submit full topic update request. The version of Catalogd is lower than Impalad, so the metadata information synchronized from Catalogd will be lost. Under normal circumstances, if the catalogServiceId changes, Impalad will submit full topic update request, Impalad will receive TUpdateCatalogCacheRequest, and is_delta is false, that is full update. Impalad will reset the local catalog cache, and then update, and the version of CatalogD is greater than Impalad. Testing add TestRestart#test_restart_catalogd in ests/custom_cluster/test_restart_services.py Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc M tests/custom_cluster/test_restart_services.py 2 files changed, 139 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/16 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 16 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17645 to look at the new patch set (#15). Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync The change of catalogServiceId did not trigger a full topic update request. During the execution of DDL or DML, restarting Catalogd will cause catalogServiceId to change. The DDL will execute successfully, and then directly update the local cache (not through Statestored), modify the local catalog with the new catalogServiceId, trigger an exception, print the exception information, but did not submit full topic update request. The version of Catalogd is lower than Impalad, so the metadata information synchronized from Catalogd will be lost. Under normal circumstances, if the catalogServiceId changes, Impalad will submit full topic update request, Impalad will receive TUpdateCatalogCacheRequest, and is_delta is false, that is full update. Impalad will reset the local catalog cache, and then update, and the version of CatalogD is greater than Impalad. Testing add TestRestart#test_restart_catalogd in ests/custom_cluster/test_restart_services.py Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc M tests/custom_cluster/test_restart_services.py 2 files changed, 138 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/15 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 15 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17645 ) Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. Patch Set 13: (6 comments) http://gerrit.cloudera.org:8080/#/c/17645/10/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/17645/10/be/src/service/impala-server.cc@2150 PS10, Line 2150: g_version_update > nit: add a space after "." and ":" Done http://gerrit.cloudera.org:8080/#/c/17645/10/be/src/service/impala-server.cc@2167 PS10, Line 2167: > nit: add a space after each ":" Done http://gerrit.cloudera.org:8080/#/c/17645/10/be/src/service/impala-server.cc@2168 PS10, Line 2168: ocess the catalog update that covers the effects : // (catalog objects) of this o > nit: we can remove this if it's logging the same id. Done http://gerrit.cloudera.org:8080/#/c/17645/10/tests/custom_cluster/test_restart_services.py File tests/custom_cluster/test_restart_services.py: http://gerrit.cloudera.org:8080/#/c/17645/10/tests/custom_cluster/test_restart_services.py@186 PS10, Line 186: self.execute_query_expect_success(self.client, : "alter table join_aa add columns (name string)") > Could you add another test with this being executed with sync_ddl=true? Jus Done http://gerrit.cloudera.org:8080/#/c/17645/10/tests/custom_cluster/test_restart_services.py@208 PS10, Line 208: LOG.info(str(e)) : if i == 5: : self.cluster.catalogd.restart() : > I ran this test several times but I'm not able to see the WARNING log of "I I tried many times, but the WARNING log was not printed. It may be because of the following reason: after catalogd restarts successfully for the second time, catalogd will not immediately notify statestored to clear catalog topic entries. After waiting for "statestore_update_frequency_ms" milliseconds, catalogd receives the update request from statestored, and then catalogd notifies statestored to clear the catalog topic entries. Therefore, impalad has enough time to complete the first full update, which is triggered by the first restart of catalogd. http://gerrit.cloudera.org:8080/#/c/17645/10/tests/custom_cluster/test_restart_services.py@223 PS10, Line 223: Make the catalog object ve > nit: rename to "test_restart_catalogd_with_local_catalog" Done -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 13 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Mon, 26 Jul 2021 06:05:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17645 to look at the new patch set (#13). Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync The change of catalogServiceId did not trigger a full topic update request. During the execution of DDL or DML, restarting Catalogd will cause catalogServiceId to change. The DDL will execute successfully, and then directly update the local cache (not through Statestored), modify the local catalog with the new catalogServiceId, trigger an exception, print the exception information, but did not submit full topic update request. The version of Catalogd is lower than Impalad, so the metadata information synchronized from Catalogd will be lost. Under normal circumstances, if the catalogServiceId changes, Impalad will submit full topic update request, Impalad will receive TUpdateCatalogCacheRequest, and is_delta is false, that is full update. Impalad will reset the local catalog cache, and then update, and the version of CatalogD is greater than Impalad. Testing add TestRestart#test_restart_catalogd in ests/custom_cluster/test_restart_services.py Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc M tests/custom_cluster/test_restart_services.py 2 files changed, 135 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/13 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 13 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17645 to look at the new patch set (#12). Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync The change of catalogServiceId did not trigger a full topic update request. During the execution of DDL or DML, restarting Catalogd will cause catalogServiceId to change. The DDL will execute successfully, and then directly update the local cache (not through Statestored), modify the local catalog with the new catalogServiceId, trigger an exception, print the exception information, but did not submit full topic update request. The version of Catalogd is lower than Impalad, so the metadata information synchronized from Catalogd will be lost. Under normal circumstances, if the catalogServiceId changes, Impalad will submit full topic update request, Impalad will receive TUpdateCatalogCacheRequest, and is_delta is false, that is full update. Impalad will reset the local catalog cache, and then update, and the version of CatalogD is greater than Impalad. Testing add TestRestart#test_restart_catalogd in ests/custom_cluster/test_restart_services.py Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc M tests/custom_cluster/test_restart_services.py 2 files changed, 135 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/12 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 12 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17645 to look at the new patch set (#11). Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync The change of catalogServiceId did not trigger a full topic update request. During the execution of DDL or DML, restarting Catalogd will cause catalogServiceId to change. The DDL will execute successfully, and then directly update the local cache (not through Statestored), modify the local catalog with the new catalogServiceId, trigger an exception, print the exception information, but did not submit full topic update request. The version of Catalogd is lower than Impalad, so the metadata information synchronized from Catalogd will be lost. Under normal circumstances, if the catalogServiceId changes, Impalad will submit full topic update request, Impalad will receive TUpdateCatalogCacheRequest, and is_delta is false, that is full update. Impalad will reset the local catalog cache, and then update, and the version of CatalogD is greater than Impalad. Testing add TestRestart#test_restart_catalogd in ests/custom_cluster/test_restart_services.py Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc M tests/custom_cluster/test_restart_services.py 2 files changed, 134 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/11 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 11 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17645 ) Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. Patch Set 10: (4 comments) http://gerrit.cloudera.org:8080/#/c/17645/7/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/17645/7/be/src/service/impala-server.cc@a2142 PS7, Line 2142: > Can we keep this? If we wait for the next statestore update, this won't cau Done http://gerrit.cloudera.org:8080/#/c/17645/7/be/src/service/impala-server.cc@2144 PS7, Line 2144: TUniqueId cur_service_id; > Could you add a log before the while-loop? E.g. Done http://gerrit.cloudera.org:8080/#/c/17645/7/be/src/service/impala-server.cc@2150 PS7, Line 2150: << ".ID in response:" << catalog_service_id << ". Catalogd may be restarted." > I think this captures the case that catalogd restart again and we get anoth The previous catalog service ID was not recorded, so I printed expected catalog service ID http://gerrit.cloudera.org:8080/#/c/17645/7/tests/custom_cluster/test_restart_services.py File tests/custom_cluster/test_restart_services.py: http://gerrit.cloudera.org:8080/#/c/17645/7/tests/custom_cluster/test_restart_services.py@169 PS7, Line 169: def test_restart_catalogd(self): > This is already a good test case. Can we add another one to cover the case Done -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 10 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Thu, 22 Jul 2021 10:11:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17645 to look at the new patch set (#9). Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync The change of catalogServiceId did not trigger a full topic update request. During the execution of DDL or DML, restarting Catalogd will cause catalogServiceId to change. The DDL will execute successfully, and then directly update the local cache (not through Statestored), modify the local catalog with the new catalogServiceId, trigger an exception, print the exception information, but did not submit full topic update request. The version of Catalogd is lower than Impalad, so the metadata information synchronized from Catalogd will be lost. Under normal circumstances, if the catalogServiceId changes, Impalad will submit full topic update request, Impalad will receive TUpdateCatalogCacheRequest, and is_delta is false, that is full update. Impalad will reset the local catalog cache, and then update, and the version of CatalogD is greater than Impalad. Testing add TestRestart#test_restart_catalogd in ests/custom_cluster/test_restart_services.py Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc M tests/custom_cluster/test_restart_services.py 2 files changed, 109 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/9 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 9 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17645 to look at the new patch set (#8). Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync The change of catalogServiceId did not trigger a full topic update request. During the execution of DDL or DML, restarting Catalogd will cause catalogServiceId to change. The DDL will execute successfully, and then directly update the local cache (not through Statestored), modify the local catalog with the new catalogServiceId, trigger an exception, print the exception information, but did not submit full topic update request. The version of Catalogd is lower than Impalad, so the metadata information synchronized from Catalogd will be lost. Under normal circumstances, if the catalogServiceId changes, Impalad will submit full topic update request, Impalad will receive TUpdateCatalogCacheRequest, and is_delta is false, that is full update. Impalad will reset the local catalog cache, and then update, and the version of CatalogD is greater than Impalad. Testing add TestRestart#test_restart_catalogd in ests/custom_cluster/test_restart_services.py Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc M tests/custom_cluster/test_restart_services.py 2 files changed, 109 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/8 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 8 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17645 ) Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. Patch Set 7: 16:50:03 ERROR: cannot verify jenkins.impala.io's certificate, issued by ‘CN=Go Daddy Secure Certificate Authority - G2,OU=http://certs.godaddy.com/repository/,O=GoDaddy.com\\, Inc.,L=Scottsdale,ST=Arizona,C=US’: 16:50:03 Issued certificate has expired. 16:50:03 To connect to jenkins.impala.io insecurely, use `--no-check-certificate'. 16:50:03 + ssh -p 29418 impala-public-jenk...@gerrit.cloudera.org gerrit review --verified -1 --message '"Build' failed: 'https://jenkins.impala.io/job/gerrit-verify-dryrun/7319/";' --project Impala-ASF 17645,7 16:50:05 + exit 1 16:50:05 Build step 'Execute shell' marked build as failure -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 7 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Wed, 21 Jul 2021 09:59:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17645 to look at the new patch set (#6). Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync The change of catalogServiceId did not trigger a full topic update request. During the execution of DDL or DML, restarting Catalogd will cause catalogServiceId to change. The DDL will execute successfully, and then directly update the local cache (not through Statestored), modify the local catalog with the new catalogServiceId, trigger an exception, print the exception information, but did not submit full topic update request. The version of Catalogd is lower than Impalad, so the metadata information synchronized from Catalogd will be lost. Under normal circumstances, if the catalogServiceId changes, Impalad will submit full topic update request, Impalad will receive TUpdateCatalogCacheRequest, and is_delta is false, that is full update. Impalad will reset the local catalog cache, and then update, and the version of CatalogD is greater than Impalad. Testing add TestRestart#test_restart_catalogd in ests/custom_cluster/test_restart_services.py Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc M tests/custom_cluster/test_restart_services.py 2 files changed, 44 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/6 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 6 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17645 to look at the new patch set (#5). Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync The change of catalogServiceId did not trigger a full topic update request. During the execution of DDL or DML, restarting Catalogd will cause catalogServiceId to change. The DDL will execute successfully, and then directly update the local cache (not through Statestored), modify the local catalog with the new catalogServiceId, trigger an exception, print the exception information, but did not submit full topic update request. The version of Catalogd is lower than Impalad, so the metadata information synchronized from Catalogd will be lost. Under normal circumstances, if the catalogServiceId changes, Impalad will submit full topic update request, Impalad will receive TUpdateCatalogCacheRequest, and is_delta is false, that is full update. Impalad will reset the local catalog cache, and then update, and the version of CatalogD is greater than Impalad. Testing add TestRestart#test_restart_catalogd in ests/custom_cluster/test_restart_services.py Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc M tests/custom_cluster/test_restart_services.py 2 files changed, 44 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/5 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 5 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17645 ) Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/17645/4/tests/custom_cluster/test_restart_services.py File tests/custom_cluster/test_restart_services.py: http://gerrit.cloudera.org:8080/#/c/17645/4/tests/custom_cluster/test_restart_services.py@189 PS4, Line 189: # Wait enough time to synchronize the metadata : sleep(20) > > We should not depend on sleep. This is what I mean in the previous I prefer Solution 2, this kind of modification is smaller and safer -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 4 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Fri, 16 Jul 2021 03:36:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17645 ) Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/17645/4/tests/custom_cluster/test_restart_services.py File tests/custom_cluster/test_restart_services.py: http://gerrit.cloudera.org:8080/#/c/17645/4/tests/custom_cluster/test_restart_services.py@189 PS4, Line 189: # Wait enough time to synchronize the metadata : sleep(20) > We should not depend on sleep. This is what I mean in the previous > comment: https://gerrit.cloudera.org/c/17645/1//COMMIT_MSG#17 > > When AlterTable finishes, the metadata of this table should be > up-to-date. The current patch can't satify this. I think of three solutions: Solution 1. Directly trigger the full topic update, but you have to wait until the full update is completed before you can update the local cache, because the local cache has a higher version. But I did not find the api to trigger the full update, ImpalaServer::CatalogUpdateCallback is not applicable Solution 2. When restart catalogs, delay DDL execution time. After the ddl is executed, determine whether the catalog service id of the returned result is the same as the local one. If it is different, wait for the local catalog service id to change, and then return success Solution 3. When call exec_env_->frontend()->UpdateCatalogCache, if the catalog server id is different, the catalog object will be updated directly without comparing the versions -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 4 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Fri, 16 Jul 2021 03:20:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17645 to look at the new patch set (#4). Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync The change of catalogServiceId did not trigger a full topic update request. During the execution of DDL or DML, restarting Catalogd will cause catalogServiceId to change. The DDL will execute successfully, and then directly update the local cache (not through Statestored), modify the local catalog with the new catalogServiceId, trigger an exception, print the exception information, but did not submit full topic update request. The version of Catalogd is lower than Impalad, so the metadata information synchronized from Catalogd will be lost. Under normal circumstances, if the catalogServiceId changes, Impalad will submit full topic update request, Impalad will receive TUpdateCatalogCacheRequest, and is_delta is false, that is full update. Impalad will reset the local catalog cache, and then update, and the version of CatalogD is greater than Impalad. Testing add TestRestart#test_restart_catalogd in ests/custom_cluster/test_restart_services.py Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc M tests/custom_cluster/test_restart_services.py 2 files changed, 29 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/4 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 4 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17645 ) Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. Patch Set 3: add unit test -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 3 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Tue, 13 Jul 2021 11:52:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17645 to look at the new patch set (#3). Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync The change of catalogServiceId did not trigger a full topic update request. During the execution of DDL or DML, restarting Catalogd will cause catalogServiceId to change. The DDL will execute successfully, and then directly update the local cache (not through Statestored), modify the local catalog with the new catalogServiceId, trigger an exception, print the exception information, but did not submit full topic update request. The version of Catalogd is lower than Impalad, so the metadata information synchronized from Catalogd will be lost. Under normal circumstances, if the catalogServiceId changes, Impalad will submit full topic update request, Impalad will receive TUpdateCatalogCacheRequest, and is_delta is false, that is full update. Impalad will reset the local catalog cache, and then update, and the version of CatalogD is greater than Impalad. Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc M tests/custom_cluster/test_restart_services.py 2 files changed, 29 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/3 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 3 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17645 ) Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/17645/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17645/1//COMMIT_MSG@16 PS1, Line 16: but did not submit full topic update : request > Shouldn't we fix this instead? The new catalog objects in the DDL > response come from the new catalogd instance. Will they be able to > update the local catalog cache? Or are they just got ignored due to > versions too small? we should fix this bug.The new catalog objects in the DDL response come from the new catalogd instance, they just got ignored due to versions too small > > I think updating the local catalog cache immediately using the DDL > response aims to let the coordinator get latest catalog > immediately. After this patch, the coordinator will need to wait > for the next statestore topic update to get the latest catalog, > which doesn't match the existing functionality. Did I miss > anything? This patch do not change the previous process. After the ddl is successfully executed, impalad will immediately modify the local catalog cache instead of waiting for the statestore notification. When the ddl is executed successfully, it is reasonable for impalad to modify the local cache, but it is redundant to modify the service id every time, and it will cause the object version to be very high, causing the local catalog cache to be out of sync http://gerrit.cloudera.org:8080/#/c/17645/1//COMMIT_MSG@24 PS1, Line 24: CatalogD > CatalogD? Done http://gerrit.cloudera.org:8080/#/c/17645/1/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/17645/1/be/src/service/impala-server.cc@a2133 PS1, Line 2133: : : : > Is a better solution to set the catalogServiceId_ only if firstRun is true Modifying the server id here may be redundant. Is it necessary to do so? -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 2 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Tue, 13 Jul 2021 03:30:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17645 to look at the new patch set (#2). Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync The change of catalogServiceId did not trigger a full topic update request. During the execution of DDL or DML, restarting Catalogd will cause catalogServiceId to change. The DDL will execute successfully, and then directly update the local cache (not through Statestored), modify the local catalog with the new catalogServiceId, trigger an exception, print the exception information, but did not submit full topic update request. The version of Catalogd is lower than Impalad, so the metadata information synchronized from Catalogd will be lost. Under normal circumstances, if the catalogServiceId changes, Impalad will submit full topic update request, Impalad will receive TUpdateCatalogCacheRequest, and is_delta is false, that is full update. Impalad will reset the local catalog cache, and then update, and the version of CatalogD is greater than Impalad. Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc 1 file changed, 1 insertion(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/2 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 2 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-10766: Better selectivity for =,not distinct
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17637 ) Change subject: IMPALA-10766: Better selectivity for =,not distinct .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/17637/6/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java: http://gerrit.cloudera.org:8080/#/c/17637/6/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@288 PS6, Line 288: selectivity_ = selectivity_ * (double) (numRows - numNulls) / numRows + : numNulls / numRows; > nit. In this case, can we use selectivity = #nulls / #rows directly? This calculation formula may be (1- 1/ndv)* (numRows - numNulls) / numRows + numNulls / numRows. Some non-null values satisfy "is distinct from not-null". ie. 'aa' is distinct from 'bb' -- To view, visit http://gerrit.cloudera.org:8080/17637 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60 Gerrit-Change-Number: 17637 Gerrit-PatchSet: 7 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Tue, 13 Jul 2021 02:36:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10766: Better selectivity for =,not distinct
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17637 ) Change subject: IMPALA-10766: Better selectivity for =,not distinct .. Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/17637/5/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java: http://gerrit.cloudera.org:8080/#/c/17637/5/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@250 PS5, Line 250: rChildIsNull > nit. Do we need to worry about left child is null here, or not since the le Considering this special situation, the selectivity will be more accurate. http://gerrit.cloudera.org:8080/#/c/17637/5/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@280 PS5, Line 280: // For =, !=, "is distinct from null" and "is not distinct from non-null", : // all null values are false. : selectivity_ *= (double) (numRows - numNulls) / numRows; > nit. may combine them into a single sentence: Done http://gerrit.cloudera.org:8080/#/c/17637/5/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@284 PS5, Line 284: stinct from null, only null values are > Another case here is "is distinct from not-null". Done -- To view, visit http://gerrit.cloudera.org:8080/17637 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60 Gerrit-Change-Number: 17637 Gerrit-PatchSet: 6 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Mon, 12 Jul 2021 08:40:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10766: Better selectivity for =,not distinct
Hello Aman Sinha, Qifan Chen, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17637 to look at the new patch set (#6). Change subject: IMPALA-10766: Better selectivity for =,not distinct .. IMPALA-10766: Better selectivity for =,not distinct For = : If the right side is null, then selectivity is 0. If the left side is null, null should be excluded when calculating selectivity. For is not distinct from : If the right side is null, non null should be excluded when calculating selectivity, and only null should be included. If the left side is null and the right side is not null, null should be excluded when calculating selectivity, including part of non-null. Testing : Change the UT, modify the selectivity calculation error, add two new cases column != null and column = null Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60 --- M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q44.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q48.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q52.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q55.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q56.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q60.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q61.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q66.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q75.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q80.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q81.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q85.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q91.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q94.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q95.test 32 files changed, 1,203 insertions(+), 1,186 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/17637/6 -- To view, visit http://gerrit.cloudera.org:8080/17637 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60 Gerrit-Change-Number: 17637 Gerrit-PatchSet: 6 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-10766: Better selectivity for =,not distinct
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17637 ) Change subject: IMPALA-10766: Better selectivity for =,not distinct .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/17637/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17637/4//COMMIT_MSG@20 PS4, Line 20: Testin > Testing Done http://gerrit.cloudera.org:8080/#/c/17637/4/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java: http://gerrit.cloudera.org:8080/#/c/17637/4/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@253 PS4, Line 253: ! > != Done http://gerrit.cloudera.org:8080/#/c/17637/4/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@283 PS4, Line 283: (numRows - numNulls) / numRows; > nit: I think it's worth to extract it to a variable just like numRows Done -- To view, visit http://gerrit.cloudera.org:8080/17637 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60 Gerrit-Change-Number: 17637 Gerrit-PatchSet: 5 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Fri, 09 Jul 2021 09:52:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10766: Better selectivity for =,not distinct
Hello Aman Sinha, Qifan Chen, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17637 to look at the new patch set (#5). Change subject: IMPALA-10766: Better selectivity for =,not distinct .. IMPALA-10766: Better selectivity for =,not distinct For = : If the right side is null, then selectivity is 0. If the left side is null, null should be excluded when calculating selectivity. For is not distinct from : If the right side is null, non null should be excluded when calculating selectivity, and only null should be included. If the left side is null and the right side is not null, null should be excluded when calculating selectivity, including part of non-null. Testing : Change the UT, modify the selectivity calculation error, add two new cases column != null and column = null Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60 --- M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q44.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q48.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q52.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q55.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q56.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q60.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q61.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q66.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q75.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q80.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q81.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q85.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q91.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q94.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q95.test 32 files changed, 1,198 insertions(+), 1,185 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/17637/5 -- To view, visit http://gerrit.cloudera.org:8080/17637 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60 Gerrit-Change-Number: 17637 Gerrit-PatchSet: 5 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17645 ) Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. Patch Set 1: When updating the local cache, catalogServiceId should not be changed here, because the full update will not be triggered here -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 1 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Tue, 06 Jul 2021 11:24:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
liuyao has removed Tim Armstrong from this change. ( http://gerrit.cloudera.org:8080/17645 ) Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. Removed reviewer Tim Armstrong. -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 1 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync
liuyao has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17645 Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync .. IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync The change of catalogServiceId did not trigger a full topic update request. During the execution of DDL or DML, restarting Catalogd will cause catalogServiceId to change. The DDL will execute successfully, and then directly update the local cache (not through Statestored), modify the local catalog with the new catalogServiceId, trigger an exception, print the exception information, but did not submit full topic update request. The version of Catalogd is lower than Impalad, so the metadata information synchronized from Catalogd will be lost. Under normal circumstances, if the catalogServiceId changes, Impalad will submit full topic update request, Impalad will receive TUpdateCatalogCacheRequest, and is_delta is false, that is full update. Impalad will reset the local catalog cache, and then update, and the version of DatalogD is greater than Impalad. Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c --- M be/src/service/impala-server.cc 1 file changed, 1 insertion(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/1 -- To view, visit http://gerrit.cloudera.org:8080/17645 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c Gerrit-Change-Number: 17645 Gerrit-PatchSet: 1 Gerrit-Owner: liuyao
[Impala-ASF-CR] IMPALA-10766: Better selectivity for =,not distinct
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17637 to look at the new patch set (#4). Change subject: IMPALA-10766: Better selectivity for =,not distinct .. IMPALA-10766: Better selectivity for =,not distinct For = : If the right side is null, then selectivity is 0. If the left side is null, null should be excluded when calculating selectivity. For is not distinct from : If the right side is null, non null should be excluded when calculating selectivity, and only null should be included. If the left side is null and the right side is not null, null should be excluded when calculating selectivity, including part of non-null. Tesing : Change the UT, modify the selectivity calculation error, add two new cases column != null and column = null Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60 --- M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q44.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q48.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q52.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q55.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q56.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q60.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q61.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q66.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q75.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q80.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q81.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q85.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q91.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q94.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q95.test 32 files changed, 1,198 insertions(+), 1,185 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/17637/4 -- To view, visit http://gerrit.cloudera.org:8080/17637 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60 Gerrit-Change-Number: 17637 Gerrit-PatchSet: 4 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10766: Better selectivity for =,not distinct
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17637 to look at the new patch set (#3). Change subject: IMPALA-10766: Better selectivity for =,not distinct .. IMPALA-10766: Better selectivity for =,not distinct For = : If the right side is null, then selectivity is 0. If the left side is null, null should be excluded when calculating selectivity. For is not distinct from : If the right side is null, non null should be excluded when calculating selectivity, and only null should be included. If the left side is null and the right side is not null, null should be excluded when calculating selectivity, including part of non-null. Tesing : Change the UT, modify the selectivity calculation error, add two new cases column != null and column = null Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60 --- M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q44.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q48.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q52.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q55.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q56.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q60.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q61.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q66.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q75.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q80.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q81.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q85.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q91.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q94.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q95.test 32 files changed, 1,188 insertions(+), 1,173 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/17637/3 -- To view, visit http://gerrit.cloudera.org:8080/17637 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60 Gerrit-Change-Number: 17637 Gerrit-PatchSet: 3 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10766: Better selectivity for =,not distinct
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17637 to look at the new patch set (#2). Change subject: IMPALA-10766: Better selectivity for =,not distinct .. IMPALA-10766: Better selectivity for =,not distinct For = : If the right side is null, then selectivity is 0. If the left side is null, null should be excluded when calculating selectivity. For is not distinct from : If the right side is null, non null should be excluded when calculating selectivity, and only null should be included. If the left side is null and the right side is not null, null should be excluded when calculating selectivity, including part of non-null. Tesing : Change the UT, modify the selectivity calculation error, add two new cases column != null and column = null Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60 --- M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test 6 files changed, 70 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/17637/2 -- To view, visit http://gerrit.cloudera.org:8080/17637 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60 Gerrit-Change-Number: 17637 Gerrit-PatchSet: 2 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10766: Better selectivity for =,not distinct
liuyao has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17637 Change subject: IMPALA-10766: Better selectivity for =,not distinct .. IMPALA-10766: Better selectivity for =,not distinct For = : If the right side is null, then selectivity is 0. If the left side is null, null should be excluded when calculating selectivity. For is not distinct from : If the right side is null, non null should be excluded when calculating selectivity, and only null should be included. If the left side is null and the right side is not null, null should be excluded when calculating selectivity, including part of non-null. Tesing : Change the UT, modify the selectivity calculation error, add two new cases column != null and column = null Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60 --- M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test 6 files changed, 70 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/17637/1 -- To view, visit http://gerrit.cloudera.org:8080/17637 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60 Gerrit-Change-Number: 17637 Gerrit-PatchSet: 1 Gerrit-Owner: liuyao
[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception
Hello Quanlong Huang, Aman Sinha, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17604 to look at the new patch set (#10). Change subject: IMPALA-9763: builtin functions throw unknown exception .. IMPALA-9763: builtin functions throw unknown exception There is a read-write contention in FunctionName#analyze() when a FunctionName object is shared to two threads: check if there is a builtin function in wrong database, when 'db_' is temporarily modified by the other thread. This patch fix this issue. For analyze FunctionName, if no database is specified, check builtinDb first; if FunctionName is not in builtinDb, set the database to defaultDb.Avoid setting the database of FunctionName to a temporary value Detailed description: This problem is caused by shallow copies. In the function FromClause::analyze(Analyzer analyzer), if the table is FeView, the analyzer will clone SelectStmt from the view's MetaData, which should be a deep copy. But in the copy constructor of FunctionCallExpr, fnName_ = other.fnName_, this is a shallow copy, which makes the fnName_ of multiple queries refer to the same block of memory. When the query is analyzed, the database of fnName_ will be set.But this will change the fnName_ of other queries. For example: create view1 as select coalesce(col1, 2) from table1; User A and User B check the same view1 at the same time. When user A’s query is analyzed, it runs to db_ = analyzer.getDefaultDb() of FunctionName::analyze. User B’s query is also analyzed and runs to FunctionCallExpr::analyzeImpl, and just completed FunctionName::analyze at this time, the step of user A's db_ = analyzer.getDefaultDb() will change the db_ attribute of user B's fnName_ to defaultDb, and then user B will continue to check whether the function is defaultDb, Throw an exception: coalesce() unknown for database ... Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a --- M fe/src/main/java/org/apache/impala/analysis/FunctionName.java 1 file changed, 2 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/17604/10 -- To view, visit http://gerrit.cloudera.org:8080/17604 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a Gerrit-Change-Number: 17604 Gerrit-PatchSet: 10 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception
Hello Quanlong Huang, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17604 to look at the new patch set (#8). Change subject: IMPALA-9763: builtin functions throw unknown exception .. IMPALA-9763: builtin functions throw unknown exception There is a read-write contention in FunctionName#analyze() when a FunctionName object is shared to two threads: check if there is a builtin function in wrong database, when 'db_' is temporarily modified by the other thread. This problem is caused by shallow copies. In the function FromClause::analyze(Analyzer analyzer), if the table is FeView, the analyzer will clone SelectStmt from the view's MetaData, which should be a deep copy. But in the copy constructor of FunctionCallExpr, fnName_ = other.fnName_, this is a shallow copy, which makes the fnName_ of multiple queries refer to the same block of memory. When the query is analyzed, the database of fnName_ will be set. But this will change the fnName_ of other queries. For example: create view1 as select coalesce(col1, 2) from table1; User A and User B check the same view1 at the same time. When user A’s query is analyzed, it runs to db_ = analyzer.getDefaultDb() of FunctionName::analyze. User B’s query is also analyzed and runs to FunctionCallExpr::analyzeImpl, and just completed FunctionName::analyze; at this time, the step of user A's db_ = analyzer.getDefaultDb() will change the db_ attribute of user B's fnName_ to defaultDb, and then user B will continue to check whether the function is defaultDb, Throw an exception: coalesce() unknown for database ... Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a --- M fe/src/main/java/org/apache/impala/analysis/FunctionName.java 1 file changed, 2 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/17604/8 -- To view, visit http://gerrit.cloudera.org:8080/17604 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a Gerrit-Change-Number: 17604 Gerrit-PatchSet: 8 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception
Hello Quanlong Huang, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17604 to look at the new patch set (#7). Change subject: IMPALA-9763: builtin functions throw unknown exception .. IMPALA-9763: builtin functions throw unknown exception There is a read-write contention in FunctionName#analyze() when a FunctionName object is shared to two threads: check if there is a builtin function in wrong database, when 'db_' is temporarily modified by the other thread. This problem is caused by shallow copies. In the function FromClause::analyze(Analyzer analyzer), if the table is FeView, the analyzer will clone SelectStmt from the view's MetaData, which should be a deep copy. But in the copy constructor of FunctionCallExpr, fnName_ = other.fnName_, this is a shallow copy, which makes the fnName_ of multiple queries refer to the same block of memory. When the query is analyzed, the database of fnName_ will be set. But this will change the fnName_ of other queries. For example: create view1 as select coalesce(col1, 2) from table1; User A and User B check the same view1 at the same time. When user A’s query is analyzed, it runs to db_ = analyzer.getDefaultDb() of FunctionName::analyze. User B’s query is also analyzed and runs to FunctionCallExpr::analyzeImpl, and just completed FunctionName::analyze; at this time, the step of user A's db_ = analyzer.getDefaultDb() will change the db_ attribute of user B's fnName_ to defaultDb, and then user B will continue to check whether the function is defaultDb, Throw an exception: coalesce() unknown for database ... Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a --- M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/FunctionName.java 2 files changed, 3 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/17604/7 -- To view, visit http://gerrit.cloudera.org:8080/17604 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a Gerrit-Change-Number: 17604 Gerrit-PatchSet: 7 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17604 ) Change subject: IMPALA-9763: builtin functions throw unknown exception .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/17604/6/fe/src/main/java/org/apache/impala/analysis/FunctionName.java File fe/src/main/java/org/apache/impala/analysis/FunctionName.java: http://gerrit.cloudera.org:8080/#/c/17604/6/fe/src/main/java/org/apache/impala/analysis/FunctionName.java@144 PS6, Line 144: if (!isFullyQualified()) { : db_ = analyzer.getDefaultDb(); : if (preferBuiltinsDb && builtinDb.containsFunction(fn_)) { : db_ = BuiltinsDb.NAME; : } : } : Preconditions.checkNotNull(db_); : isBuiltin_ = db_.equals(BuiltinsDb.NAME) && : builtinDb.containsFunction(fn_); > I think a simpler solution is resolving the read-write contention here, i.e This is indeed a simple and conservative method. However, the risk of shallow copy still exists, and this can be resolved later. If the database is not specified, check builtDb first, if not, set it to defaultDb. I'll modify it like this first, and then test it? -- To view, visit http://gerrit.cloudera.org:8080/17604 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a Gerrit-Change-Number: 17604 Gerrit-PatchSet: 6 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Thu, 24 Jun 2021 02:50:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17604 ) Change subject: IMPALA-9763: builtin functions throw unknown exception .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/17604/5/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java: http://gerrit.cloudera.org:8080/#/c/17604/5/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@274 PS5, Line 274: orderByElement.getExpr().collectAll( : Predicates.instanceOf(FunctionCallExpr.class), fnExprs); : } : substituteOrdinalsAndAliases(orderingExprs, "ORDER BY", analyzer); : : for (FunctionCallExpr fn: fnExprs) fn.getFnName().analyze(analyzer); > I'm not clear about this. You are collecting fnExprs from the > original 'orderByElements_' but not the cloned ones inside > 'orderingExprs'. And then analyze these original exprs. > > Do we require all FunctionName objects being analyzed now? Just > concerning that it will introduce a new requirement that may be > broken in many other places. Previously, the code here relied on shallow copy. When executing substituteOrdinalsAndAliases(orderingExprs, "ORDER BY", analyzer), the FunctionName in orderingExprs was analyzed and the FunctionName in orderByElements_ was also analyzed. After changing to a deep copy, if the FunctionName in orderByElements_ is not analyzed here, then NormalizeCountStarRule#apply will report an exception: NullPointorException, because the FunctionName in orderByElements_ has not been analyzed, origExpr.getFnName().getFunction() returns Is null(line 41). -- To view, visit http://gerrit.cloudera.org:8080/17604 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a Gerrit-Change-Number: 17604 Gerrit-PatchSet: 6 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Wed, 23 Jun 2021 09:37:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17604 ) Change subject: IMPALA-9763: builtin functions throw unknown exception .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/17604/4/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: http://gerrit.cloudera.org:8080/#/c/17604/4/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@173 PS4, Line 173: Preconditions.checkState(other.fnName_ != null); > nit: Could you add a Preconditions check that other.fnName_ != null? Done http://gerrit.cloudera.org:8080/#/c/17604/3/fe/src/main/java/org/apache/impala/analysis/FunctionName.java File fe/src/main/java/org/apache/impala/analysis/FunctionName.java: http://gerrit.cloudera.org:8080/#/c/17604/3/fe/src/main/java/org/apache/impala/analysis/FunctionName.java@64 PS3, Line 64: fnNamePath_ = (other.getFnNamePath() != null) ? > Sorry, I'm still not clear why we need to clone a field that is > never modified... Did I miss anything? Only fnNamePath_ is saved in StmtTableCache, db_, fn_ are not saved, isBuiltin_ and isAnalyzed_ are both false http://gerrit.cloudera.org:8080/#/c/17604/3/fe/src/main/java/org/apache/impala/analysis/FunctionName.java@66 PS3, Line 66: db_ = other.getDb(); : fn_ = other.getFunction(); : isBuiltin_ = other.isBuiltin(); : isAnalyzed_ = other.isFnAnalyzed(); > > I tried to do this, but an error occurred. Because > FunctionName::fnNamePath_ is null. > > > >use functional; > >Select count(1) from alltypesagg; > > > >I0621 19:03:29.787499 46677 jni-util.cc:286] > >0649db5f6bc675d6:6b92f3b6] > java.lang.NullPointerException > >at > >org.apache.impala.analysis.FunctionCallExpr.isConstantImpl(FunctionCallExpr.java:372) > >at org.apache.impala.analysis.Expr.isConstant(Expr.java:1460) > >at org.apache.impala.analysis.Expr.computeNumDistinctValues(Expr.java:575) > >at org.apache.impala.analysis.Expr.analyze(Expr.java:500) > >.. > > hmm, did you reset fnNamePath_ to null so it cause the > NullPointerException? In reset() we only reset the analysis results > which are db_, fn_, isBuiltin_ and isAnalyzed_. The fnNamePath_ > comes from the parser so should not be reset. > > > I modified the copy constructor of FunctionCallExpr again. I > clone QueryStmt in the constructor of InlineViewRef. At this time, > db_ and fn_ are both null, and isBuiltin_ and isAnalyzed_ are both > false, because only fnNamePath_ is stored in the stmtTableCache, > and no other values are stored. > > OK, this answers my question that cloning the analysis results are > safe, because the source objects are in the StmtTableCache and they > are never analyzed. All code paths go here will see 'other' has > null db_ , fn_ etc. > However, this depends on the caller implementation far away from > this class. I think adding a reset() method to reset db_, fn_, > isBuiltin_ and isAnalyzed_ help to prevent future bugs. I added the reset() function, but I didn’t call it in FunctionCallExpr#resetAnalysisState for the following reasons: 1. FunctionName is created in some places, only fn_ is set, fnNamePath_ is not set, if reset is called, both fn_ and fnNamePath_ are null, for example, AggregateInfo#createCountDistinctAggExprParam,NormalizeCountStarRule#apply 2. In some places, fnNamePath_ is set, but if reset is called, FunctionName will continue to be used without analysis. -- To view, visit http://gerrit.cloudera.org:8080/17604 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a Gerrit-Change-Number: 17604 Gerrit-PatchSet: 5 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Tue, 22 Jun 2021 07:44:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception
Hello Quanlong Huang, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17604 to look at the new patch set (#5). Change subject: IMPALA-9763: builtin functions throw unknown exception .. IMPALA-9763: builtin functions throw unknown exception There is a read-write contention in FunctionName#analyze() when a FunctionName object is shared to two threads: check if there is a builtin function in wrong database, when 'db_' is temporarily modified by the other thread. This problem is caused by shallow copies. In the function FromClause::analyze(Analyzer analyzer), if the table is FeView, the analyzer will clone SelectStmt from the view's MetaData, which should be a deep copy. But in the copy constructor of FunctionCallExpr, fnName_ = other.fnName_, this is a shallow copy, which makes the fnName_ of multiple queries refer to the same block of memory. When the query is analyzed, the database of fnName_ will be set. But this will change the fnName_ of other queries. For example: create view1 as select coalesce(col1, 2) from table1; User A and User B check the same view1 at the same time. When user A’s query is analyzed, it runs to db_ = analyzer.getDefaultDb() of FunctionName::analyze. User B’s query is also analyzed and runs to FunctionCallExpr::analyzeImpl, and just completed FunctionName::analyze; at this time, the step of user A's db_ = analyzer.getDefaultDb() will change the db_ attribute of user B's fnName_ to defaultDb, and then user B will continue to check whether the function is defaultDb, Throw an exception: coalesce() unknown for database ... Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a --- M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/FunctionName.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java 3 files changed, 28 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/17604/5 -- To view, visit http://gerrit.cloudera.org:8080/17604 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a Gerrit-Change-Number: 17604 Gerrit-PatchSet: 5 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-7560: Set selectivity of Not-equal
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17344 ) Change subject: IMPALA-7560: Set selectivity of Not-equal .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/17344/6/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java: http://gerrit.cloudera.org:8080/#/c/17344/6/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@256 PS6, Line 256: selectivity_ = 1.0 / distinctValues; > I think if we have null stats, the selectivity for EQ case should be comput Maybe I submit another patch to modify the selectivity of EQ to be better. This patch only modifies the selectivity of NE http://gerrit.cloudera.org:8080/#/c/17344/6/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@258 PS6, Line 258: // > nit. May add a comment here: For case IS DISTINCT FROM NULL Done -- To view, visit http://gerrit.cloudera.org:8080/17344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a Gerrit-Change-Number: 17344 Gerrit-PatchSet: 7 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Tue, 22 Jun 2021 02:44:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7560: Set selectivity of Not-equal
Hello Aman Sinha, Qifan Chen, Zoltan Borok-Nagy, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17344 to look at the new patch set (#7). Change subject: IMPALA-7560: Set selectivity of Not-equal .. IMPALA-7560: Set selectivity of Not-equal Calculate binary predicate selectivity if one of the children is a slotref and the other children are all constant. eg. something like "col != 5", but not "2 * col != 10" selectivity = 1 - 1/ndv Testing: Modify the function testNeSelectivity() of the ExprCardinalityTest.java, change -1 to the correct value. Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a --- M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test 13 files changed, 110 insertions(+), 100 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17344/7 -- To view, visit http://gerrit.cloudera.org:8080/17344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a Gerrit-Change-Number: 17344 Gerrit-PatchSet: 7 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17604 ) Change subject: IMPALA-9763: builtin functions throw unknown exception .. Patch Set 4: (7 comments) http://gerrit.cloudera.org:8080/#/c/17604/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17604/3//COMMIT_MSG@27 PS3, Line 27: me time. > nit: defaultDb Done http://gerrit.cloudera.org:8080/#/c/17604/3//COMMIT_MSG@28 PS3, Line 28: getDefault > builtin? Done http://gerrit.cloudera.org:8080/#/c/17604/3//COMMIT_MSG@30 PS3, Line 30: FunctionCallExpr::analyzeImpl, and just completed FunctionName::analyze; > I think you can add a summary that there is a read-write contention in Func Done http://gerrit.cloudera.org:8080/#/c/17604/3/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: http://gerrit.cloudera.org:8080/#/c/17604/3/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@173 PS3, Line 173: Name_.clone(); > nit: can we change this to isFnAnalyzed()? It's more readable for me. Yes, the readability here is not good. But I think it might be better to drop the shallow copy and modify the code that relies on the shallow copy. http://gerrit.cloudera.org:8080/#/c/17604/3/fe/src/main/java/org/apache/impala/analysis/FunctionName.java File fe/src/main/java/org/apache/impala/analysis/FunctionName.java: http://gerrit.cloudera.org:8080/#/c/17604/3/fe/src/main/java/org/apache/impala/analysis/FunctionName.java@64 PS3, Line 64: fnNamePath_ = (other.getFnNamePath() != null) ? > I think we don't need to clone this since it's private and never modified. I removed the "shallow copy" in the copy constructor of FunctionCallExpr, so I need to copy fnNamePath_ http://gerrit.cloudera.org:8080/#/c/17604/3/fe/src/main/java/org/apache/impala/analysis/FunctionName.java@66 PS3, Line 66: db_ = other.getDb(); : fn_ = other.getFunction(); : isBuiltin_ = other.isBuiltin(); : isAnalyzed_ = other.isFnAnalyzed(); > Is it safe to clone the analysis results? E.g. if two dbs both have a foo() I tried to do this, but an error occurred. Because FunctionName::fnNamePath_ is null. use functional; Select count(1) from alltypesagg; I0621 19:03:29.787499 46677 jni-util.cc:286] 0649db5f6bc675d6:6b92f3b6] java.lang.NullPointerException at org.apache.impala.analysis.FunctionCallExpr.isConstantImpl(FunctionCallExpr.java:372) at org.apache.impala.analysis.Expr.isConstant(Expr.java:1460) at org.apache.impala.analysis.Expr.computeNumDistinctValues(Expr.java:575) at org.apache.impala.analysis.Expr.analyze(Expr.java:500) at org.apache.impala.analysis.SelectStmt$SelectAnalyzer.analyzeSelectClause(SelectStmt.java:341) at org.apache.impala.analysis.SelectStmt$SelectAnalyzer.analyze(SelectStmt.java:280) at org.apache.impala.analysis.SelectStmt$SelectAnalyzer.access$100(SelectStmt.java:265) at org.apache.impala.analysis.SelectStmt.analyze(SelectStmt.java:258) at org.apache.impala.analysis.AnalysisContext.reAnalyze(AnalysisContext.java:575) at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:549) at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:445) at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1669) at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1635) at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1605) at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:163) I modified the copy constructor of FunctionCallExpr again. I clone QueryStmt in the constructor of InlineViewRef. At this time, db_ and fn_ are both null, and isBuiltin_ and isAnalyzed_ are both false, because only fnNamePath_ is stored in the stmtTableCache, and no other values are stored. http://gerrit.cloudera.org:8080/#/c/17604/3/fe/src/main/java/org/apache/impala/analysis/FunctionName.java@72 PS3, Line 72: @Override > nit: Could you add an "@Override" annotation? Done -- To view, visit http://gerrit.cloudera.org:8080/17604 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a Gerrit-Change-Number: 17604 Gerrit-PatchSet: 4 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Mon, 21 Jun 2021 11:53:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception
Hello Quanlong Huang, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17604 to look at the new patch set (#4). Change subject: IMPALA-9763: builtin functions throw unknown exception .. IMPALA-9763: builtin functions throw unknown exception There is a read-write contention in FunctionName#analyze() when a FunctionName object is shared to two threads: check if there is a builtin function in wrong database, when 'db_' is temporarily modified by the other thread. This problem is caused by shallow copies. In the function FromClause::analyze(Analyzer analyzer), if the table is FeView, the analyzer will clone SelectStmt from the view's MetaData, which should be a deep copy. But in the copy constructor of FunctionCallExpr, fnName_ = other.fnName_, this is a shallow copy, which makes the fnName_ of multiple queries refer to the same block of memory. When the query is analyzed, the database of fnName_ will be set. But this will change the fnName_ of other queries. For example: create view1 as select coalesce(col1, 2) from table1; User A and User B check the same view1 at the same time. When user A’s query is analyzed, it runs to db_ = analyzer.getDefaultDb() of FunctionName::analyze. User B’s query is also analyzed and runs to FunctionCallExpr::analyzeImpl, and just completed FunctionName::analyze; at this time, the step of user A's db_ = analyzer.getDefaultDb() will change the db_ attribute of user B's fnName_ to defaultDb, and then user B will continue to check whether the function is defaultDb, Throw an exception: coalesce() unknown for database ... Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a --- M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/FunctionName.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java 3 files changed, 20 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/17604/4 -- To view, visit http://gerrit.cloudera.org:8080/17604 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a Gerrit-Change-Number: 17604 Gerrit-PatchSet: 4 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17604 to look at the new patch set (#3). Change subject: IMPALA-9763: builtin functions throw unknown exception .. IMPALA-9763: builtin functions throw unknown exception This problem is caused by shallow copies. In the function FromClause::analyze(Analyzer analyzer), if the table is FeView, the analyzer will clone SelectStmt from the view's MetaData, which should be a deep copy. But in the copy constructor of FunctionCallExpr, fnName_ = other.fnName_;, this is a shallow copy, which makes the fnName_ of multiple queries refer to the same block of memory. When the query is analyzed, the database of fnName_ will be set. But this will change the fnName_ of other queries. For example: create view1 as select coalesce(col1, 2) from table1; User A and User B check the same view1 at the same time. When user A’s query is analyzed, it runs to db_ = analyzer.getDefaultDb() of FunctionName::analyze. User B’s query is also analyzed and runs to FunctionCallExpr::analyzeImpl. And just completed FunctionName::analyze; at this time, the step of user A's db_ = analyzer.getDefaultDb() will change the db_ attribute of user B's fnName_ to defultdb, and then user B will continue to check whether the function is pure time , Throw an exception: coalesce() unknown for database ... Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a --- M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/FunctionName.java 2 files changed, 18 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/17604/3 -- To view, visit http://gerrit.cloudera.org:8080/17604 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a Gerrit-Change-Number: 17604 Gerrit-PatchSet: 3 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17604 to look at the new patch set (#2). Change subject: IMPALA-9763: builtin functions throw unknown exception .. IMPALA-9763: builtin functions throw unknown exception This problem is caused by shallow copies. In the function FromClause::analyze(Analyzer analyzer), if the table is FeView, the analyzer will clone SelectStmt from the view's MetaData, which should be a deep copy. But in the copy constructor of FunctionCallExpr, fnName_ = other.fnName_;, this is a shallow copy, which makes the fnName_ of multiple queries refer to the same block of memory. When the query is analyzed, the database of fnName_ will be set. But this will change the fnName_ of other queries. For example: create view1 as select coalesce(col1, 2) from table1; User A and User B check the same view1 at the same time. When user A’s query is analyzed, it runs to db_ = analyzer.getDefaultDb() of FunctionName::analyze. User B’s query is also analyzed and runs to FunctionCallExpr::analyzeImpl. And just completed FunctionName::analyze; at this time, the step of user A's db_ = analyzer.getDefaultDb() will change the db_ attribute of user B's fnName_ to defultdb, and then user B will continue to check whether the function is pure time , Throw an exception: coalesce() unknown for database ... Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a --- M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/FunctionName.java 2 files changed, 14 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/17604/2 -- To view, visit http://gerrit.cloudera.org:8080/17604 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a Gerrit-Change-Number: 17604 Gerrit-PatchSet: 2 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception
liuyao has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17604 Change subject: IMPALA-9763: builtin functions throw unknown exception .. IMPALA-9763: builtin functions throw unknown exception This problem is caused by shallow copies. In the function FromClause::analyze(Analyzer analyzer), if the table is FeView, the analyzer will clone SelectStmt from the view's MetaData, which should be a deep copy. But in the copy constructor of FunctionCallExpr, fnName_ = other.fnName_;, this is a shallow copy, which makes the fnName_ of multiple queries refer to the same block of memory. When the query is analyzed, the database of fnName_ will be set. But this will change the fnName_ of other queries. For example: create view1 as select coalesce(col1, 2) from table1; User A and User B check the same view1 at the same time. When user A’s query is analyzed, it runs to db_ = analyzer.getDefaultDb() of FunctionName::analyze. User B’s query is also analyzed and runs to FunctionCallExpr::analyzeImpl. And just completed FunctionName::analyze; at this time, the step of user A's db_ = analyzer.getDefaultDb() will change the db_ attribute of user B's fnName_ to defultdb, and then user B will continue to check whether the function is pure time , Throw an exception: coalesce() unknown for database ... Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a --- M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/17604/1 -- To view, visit http://gerrit.cloudera.org:8080/17604 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a Gerrit-Change-Number: 17604 Gerrit-PatchSet: 1 Gerrit-Owner: liuyao
[Impala-ASF-CR] IMPALA-7560: Set selectivity of Not-equal
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17344 ) Change subject: IMPALA-7560: Set selectivity of Not-equal .. Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/17344/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17344/4//COMMIT_MSG@7 PS4, Line 7: IMPALA-7560: > You might want to use IMPALA-7560 as this was the original Jira and it cont Done http://gerrit.cloudera.org:8080/#/c/17344/4//COMMIT_MSG@11 PS4, Line 11: col != 5", but not "2 * col != > nit: wouldn't be better to use != in the example? Done http://gerrit.cloudera.org:8080/#/c/17344/3/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java: http://gerrit.cloudera.org:8080/#/c/17344/3/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@257 PS3, Line 257: || op_ == Ope > SlotRef retrieves the NDV from the SlotDescriptor's ColumnStats in SlotRef. Done http://gerrit.cloudera.org:8080/#/c/17344/4/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java File fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java: http://gerrit.cloudera.org:8080/#/c/17344/4/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@a372 PS4, Line 372: > That seems wrong looking at the data. I think it should be 1. Done http://gerrit.cloudera.org:8080/#/c/17344/4/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@344 PS4, Line 344: > NULL values are not included in the NDV, also null count is stored separate Done http://gerrit.cloudera.org:8080/#/c/17344/4/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@355 PS4, Line 355: 0); > I think the expected value is 0 here, as null_str is all nulls. Done http://gerrit.cloudera.org:8080/#/c/17344/4/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@359 PS4, Line 359: some_nulls is not distinct from 'foo'", > I think it'd be worth to add a test case with Done -- To view, visit http://gerrit.cloudera.org:8080/17344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a Gerrit-Change-Number: 17344 Gerrit-PatchSet: 6 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Fri, 18 Jun 2021 07:30:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7560: Set selectivity of Not-equal
Hello Aman Sinha, Qifan Chen, Zoltan Borok-Nagy, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17344 to look at the new patch set (#6). Change subject: IMPALA-7560: Set selectivity of Not-equal .. IMPALA-7560: Set selectivity of Not-equal Calculate binary predicate selectivity if one of the children is a slotref and the other children are all constant. eg. something like "col != 5", but not "2 * col != 10" selectivity = 1 - 1/ndv Testing: Modify the function testNeSelectivity() of the ExprCardinalityTest.java, change -1 to the correct value. Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a --- M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test 13 files changed, 109 insertions(+), 100 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17344/6 -- To view, visit http://gerrit.cloudera.org:8080/17344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a Gerrit-Change-Number: 17344 Gerrit-PatchSet: 6 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-7560: Set selectivity of Not-equal
Hello Aman Sinha, Qifan Chen, Zoltan Borok-Nagy, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17344 to look at the new patch set (#5). Change subject: IMPALA-7560: Set selectivity of Not-equal .. IMPALA-7560: Set selectivity of Not-equal Calculate binary predicate selectivity if one of the children is a slotref and the other children are all constant. eg. something like "col != 5", but not "2 * col != 10" selectivity = 1 - 1/ndv Testing: Modify the function testNeSelectivity() of the ExprCardinalityTest.java, change -1 to the correct value. Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a --- M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test 13 files changed, 108 insertions(+), 100 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17344/5 -- To view, visit http://gerrit.cloudera.org:8080/17344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a Gerrit-Change-Number: 17344 Gerrit-PatchSet: 5 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-10677: Set selectivity of Not-equal
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17344 ) Change subject: IMPALA-10677: Set selectivity of Not-equal .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/17344/3/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java: http://gerrit.cloudera.org:8080/#/c/17344/3/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@257 PS3, Line 257: distinctValues > I wonder if distinctValues includes the null. If so, we may need to compute If the column contains null, but not all null, then distinctValues does not contain null -- To view, visit http://gerrit.cloudera.org:8080/17344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a Gerrit-Change-Number: 17344 Gerrit-PatchSet: 4 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Fri, 04 Jun 2021 03:40:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10677: Set selectivity of Not-equal
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17344 ) Change subject: IMPALA-10677: Set selectivity of Not-equal .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/17344/3/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java: http://gerrit.cloudera.org:8080/#/c/17344/3/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@257 PS3, Line 257: distinctValues > I wonder if distinctValues includes the null. If so, we may need to compute If all values of the column are null, the distinctValues is 1; if not, the distinctValues does not includes null http://gerrit.cloudera.org:8080/#/c/17344/3/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java File fe/src/test/java/org/apache/impala/planner/CardinalityTest.java: http://gerrit.cloudera.org:8080/#/c/17344/3/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java@70 PS3, Line 70: verifyCardinality( : "SELECT id FROM functional.alltypes WHE > Should we remove this comment, since the calculation is correct with the pa Done http://gerrit.cloudera.org:8080/#/c/17344/3/testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test File testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test: http://gerrit.cloudera.org:8080/#/c/17344/3/testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test@160 PS3, Line 160: from functional.alltypestiny > Remove? Done -- To view, visit http://gerrit.cloudera.org:8080/17344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a Gerrit-Change-Number: 17344 Gerrit-PatchSet: 4 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Fri, 04 Jun 2021 03:37:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10677: Set selectivity of Not-equal
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17344 to look at the new patch set (#4). Change subject: IMPALA-10677: Set selectivity of Not-equal .. IMPALA-10677: Set selectivity of Not-equal Calculate binary predicate selectivity if one of the children is a slotref and the other children are all constant. eg. something like "col = 5", but not "2 * col = 10" selectivity = 1 - 1/ndv Testing: Modify the function testNeSelectivity() of the ExprCardinalityTest.java, change -1 to the correct value. Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a --- M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test 13 files changed, 91 insertions(+), 97 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17344/4 -- To view, visit http://gerrit.cloudera.org:8080/17344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a Gerrit-Change-Number: 17344 Gerrit-PatchSet: 4 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-10696: fix accuracy problem
liuyao has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17411 Change subject: IMPALA-10696: fix accuracy problem .. IMPALA-10696: fix accuracy problem Table alltypes has no statistics, so the cardinality of alltypes will be estimated based on the hdfs files and the avg row size. Calling PrintUtils.printMetric, double will be divided by long. There will be accuracy problems. In most cases, the number of lines calculated is 17.91 K. But due to accuracy problems here, the calculated value is 17.90K. I modified line 221 of stats-extrapolation.test and used row_regex to match, referring to the matching method of cardinality in line 224,in this case, their values are the same Testing: metadata/test_stats_extrapolation.py Change-Id: I0a1a3809508c90217517705b2b188b2ccba6f23f --- M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/17411/1 -- To view, visit http://gerrit.cloudera.org:8080/17411 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I0a1a3809508c90217517705b2b188b2ccba6f23f Gerrit-Change-Number: 17411 Gerrit-PatchSet: 1 Gerrit-Owner: liuyao
[Impala-ASF-CR] IMPALA-10677: Set selectivity of Not-equal
Hello Qifan Chen, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17344 to look at the new patch set (#3). Change subject: IMPALA-10677: Set selectivity of Not-equal .. IMPALA-10677: Set selectivity of Not-equal Calculate binary predicate selectivity if one of the children is a slotref and the other children are all constant. eg. something like "col = 5", but not "2 * col = 10" selectivity = 1 - 1/ndv Testing: Modify the function testNeSelectivity() of the ExprCardinalityTest.java, change -1 to the correct value. Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a --- M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test 13 files changed, 91 insertions(+), 93 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17344/3 -- To view, visit http://gerrit.cloudera.org:8080/17344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a Gerrit-Change-Number: 17344 Gerrit-PatchSet: 3 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17299 ) Change subject: IMPALA-10652: Optimize the checking of the size of incremental stats .. Patch Set 7: compare Patch Set 4 and Patch Set 7 -- To view, visit http://gerrit.cloudera.org:8080/17299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee Gerrit-Change-Number: 17299 Gerrit-PatchSet: 7 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Thu, 29 Apr 2021 04:36:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17299 ) Change subject: IMPALA-10652: Optimize the checking of the size of incremental stats .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@461 PS4, Line 461: // incremental statistics size = Existing partition statistics : // - Repeated calculation partition stats : // + This time calculation partition stats : for (FeFsPartition part: allPartitions) { : / > In the commit message you have the following formula: Done http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@467 PS4, Line 467: // - Repeated calculation partition stats" > Please add comment that this is the "This time calculation partition stats" Done http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@471 PS4, Line 471: > nit: stats Done http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1787 PS4, Line 1787: checkComputeStatsStmt( : "compute incremental stats functional.alltypes partition(year=2010, month=10)"); : > Please add a few more tests where you invoke compute incremental stats for Done -- To view, visit http://gerrit.cloudera.org:8080/17299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee Gerrit-Change-Number: 17299 Gerrit-PatchSet: 7 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Thu, 29 Apr 2021 02:23:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10677: Set selectivity of Not-equal
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17344 ) Change subject: IMPALA-10677: Set selectivity of Not-equal .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/17344/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java: http://gerrit.cloudera.org:8080/#/c/17344/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@244 PS1, Line 244: > This condition can be checked at the beginning together with !isSingleColum Done http://gerrit.cloudera.org:8080/#/c/17344/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@249 PS1, Line 249: to selectivity is 0. > Repetition in both then and else branch. Done http://gerrit.cloudera.org:8080/#/c/17344/1/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java File fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java: http://gerrit.cloudera.org:8080/#/c/17344/1/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@316 PS1, Line 316: testNeSelectivity > May add a couple test cases for an empty table. Done -- To view, visit http://gerrit.cloudera.org:8080/17344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a Gerrit-Change-Number: 17344 Gerrit-PatchSet: 2 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Wed, 28 Apr 2021 11:46:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10677: Set selectivity of Not-equal
Hello Qifan Chen, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17344 to look at the new patch set (#2). Change subject: IMPALA-10677: Set selectivity of Not-equal .. IMPALA-10677: Set selectivity of Not-equal Calculate binary predicate selectivity if one of the children is a slotref and the other children are all constant. eg. something like "col = 5", but not "2 * col = 10" selectivity = 1 - 1/ndv Testing: Modify the function testNeSelectivity() of the ExprCardinalityTest.java, change -1 to the correct value. Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a --- M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test 12 files changed, 89 insertions(+), 91 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17344/2 -- To view, visit http://gerrit.cloudera.org:8080/17344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a Gerrit-Change-Number: 17344 Gerrit-PatchSet: 2 Gerrit-Owner: liuyao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen
[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats
Hello Aman Sinha, Qifan Chen, Zoltan Borok-Nagy, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17299 to look at the new patch set (#7). Change subject: IMPALA-10652: Optimize the checking of the size of incremental stats .. IMPALA-10652: Optimize the checking of the size of incremental stats Modify the estimation method of incremental statistics size: incremental statistics size = Existing partition statistics + This time calculation partition stats - Repeated calculation partition stats Testing: All partitions of a table have no incremental stats. --Calculate the incremental stats of all partitions, the incremental stats size exceeds the threshold, an error is reported. --Calculate the incremental stats of one partition, no error is reported. Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee --- M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 2 files changed, 66 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17299/7 -- To view, visit http://gerrit.cloudera.org:8080/17299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee Gerrit-Change-Number: 17299 Gerrit-PatchSet: 7 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17299 ) Change subject: IMPALA-10652: Optimize the checking of the size of incremental stats .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@461 PS4, Line 461: // incremental statistics size = Existing partition statistics : // - Repeated calculation partition stats : // + This time calculation partition stats : for (FeFsPartition part: allPartitions) { : / > In the commit message you have the following formula: Done http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@467 PS4, Line 467: // - Repeated calculation partition stats" > Please add comment that this is the "This time calculation partition stats" Done http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@471 PS4, Line 471: > nit: stats Done http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1787 PS4, Line 1787: checkComputeStatsStmt( : "compute incremental stats functional.alltypes partition(year=2010, month=10)"); : > Please add a few more tests where you invoke compute incremental stats for Done -- To view, visit http://gerrit.cloudera.org:8080/17299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee Gerrit-Change-Number: 17299 Gerrit-PatchSet: 6 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Wed, 28 Apr 2021 06:39:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats
Hello Aman Sinha, Qifan Chen, Zoltan Borok-Nagy, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17299 to look at the new patch set (#6). Change subject: IMPALA-10652: Optimize the checking of the size of incremental stats .. IMPALA-10652: Optimize the checking of the size of incremental stats Modify the estimation method of incremental statistics size: incremental statistics size = Existing partition statistics + This time calculation partition stats - Repeated calculation partition stats Testing: All partitions of a table have no incremental stats. --Calculate the incremental stats of all partitions, the incremental stats size exceeds the threshold, an error is reported. --Calculate the incremental stats of one partition, no error is reported. Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee --- M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 2 files changed, 65 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17299/6 -- To view, visit http://gerrit.cloudera.org:8080/17299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee Gerrit-Change-Number: 17299 Gerrit-PatchSet: 6 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats
Hello Aman Sinha, Qifan Chen, Zoltan Borok-Nagy, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17299 to look at the new patch set (#5). Change subject: IMPALA-10652: Optimize the checking of the size of incremental stats .. IMPALA-10652: Optimize the checking of the size of incremental stats Modify the estimation method of incremental statistics size: incremental statistics size = Existing partition statistics + This time calculation partition stats - Repeated calculation partition stats Testing: All partitions of a table have no incremental stats. --Calculate the incremental stats of all partitions, the incremental stats size exceeds the threshold, an error is reported. --Calculate the incremental stats of one partition, no error is reported. Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee --- M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 2 files changed, 66 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17299/5 -- To view, visit http://gerrit.cloudera.org:8080/17299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee Gerrit-Change-Number: 17299 Gerrit-PatchSet: 5 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-10677: Set selectivity of Not-equal
liuyao has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17344 Change subject: IMPALA-10677: Set selectivity of Not-equal .. IMPALA-10677: Set selectivity of Not-equal Calculate binary predicate selectivity if one of the children is a slotref and the other children are all constant. eg. something like "col = 5", but not "2 * col = 10" selectivity = 1 - 1/ndv Testing: Modify the function testNeSelectivity() of the ExprCardinalityTest.java, change -1 to the correct value. Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a --- M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test 12 files changed, 60 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17344/1 -- To view, visit http://gerrit.cloudera.org:8080/17344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a Gerrit-Change-Number: 17344 Gerrit-PatchSet: 1 Gerrit-Owner: liuyao
[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17299 ) Change subject: IMPALA-10652: Optimize the checking of the size of incremental stats .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/17299/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/17299/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1784 PS3, Line 1784: fu > nit. table alltypes? Done -- To view, visit http://gerrit.cloudera.org:8080/17299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee Gerrit-Change-Number: 17299 Gerrit-PatchSet: 4 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Thu, 15 Apr 2021 02:27:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats
Hello Aman Sinha, Qifan Chen, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17299 to look at the new patch set (#4). Change subject: IMPALA-10652: Optimize the checking of the size of incremental stats .. IMPALA-10652: Optimize the checking of the size of incremental stats Modify the estimation method of incremental statistics size: incremental statistics size = Existing partition statistics + This time calculation partition stats - Repeated calculation partition stats Testing: All partitions of a table have no incremental stats. --Calculate the incremental stats of all partitions, the incremental stats size exceeds the threshold, an error is reported. --Calculate the incremental stats of one partition, no error is reported. Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee --- M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 2 files changed, 45 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17299/4 -- To view, visit http://gerrit.cloudera.org:8080/17299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee Gerrit-Change-Number: 17299 Gerrit-PatchSet: 4 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao
[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats
liuyao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17299 ) Change subject: IMPALA-10652: Optimize the checking of the size of incremental stats .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/17299/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/17299/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@448 PS2, Line 448: numOfAllIncStats > May rename the variable to numOfAllIncStatsPartitions Done http://gerrit.cloudera.org:8080/#/c/17299/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@452 PS2, Line 452: if (partitionSet_ == null) { : numOfAllIncStatsPartitions = allPartitio > We may not need to verify the size limit when the partition set for increme It is a pre-check on the size of incremental stats to prevent the incremental stats from occupying too much memory after calculation. If no partition is specified, all partitions are calculated. Need to check whether the incremental stats of all partitions exceeds the threshold after calculation. -- To view, visit http://gerrit.cloudera.org:8080/17299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee Gerrit-Change-Number: 17299 Gerrit-PatchSet: 3 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: liuyao Gerrit-Comment-Date: Wed, 14 Apr 2021 06:18:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats
Hello Aman Sinha, Qifan Chen, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17299 to look at the new patch set (#3). Change subject: IMPALA-10652: Optimize the checking of the size of incremental stats .. IMPALA-10652: Optimize the checking of the size of incremental stats Modify the estimation method of incremental statistics size: incremental statistics size = Existing partition statistics + This time calculation partition stats - Repeated calculation partition stats Testing: All partitions of a table have no incremental stats. --Calculate the incremental stats of all partitions, the incremental stats size exceeds the threshold, an error is reported. --Calculate the incremental stats of one partition, no error is reported. Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee --- M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 2 files changed, 45 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17299/3 -- To view, visit http://gerrit.cloudera.org:8080/17299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee Gerrit-Change-Number: 17299 Gerrit-PatchSet: 3 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats
Hello Aman Sinha, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17299 to look at the new patch set (#2). Change subject: IMPALA-10652: Optimize the checking of the size of incremental stats .. IMPALA-10652: Optimize the checking of the size of incremental stats Modify the estimation method of incremental statistics size: incremental statistics size = Existing partition statistics + This time calculation partition stats - Repeated calculation partition stats Testing: All partitions of a table have no incremental stats. --Calculate the incremental stats of all partitions, the incremental stats size exceeds the threshold, an error is reported. --Calculate the incremental stats of one partition, no error is reported. Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee --- M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 2 files changed, 45 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17299/2 -- To view, visit http://gerrit.cloudera.org:8080/17299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee Gerrit-Change-Number: 17299 Gerrit-PatchSet: 2 Gerrit-Owner: liuyao Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong