[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-21 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 17:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test
File testdata/workloads/functional-query/queries/QueryTest/spilling.test:

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test@450
PS14, Line 450:
> See my other comment.
In some cases, FastLimitCheckExceededRows is 0,
e.g. SET debug_action=-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0;


http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test
File testdata/workloads/targeted-perf/queries/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test@2729
PS14, Line 2729: 1
> I see. What I mean is to assure the value is greater than 0.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 17
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Wed, 22 Sep 2021 04:02:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-21 Thread liuyao (Code Review)
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#16).

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. When both conditions below are true, push down limit to
pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Sample queries:
SELECT DISTINCT f FROM t LIMIT n
Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Testing:
Add test targeted-perf/queries/aggregation.test
Pass core test

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
19 files changed, 147 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/16
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 16
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-18 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 15:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test
File testdata/workloads/functional-query/queries/QueryTest/spilling.test:

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/functional-query/queries/QueryTest/spilling.test@450
PS14, Line 450:
> nit. 1?
I modified the unit test because autotest failed.FastLimitCheckExceededRows can 
be unidigit, double digits, three digits, so may contain 0


http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test
File testdata/workloads/targeted-perf/queries/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/17821/14/testdata/workloads/targeted-perf/queries/aggregation.test@2729
PS14, Line 2729: 0
> nit. Should it be 1-9 instead?
Usually, rowbatch contains 1024 rows, stream agg processeed a rowbatch, then 
check whether the number of rows returned more than limit, so 
FastLimitCheckExceededRows can be unidigit, double digits, three digits, so may 
contain 0



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 15
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Sat, 18 Sep 2021 07:04:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-18 Thread liuyao (Code Review)
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#15).

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. When both conditions below are true, push down limit to
pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Sample queries:
SELECT DISTINCT f FROM t LIMIT n
Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Testing:
Add test targeted-perf/queries/aggregation.test
Pass core test

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
19 files changed, 147 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/15
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 15
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-14 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 14:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17821/13/be/src/exec/streaming-aggregation-node.cc
File be/src/exec/streaming-aggregation-node.cc:

http://gerrit.cloudera.org:8080/#/c/17821/13/be/src/exec/streaming-aggregation-node.cc@134
PS13, Line 134:  runtime_profile_->AddInfoString("FastLimitCheckExceededRows",
  :   SimpleItoa(aggs_[0]->GetNumKeys() - limit()));
  :   VLOG_QUERY << S
> If we can add the info to runtime_profile_, it will be more useful. For exa
Done


http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/functional-query/queries/QueryTest/spilling.test
File testdata/workloads/functional-query/queries/QueryTest/spilling.test:

http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/functional-query/queries/QueryTest/spilling.test@446
PS13, Line 446: GINT
> Can we also verify that some rows are indeed skipped in spill situation?
Done


http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/targeted-perf/queries/aggregation.test
File testdata/workloads/targeted-perf/queries/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/17821/13/testdata/workloads/targeted-perf/queries/aggregation.test@2726
PS13, Line 2726:  speed up aggregations
> Can we verify that most of the rows are indeed skipped fast?
Done



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 14
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Tue, 14 Sep 2021 12:08:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-14 Thread liuyao (Code Review)
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#14).

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. When both conditions below are true, push down limit to
pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Sample queries:
SELECT DISTINCT f FROM t LIMIT n
Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Testing:
Add test targeted-perf/queries/aggregation.test
Pass core test

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
19 files changed, 149 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/14
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 14
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-13 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 13:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/17821/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17821/12//COMMIT_MSG@10
PS12, Line 10:
> nit. exceed the max line threshold.
Done


http://gerrit.cloudera.org:8080/#/c/17821/12/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

http://gerrit.cloudera.org:8080/#/c/17821/12/be/src/exec/aggregation-node.cc@76
PS12, Line 76:  VLOG_QUERY << Substitute("the number of rows ($0) returned from 
the aggregation"
 :   " node has exceeded the limit of $1", 
aggs_[0]->GetNumKeys(), limit());
> nit. May use Substitute() which is faster.
Done


http://gerrit.cloudera.org:8080/#/c/17821/12/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/17821/12/common/thrift/PlanNodes.thrift@479
PS12, Line 479: compl
> nit. complete
Done


http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@646
PS12, Line 646: When both conditions below are true, aggregati
> nit. I think we should mention the two conditions in the commit message her
Done


http://gerrit.cloudera.org:8080/#/c/17821/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@648
PS12, Line 648: on n
> nit. May use Complete, as Halt implies stop for some reason.
Done


http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934
PS11, Line 2934: limit: 2
> Since we do not push down id from the outer side to the inner, I would thin
where id = subquery,If this subQuery returns 2 rows, we can sure that it is not 
meet the semantic requirement, we should report a semantic error.We don't need 
to get all the results.



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 13
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Mon, 13 Sep 2021 06:40:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-13 Thread liuyao (Code Review)
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#13).

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. When both conditions below are true, push down limit to
pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Sample queries:
SELECT DISTINCT f FROM t LIMIT n
Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Testing:
Add test targeted-perf/queries/aggregation.test
Pass core test

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
18 files changed, 142 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/13
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 13
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-10 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17821 )

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..


Patch Set 12:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/17821/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17821/11//COMMIT_MSG@10
PS11, Line 10: When both con
> nit. When both conditions below are true,
Done


http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.h
File be/src/exec/aggregation-node-base.h:

http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.h@71
PS11, Line 71:   bool fast_limit_check_ = false;
> nit. May init to false here.
Done


http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.cc
File be/src/exec/aggregation-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node-base.cc@93
PS11, Line 93:
 : Status AggregationNodeBase::Prepare(RuntimeState* state) {
 :   SCOPED_TIMER(runtime_profile_->total_time_counter());
 :   RETURN_IF_ERROR(ExecNode::Prepare(state));
 :   for (auto& agg : aggs_) {
 : agg->SetDebugOptions
> I also wonder if fast_limit_check_ can be moved to the plan node for Aggreg
Done Only GroupingAggregator has limit, maybe limit should not be added to 
TAggregator


http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node.cc
File be/src/exec/aggregation-node.cc:

http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/aggregation-node.cc@75
PS11, Line 75: os = true;
 :   VLOG_QUERY << "the
> nit. May make the message more specific by plugging in the numbers. For exa
Done


http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/streaming-aggregation-node.cc
File be/src/exec/streaming-aggregation-node.cc:

http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/streaming-aggregation-node.cc@129
PS11, Line 129: DCHECK(limit() > -1);
> nit. We probably should DCHECK(limit() > -1) here before the IF.
Done


http://gerrit.cloudera.org:8080/#/c/17821/11/be/src/exec/streaming-aggregation-node.cc@133
PS11, Line 133: child_batch_->Reset();
  :   VLOG_QUERY << "the
> nit. same as the other one to make the message more specific.
Done


http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test@2934
PS11, Line 2934: limit: 2
> nit. I wonder if this correct. On paper, we need to look at all distinct va
This is an SQL optimization. Look at SubqueryRewriter# mergeExpr. This is a 
scalar subquery, the number of rows returned by this subquery cannot be greater 
than 1, so set limit = 2, and then check cardinality of result of subquery


http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/targeted-perf/queries/aggregation.test
File testdata/workloads/targeted-perf/queries/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/17821/11/testdata/workloads/targeted-perf/queries/aggregation.test@2731
PS11, Line 2731:
> This violates condition 1?
Done



--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 12
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Fri, 10 Sep 2021 09:39:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-10 Thread liuyao (Code Review)
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#12).

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. When both conditions below are true, push down limit to pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Sample queries:
SELECT DISTINCT f FROM t LIMIT n
Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Testing:
Add test targeted-perf/queries/aggregation.test
Pass core test

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
18 files changed, 140 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/12
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 12
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-09 Thread liuyao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#11).

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. In some cases,  push down limit to pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Sample queries:
SELECT DISTINCT f FROM t LIMIT n
Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Testing:
Add test targeted-perf/queries/aggregation.test
Pass core test

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
16 files changed, 129 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/11
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 11
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-02 Thread liuyao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#10).

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. In some cases,  push down limit to pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Queries like

SELECT DISTINCT f FROM t LIMIT n

Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
16 files changed, 129 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/10
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 10
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-01 Thread liuyao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#9).

Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some 
aggregations
..

WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. In some cases,  push down limit to pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Queries like

SELECT DISTINCT f FROM t LIMIT n

Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
16 files changed, 129 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/9
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 9
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-01 Thread liuyao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#8).

Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some 
aggregations
..

WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. In some cases,  push down limit to pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Queries like

SELECT DISTINCT f FROM t LIMIT n

Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
16 files changed, 127 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/8
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 8
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-09-01 Thread liuyao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#7).

Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some 
aggregations
..

WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. In some cases,  push down limit to pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Queries like

SELECT DISTINCT f FROM t LIMIT n

Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
16 files changed, 127 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/7
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 7
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread liuyao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#6).

Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some 
aggregations
..

WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. In some cases,  push down limit to pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Queries like

SELECT DISTINCT f FROM t LIMIT n

Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
16 files changed, 127 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/6
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 6
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread liuyao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#5).

Change subject: WIP IMPALA-2581: LIMIT can be propagated down into some 
aggregations
..

WIP IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch contains 2 parts:
1. In some cases,  push down limit to pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Queries like

SELECT DISTINCT f FROM t LIMIT n

Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
17 files changed, 128 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/5
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 5
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] [WIP]IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread liuyao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#4).

Change subject: [WIP]IMPALA-2581: LIMIT can be propagated down into some 
aggregations
..

[WIP]IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch does two things:
1. In some cases,  push down limit to pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Queries like

SELECT DISTINCT f FROM t LIMIT n

Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
17 files changed, 128 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/4
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 4
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread liuyao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17821

to look at the new patch set (#3).

Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch does two things:
1. In some cases,  push down limit to pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Queries like

SELECT DISTINCT f FROM t LIMIT n

Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
17 files changed, 128 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/3
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 3
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-2581: LIMIT can be propagated down into some aggregations

2021-08-31 Thread liuyao (Code Review)
liuyao has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17821


Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
..

IMPALA-2581: LIMIT can be propagated down into some aggregations

This patch does two things:
1. In some cases,  push down limit to pre-aggregation
 a) aggregation node has no aggregate function
 b) aggregation node has no predicate
2. finish aggregation when number of unique keys of hash table has
exceeded the limit.

Queries like

SELECT DISTINCT f FROM t LIMIT n

Can pass the LIMIT all the way down to the pre-aggregation, which
leads to a nearly unbounded speedup on these queries in large tables
when n is low.

Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
---
M be/src/exec/aggregation-node-base.cc
M be/src/exec/aggregation-node-base.h
M be/src/exec/aggregation-node.cc
M be/src/exec/aggregator.h
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/non-grouping-aggregator.h
M be/src/exec/streaming-aggregation-node.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-query/queries/QueryTest/spilling.test
M testdata/workloads/targeted-perf/queries/aggregation.test
17 files changed, 127 insertions(+), 24 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17821/2
--
To view, visit http://gerrit.cloudera.org:8080/17821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Gerrit-Change-Number: 17821
Gerrit-PatchSet: 2
Gerrit-Owner: liuyao 


[Impala-ASF-CR] IMPALA-5476: Fix catalogd restart brings stale metadata

2021-08-20 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, 
Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17645

to look at the new patch set (#26).

Change subject: IMPALA-5476: Fix catalogd restart brings stale metadata
..

IMPALA-5476: Fix catalogd restart brings stale metadata

ImpaladCatalog#updateCatalog() doesn't trigger a full topic update
request when detecting catalogServiceId changes. It just updates the
local catalogServiceId and throws an exception to abort applying the
DDL/DML results. This causes a problem when catalogd is restarted and
the DDL/DML is executed on the restarted instance. In this case, only
the local catalogServiceId is updated to the latest. The local catalog
remains stale. Then when dealing with the following updates from
statestore, the catalogServiceId always matches, so updates will be
applied without exceptions. However, the catalog objects usually won't
be updated since they have higher versions (from the old catalogd
instance) than those in the update. This brings the local catalog out
of sync until the catalog version of the new catalogd grows larger
enough.

Note that in dealing with the catalog updates from statestore, if the
catalogServiceId unmatches, impalad will request a full topic update.
See more in ImpalaServer::CatalogUpdateCallback().

This patch fixes this issue by checking the catalogServiceId before
invoking UpdateCatalogCache() of FE. If catalogServiceId doesn't match
the one in the DDL/DML result, wait until it changes. The following
update from statestore will change it and unblocks the DDL/DML thread.

Testing

add several tests in
tests/custom_cluster/test_restart_services.py

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
M tests/custom_cluster/test_restart_services.py
2 files changed, 150 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/26
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 26
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-5476: Fix catalogd restart brings stale metadata

2021-08-20 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, 
Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17645

to look at the new patch set (#25).

Change subject: IMPALA-5476: Fix catalogd restart brings stale metadata
..

IMPALA-5476: Fix catalogd restart brings stale metadata

ImpaladCatalog#updateCatalog() doesn't trigger a full topic update
request when detecting catalogServiceId changes. It just updates the
local catalogServiceId and throws an exception to abort applying the
DDL/DML results. This causes a problem when catalogd is restarted and
the DDL/DML is executed on the restarted instance. In this case, only
the local catalogServiceId is updated to the latest. The local catalog
remains stale. Then when dealing with the following updates from
statestore, the catalogServiceId always matches, so updates will be
applied without exceptions. However, the catalog objects usually won't
be updated since they have higher versions (from the old catalogd
instance) than those in the update. This brings the local catalog out
of sync until the catalog version of the new catalogd grows larger
enough.

Note that in dealing with the catalog updates from statestore, if the
catalogServiceId unmatches, impalad will request a full topic update.
See more in ImpalaServer::CatalogUpdateCallback().

This patch fixes this issue by checking the catalogServiceId before
invoking UpdateCatalogCache() of FE. If catalogServiceId doesn't match
the one in the DDL/DML result, wait until it changes. The following
update from statestore will change it and unblocks the DDL/DML thread.

Testing

add several tests in
tests/custom_cluster/test_restart_services.py

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
M tests/custom_cluster/test_restart_services.py
2 files changed, 150 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/25
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 25
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-08-20 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17645 )

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..


Patch Set 24:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17645/23//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17645/23//COMMIT_MSG@9
PS23, Line 9: The change of catalogServiceId did not trigger a full topic update
: request.
:
: ImpaladCatalog#updateCatalog() doesn't trigger a full topic update
: request when detecting catalogServiceId changes. It just updates 
the
: local catalogServiceId and throws an exception to abort applying 
the
: DDL/DML results. This causes a problem when catalogd is restarted 
and
: the DDL/DML is executed on the restarted instance. In this case, 
only
: the local catalogServiceId is updated to the latest. The local 
catalog
: remains stale. Then when dealing with the following
> nit: ImpaladCatalog#updateCatalog() doesn't trigger a full topic update req
Done


http://gerrit.cloudera.org:8080/#/c/17645/23//COMMIT_MSG@20
PS23, Line 20: applied without exceptions. However, the catalog objects usually 
won't
 : be updated since they have higher versions (from the old catalogd
 : instance) than those in the update. This brings the local 
catalog out
 : of sync until the catalog version of the new catalogd grows 
larger
 : enough.
> nit: Note that in dealing with the catalog updates from statestore, if the
Done


http://gerrit.cloudera.org:8080/#/c/17645/23//COMMIT_MSG@25
PS23, Line 25:
> Let's add some description about how we fix the bug:
Done


http://gerrit.cloudera.org:8080/#/c/17645/23//COMMIT_MSG@28
PS23, Line 28: more in ImpalaServer::CatalogUpda
> nit: several tests
Done


http://gerrit.cloudera.org:8080/#/c/17645/23//COMMIT_MSG@29
PS23, Line 29:
> nit: tests
Done


http://gerrit.cloudera.org:8080/#/c/17645/23/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/17645/23/be/src/service/impala-server.cc@2146
PS23, Line 2146: // TODO Restart catalogd twice. When restarting for the 
second time, ddl executes
   : // successfully, but we may receive the service id of the 
first restart, which will
   : // cause the local cache to not be updated in time
> nit: could you move these into the else-block at line 2164 and change them
Done


http://gerrit.cloudera.org:8080/#/c/17645/23/tests/custom_cluster/test_restart_services.py
File tests/custom_cluster/test_restart_services.py:

http://gerrit.cloudera.org:8080/#/c/17645/23/tests/custom_cluster/test_restart_services.py@230
PS23, Line 230: execute_query(query
> Currently execute_query() and execute_query_async() are the same for DDLs.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 24
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Fri, 20 Aug 2021 08:50:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-08-20 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, 
Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17645

to look at the new patch set (#24).

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..

IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

The change of catalogServiceId did not trigger a full topic update
request.

ImpaladCatalog#updateCatalog() doesn't trigger a full topic update
request when detecting catalogServiceId changes. It just updates the
local catalogServiceId and throws an exception to abort applying the
DDL/DML results. This causes a problem when catalogd is restarted and
the DDL/DML is executed on the restarted instance. In this case, only
the local catalogServiceId is updated to the latest. The local catalog
remains stale. Then when dealing with the following updates from
statestore, the catalogServiceId always matches, so updates will be
applied without exceptions. However, the catalog objects usually won't
be updated since they have higher versions (from the old catalogd
instance) than those in the update. This brings the local catalog out
of sync until the catalog version of the new catalogd grows larger
enough.

Note that in dealing with the catalog updates from statestore, if the
catalogServiceId unmatches, impalad will request a full topic update.
See more in ImpalaServer::CatalogUpdateCallback().

This patch fixes this issue by checking the catalogServiceId before
invoking UpdateCatalogCache() of FE. If catalogServiceId doesn't match
the one in the DDL/DML result, wait until it changes. The following
update from statestore will change it and unblocks the DDL/DML thread.

Testing

add several tests in
tests/custom_cluster/test_restart_services.py

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
M tests/custom_cluster/test_restart_services.py
2 files changed, 152 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/24
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 24
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-08-19 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, 
Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17645

to look at the new patch set (#23).

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..

IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

The change of catalogServiceId did not trigger a full topic update
request.

During the execution of DDL or DML, restarting Catalogd will cause
catalogServiceId to change. The DDL will execute successfully, and
then directly update the local cache (not through Statestored), modify
the local catalog with the new catalogServiceId, trigger an exception,
print the exception information, but did not submit full topic update
request. The version of Catalogd is lower than Impalad, so the metadata
information synchronized from Catalogd will be lost.

Under normal circumstances, if the catalogServiceId changes, Impalad
will submit full topic update request, Impalad will receive
TUpdateCatalogCacheRequest, and is_delta is false, that is full update.
Impalad will reset the local catalog cache, and then update, and the
version of CatalogD is greater than Impalad.

Testing

add TestRestart#test_restart_catalogd in
ests/custom_cluster/test_restart_services.py

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
M tests/custom_cluster/test_restart_services.py
2 files changed, 144 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/23
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 23
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-08-19 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, 
Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17645

to look at the new patch set (#22).

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..

IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

The change of catalogServiceId did not trigger a full topic update
request.

During the execution of DDL or DML, restarting Catalogd will cause
catalogServiceId to change. The DDL will execute successfully, and
then directly update the local cache (not through Statestored), modify
the local catalog with the new catalogServiceId, trigger an exception,
print the exception information, but did not submit full topic update
request. The version of Catalogd is lower than Impalad, so the metadata
information synchronized from Catalogd will be lost.

Under normal circumstances, if the catalogServiceId changes, Impalad
will submit full topic update request, Impalad will receive
TUpdateCatalogCacheRequest, and is_delta is false, that is full update.
Impalad will reset the local catalog cache, and then update, and the
version of CatalogD is greater than Impalad.

Testing

add TestRestart#test_restart_catalogd in
ests/custom_cluster/test_restart_services.py

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
M tests/custom_cluster/test_restart_services.py
2 files changed, 144 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/22
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 22
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-08-19 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, 
Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17645

to look at the new patch set (#21).

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..

IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

The change of catalogServiceId did not trigger a full topic update
request.

During the execution of DDL or DML, restarting Catalogd will cause
catalogServiceId to change. The DDL will execute successfully, and
then directly update the local cache (not through Statestored), modify
the local catalog with the new catalogServiceId, trigger an exception,
print the exception information, but did not submit full topic update
request. The version of Catalogd is lower than Impalad, so the metadata
information synchronized from Catalogd will be lost.

Under normal circumstances, if the catalogServiceId changes, Impalad
will submit full topic update request, Impalad will receive
TUpdateCatalogCacheRequest, and is_delta is false, that is full update.
Impalad will reset the local catalog cache, and then update, and the
version of CatalogD is greater than Impalad.

Testing

add TestRestart#test_restart_catalogd in
ests/custom_cluster/test_restart_services.py

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
M tests/custom_cluster/test_restart_services.py
2 files changed, 143 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/21
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 21
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-08-19 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17645 )

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..


Patch Set 20:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17645/16/tests/custom_cluster/test_restart_services.py
File tests/custom_cluster/test_restart_services.py:

http://gerrit.cloudera.org:8080/#/c/17645/16/tests/custom_cluster/test_restart_services.py@239
PS16, Line 239: sleep(self.UPDATE_FREQUENCY_S * 2)
> While thinking about comments for this sleep, I realize this actually revea
Done



--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 20
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Thu, 19 Aug 2021 09:23:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-08-19 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, 
Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17645

to look at the new patch set (#20).

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..

IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

The change of catalogServiceId did not trigger a full topic update
request.

During the execution of DDL or DML, restarting Catalogd will cause
catalogServiceId to change. The DDL will execute successfully, and
then directly update the local cache (not through Statestored), modify
the local catalog with the new catalogServiceId, trigger an exception,
print the exception information, but did not submit full topic update
request. The version of Catalogd is lower than Impalad, so the metadata
information synchronized from Catalogd will be lost.

Under normal circumstances, if the catalogServiceId changes, Impalad
will submit full topic update request, Impalad will receive
TUpdateCatalogCacheRequest, and is_delta is false, that is full update.
Impalad will reset the local catalog cache, and then update, and the
version of CatalogD is greater than Impalad.

Testing

add TestRestart#test_restart_catalogd in
ests/custom_cluster/test_restart_services.py

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
M tests/custom_cluster/test_restart_services.py
2 files changed, 186 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/20
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 20
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-08-19 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, 
Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17645

to look at the new patch set (#19).

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..

IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

The change of catalogServiceId did not trigger a full topic update
request.

During the execution of DDL or DML, restarting Catalogd will cause
catalogServiceId to change. The DDL will execute successfully, and
then directly update the local cache (not through Statestored), modify
the local catalog with the new catalogServiceId, trigger an exception,
print the exception information, but did not submit full topic update
request. The version of Catalogd is lower than Impalad, so the metadata
information synchronized from Catalogd will be lost.

Under normal circumstances, if the catalogServiceId changes, Impalad
will submit full topic update request, Impalad will receive
TUpdateCatalogCacheRequest, and is_delta is false, that is full update.
Impalad will reset the local catalog cache, and then update, and the
version of CatalogD is greater than Impalad.

Testing

add TestRestart#test_restart_catalogd in
ests/custom_cluster/test_restart_services.py

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
M tests/custom_cluster/test_restart_services.py
2 files changed, 186 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/19
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 19
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-08-15 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17645 )

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..


Patch Set 17:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17645/16/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/17645/16/be/src/service/impala-server.cc@2129
PS16, Line 2129: TUniqueId cur_service_id;
   : {
   :   unique_lock ver_lock(catalog_version_lock_);
   :   cur_service_id = catalog_update_info_.catalog_service_id;
   :   if (catalog_update_info_.catalog_service_id != 
catalog_service_id) {
   : LOG(INFO) << "Catalog service ID mismatch. Current ID: 
"
   : << PrintId(cur_service_id) << ". ID in response: "
   : << PrintId(catalog_service_id) << ". Catalogd may 
be restarted. Waitin
> nit: Let's move these into the if-clause at line 2155.
Done


http://gerrit.cloudera.org:8080/#/c/17645/16/tests/custom_cluster/test_restart_services.py
File tests/custom_cluster/test_restart_services.py:

http://gerrit.cloudera.org:8080/#/c/17645/16/tests/custom_cluster/test_restart_services.py@217
PS16, Line 217: 10
> nit: Can we lower this down to 5?
Done. down to 10


http://gerrit.cloudera.org:8080/#/c/17645/16/tests/custom_cluster/test_restart_services.py@239
PS16, Line 239: sleep(self.UPDATE_FREQUENCY_S * 2)
> While thinking about comments for this sleep, I realize this actually revea
If the service id is monotonically increasing, then this problem is better 
solved. But the service id is not, maybe we can treat it as a todo



--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 17
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Mon, 16 Aug 2021 03:19:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-08-15 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, 
Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17645

to look at the new patch set (#17).

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..

IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

The change of catalogServiceId did not trigger a full topic update
request.

During the execution of DDL or DML, restarting Catalogd will cause
catalogServiceId to change. The DDL will execute successfully, and
then directly update the local cache (not through Statestored), modify
the local catalog with the new catalogServiceId, trigger an exception,
print the exception information, but did not submit full topic update
request. The version of Catalogd is lower than Impalad, so the metadata
information synchronized from Catalogd will be lost.

Under normal circumstances, if the catalogServiceId changes, Impalad
will submit full topic update request, Impalad will receive
TUpdateCatalogCacheRequest, and is_delta is false, that is full update.
Impalad will reset the local catalog cache, and then update, and the
version of CatalogD is greater than Impalad.

Testing

add TestRestart#test_restart_catalogd in
ests/custom_cluster/test_restart_services.py

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
M tests/custom_cluster/test_restart_services.py
2 files changed, 146 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/17
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 17
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-08-06 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17645 )

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17645/10/tests/custom_cluster/test_restart_services.py
File tests/custom_cluster/test_restart_services.py:

http://gerrit.cloudera.org:8080/#/c/17645/10/tests/custom_cluster/test_restart_services.py@208
PS10, Line 208: LOG.info(str(e))
  :   if i == 5:
  : self.cluster.catalogd.restart()
  :
> statestored will send topic update request to catalogd once it's registered
Done



--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 16
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Fri, 06 Aug 2021 10:56:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-08-06 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, 
Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17645

to look at the new patch set (#16).

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..

IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

The change of catalogServiceId did not trigger a full topic update
request.

During the execution of DDL or DML, restarting Catalogd will cause
catalogServiceId to change. The DDL will execute successfully, and
then directly update the local cache (not through Statestored), modify
the local catalog with the new catalogServiceId, trigger an exception,
print the exception information, but did not submit full topic update
request. The version of Catalogd is lower than Impalad, so the metadata
information synchronized from Catalogd will be lost.

Under normal circumstances, if the catalogServiceId changes, Impalad
will submit full topic update request, Impalad will receive
TUpdateCatalogCacheRequest, and is_delta is false, that is full update.
Impalad will reset the local catalog cache, and then update, and the
version of CatalogD is greater than Impalad.

Testing

add TestRestart#test_restart_catalogd in
ests/custom_cluster/test_restart_services.py

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
M tests/custom_cluster/test_restart_services.py
2 files changed, 139 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/16
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 16
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-08-06 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, 
Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17645

to look at the new patch set (#15).

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..

IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

The change of catalogServiceId did not trigger a full topic update
request.

During the execution of DDL or DML, restarting Catalogd will cause
catalogServiceId to change. The DDL will execute successfully, and
then directly update the local cache (not through Statestored), modify
the local catalog with the new catalogServiceId, trigger an exception,
print the exception information, but did not submit full topic update
request. The version of Catalogd is lower than Impalad, so the metadata
information synchronized from Catalogd will be lost.

Under normal circumstances, if the catalogServiceId changes, Impalad
will submit full topic update request, Impalad will receive
TUpdateCatalogCacheRequest, and is_delta is false, that is full update.
Impalad will reset the local catalog cache, and then update, and the
version of CatalogD is greater than Impalad.

Testing

add TestRestart#test_restart_catalogd in
ests/custom_cluster/test_restart_services.py

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
M tests/custom_cluster/test_restart_services.py
2 files changed, 138 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/15
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 15
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-07-26 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17645 )

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..


Patch Set 13:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/17645/10/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/17645/10/be/src/service/impala-server.cc@2150
PS10, Line 2150: g_version_update
> nit: add a space after "." and ":"
Done


http://gerrit.cloudera.org:8080/#/c/17645/10/be/src/service/impala-server.cc@2167
PS10, Line 2167:
> nit: add a space after each ":"
Done


http://gerrit.cloudera.org:8080/#/c/17645/10/be/src/service/impala-server.cc@2168
PS10, Line 2168: ocess the catalog update that covers the effects
   : // (catalog objects) of this o
> nit: we can remove this if it's logging the same id.
Done


http://gerrit.cloudera.org:8080/#/c/17645/10/tests/custom_cluster/test_restart_services.py
File tests/custom_cluster/test_restart_services.py:

http://gerrit.cloudera.org:8080/#/c/17645/10/tests/custom_cluster/test_restart_services.py@186
PS10, Line 186: self.execute_query_expect_success(self.client,
  : "alter table join_aa add columns (name string)")
> Could you add another test with this being executed with sync_ddl=true? Jus
Done


http://gerrit.cloudera.org:8080/#/c/17645/10/tests/custom_cluster/test_restart_services.py@208
PS10, Line 208: LOG.info(str(e))
  :   if i == 5:
  : self.cluster.catalogd.restart()
  :
> I ran this test several times but I'm not able to see the WARNING log of "I
I tried many times, but the WARNING  log was not printed. It may be because of 
the following reason: after catalogd restarts successfully for the second time, 
catalogd will not immediately notify statestored to clear catalog topic 
entries. After waiting for "statestore_update_frequency_ms" milliseconds, 
catalogd receives the update request from statestored, and then catalogd 
notifies statestored to clear the catalog topic entries. Therefore, impalad has 
enough time to complete the first full update, which is triggered by the first 
restart of catalogd.


http://gerrit.cloudera.org:8080/#/c/17645/10/tests/custom_cluster/test_restart_services.py@223
PS10, Line 223: Make the catalog object ve
> nit: rename to "test_restart_catalogd_with_local_catalog"
Done



--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 13
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Mon, 26 Jul 2021 06:05:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-07-26 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, 
Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17645

to look at the new patch set (#13).

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..

IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

The change of catalogServiceId did not trigger a full topic update
request.

During the execution of DDL or DML, restarting Catalogd will cause
catalogServiceId to change. The DDL will execute successfully, and
then directly update the local cache (not through Statestored), modify
the local catalog with the new catalogServiceId, trigger an exception,
print the exception information, but did not submit full topic update
request. The version of Catalogd is lower than Impalad, so the metadata
information synchronized from Catalogd will be lost.

Under normal circumstances, if the catalogServiceId changes, Impalad
will submit full topic update request, Impalad will receive
TUpdateCatalogCacheRequest, and is_delta is false, that is full update.
Impalad will reset the local catalog cache, and then update, and the
version of CatalogD is greater than Impalad.

Testing

add TestRestart#test_restart_catalogd in
ests/custom_cluster/test_restart_services.py

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
M tests/custom_cluster/test_restart_services.py
2 files changed, 135 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/13
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 13
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-07-25 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, 
Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17645

to look at the new patch set (#12).

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..

IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

The change of catalogServiceId did not trigger a full topic update
request.

During the execution of DDL or DML, restarting Catalogd will cause
catalogServiceId to change. The DDL will execute successfully, and
then directly update the local cache (not through Statestored), modify
the local catalog with the new catalogServiceId, trigger an exception,
print the exception information, but did not submit full topic update
request. The version of Catalogd is lower than Impalad, so the metadata
information synchronized from Catalogd will be lost.

Under normal circumstances, if the catalogServiceId changes, Impalad
will submit full topic update request, Impalad will receive
TUpdateCatalogCacheRequest, and is_delta is false, that is full update.
Impalad will reset the local catalog cache, and then update, and the
version of CatalogD is greater than Impalad.

Testing

add TestRestart#test_restart_catalogd in
ests/custom_cluster/test_restart_services.py

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
M tests/custom_cluster/test_restart_services.py
2 files changed, 135 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/12
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 12
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-07-25 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, 
Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17645

to look at the new patch set (#11).

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..

IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

The change of catalogServiceId did not trigger a full topic update
request.

During the execution of DDL or DML, restarting Catalogd will cause
catalogServiceId to change. The DDL will execute successfully, and
then directly update the local cache (not through Statestored), modify
the local catalog with the new catalogServiceId, trigger an exception,
print the exception information, but did not submit full topic update
request. The version of Catalogd is lower than Impalad, so the metadata
information synchronized from Catalogd will be lost.

Under normal circumstances, if the catalogServiceId changes, Impalad
will submit full topic update request, Impalad will receive
TUpdateCatalogCacheRequest, and is_delta is false, that is full update.
Impalad will reset the local catalog cache, and then update, and the
version of CatalogD is greater than Impalad.

Testing

add TestRestart#test_restart_catalogd in
ests/custom_cluster/test_restart_services.py

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
M tests/custom_cluster/test_restart_services.py
2 files changed, 134 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/11
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 11
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-07-22 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17645 )

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..


Patch Set 10:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17645/7/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/17645/7/be/src/service/impala-server.cc@a2142
PS7, Line 2142:
> Can we keep this? If we wait for the next statestore update, this won't cau
Done


http://gerrit.cloudera.org:8080/#/c/17645/7/be/src/service/impala-server.cc@2144
PS7, Line 2144: TUniqueId cur_service_id;
> Could you add a log before the while-loop? E.g.
Done


http://gerrit.cloudera.org:8080/#/c/17645/7/be/src/service/impala-server.cc@2150
PS7, Line 2150: << ".ID in response:" << catalog_service_id << ". 
Catalogd may be restarted."
> I think this captures the case that catalogd restart again and we get anoth
The previous catalog service ID was not recorded, so I printed expected catalog 
service ID


http://gerrit.cloudera.org:8080/#/c/17645/7/tests/custom_cluster/test_restart_services.py
File tests/custom_cluster/test_restart_services.py:

http://gerrit.cloudera.org:8080/#/c/17645/7/tests/custom_cluster/test_restart_services.py@169
PS7, Line 169:   def test_restart_catalogd(self):
> This is already a good test case. Can we add another one to cover the case
Done



--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 10
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Thu, 22 Jul 2021 10:11:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-07-22 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, 
Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17645

to look at the new patch set (#9).

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..

IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

The change of catalogServiceId did not trigger a full topic update
request.

During the execution of DDL or DML, restarting Catalogd will cause
catalogServiceId to change. The DDL will execute successfully, and
then directly update the local cache (not through Statestored), modify
the local catalog with the new catalogServiceId, trigger an exception,
print the exception information, but did not submit full topic update
request. The version of Catalogd is lower than Impalad, so the metadata
information synchronized from Catalogd will be lost.

Under normal circumstances, if the catalogServiceId changes, Impalad
will submit full topic update request, Impalad will receive
TUpdateCatalogCacheRequest, and is_delta is false, that is full update.
Impalad will reset the local catalog cache, and then update, and the
version of CatalogD is greater than Impalad.

Testing

add TestRestart#test_restart_catalogd in
ests/custom_cluster/test_restart_services.py

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
M tests/custom_cluster/test_restart_services.py
2 files changed, 109 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/9
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 9
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-07-22 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, 
Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17645

to look at the new patch set (#8).

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..

IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

The change of catalogServiceId did not trigger a full topic update
request.

During the execution of DDL or DML, restarting Catalogd will cause
catalogServiceId to change. The DDL will execute successfully, and
then directly update the local cache (not through Statestored), modify
the local catalog with the new catalogServiceId, trigger an exception,
print the exception information, but did not submit full topic update
request. The version of Catalogd is lower than Impalad, so the metadata
information synchronized from Catalogd will be lost.

Under normal circumstances, if the catalogServiceId changes, Impalad
will submit full topic update request, Impalad will receive
TUpdateCatalogCacheRequest, and is_delta is false, that is full update.
Impalad will reset the local catalog cache, and then update, and the
version of CatalogD is greater than Impalad.

Testing

add TestRestart#test_restart_catalogd in
ests/custom_cluster/test_restart_services.py

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
M tests/custom_cluster/test_restart_services.py
2 files changed, 109 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/8
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 8
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-07-21 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17645 )

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..


Patch Set 7:

16:50:03 ERROR: cannot verify jenkins.impala.io's certificate, issued by ‘CN=Go 
Daddy Secure Certificate Authority - 
G2,OU=http://certs.godaddy.com/repository/,O=GoDaddy.com\\, 
Inc.,L=Scottsdale,ST=Arizona,C=US’:
16:50:03   Issued certificate has expired.
16:50:03 To connect to jenkins.impala.io insecurely, use 
`--no-check-certificate'.
16:50:03 + ssh -p 29418 impala-public-jenk...@gerrit.cloudera.org gerrit review 
--verified -1 --message '"Build' failed: 
'https://jenkins.impala.io/job/gerrit-verify-dryrun/7319/;' --project 
Impala-ASF 17645,7
16:50:05 + exit 1
16:50:05 Build step 'Execute shell' marked build as failure


--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 7
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Wed, 21 Jul 2021 09:59:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-07-16 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, 
Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17645

to look at the new patch set (#6).

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..

IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

The change of catalogServiceId did not trigger a full topic update
request.

During the execution of DDL or DML, restarting Catalogd will cause
catalogServiceId to change. The DDL will execute successfully, and
then directly update the local cache (not through Statestored), modify
the local catalog with the new catalogServiceId, trigger an exception,
print the exception information, but did not submit full topic update
request. The version of Catalogd is lower than Impalad, so the metadata
information synchronized from Catalogd will be lost.

Under normal circumstances, if the catalogServiceId changes, Impalad
will submit full topic update request, Impalad will receive
TUpdateCatalogCacheRequest, and is_delta is false, that is full update.
Impalad will reset the local catalog cache, and then update, and the
version of CatalogD is greater than Impalad.

Testing

add TestRestart#test_restart_catalogd in
ests/custom_cluster/test_restart_services.py

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
M tests/custom_cluster/test_restart_services.py
2 files changed, 44 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/6
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 6
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-07-15 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, 
Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17645

to look at the new patch set (#5).

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..

IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

The change of catalogServiceId did not trigger a full topic update
request.

During the execution of DDL or DML, restarting Catalogd will cause
catalogServiceId to change. The DDL will execute successfully, and
then directly update the local cache (not through Statestored), modify
the local catalog with the new catalogServiceId, trigger an exception,
print the exception information, but did not submit full topic update
request. The version of Catalogd is lower than Impalad, so the metadata
information synchronized from Catalogd will be lost.

Under normal circumstances, if the catalogServiceId changes, Impalad
will submit full topic update request, Impalad will receive
TUpdateCatalogCacheRequest, and is_delta is false, that is full update.
Impalad will reset the local catalog cache, and then update, and the
version of CatalogD is greater than Impalad.

Testing

add TestRestart#test_restart_catalogd in
ests/custom_cluster/test_restart_services.py

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
M tests/custom_cluster/test_restart_services.py
2 files changed, 44 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/5
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 5
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-07-15 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17645 )

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17645/4/tests/custom_cluster/test_restart_services.py
File tests/custom_cluster/test_restart_services.py:

http://gerrit.cloudera.org:8080/#/c/17645/4/tests/custom_cluster/test_restart_services.py@189
PS4, Line 189: # Wait enough time to synchronize the metadata
 : sleep(20)
> > We should not depend on sleep. This is what I mean in the previous
I prefer Solution 2, this kind of modification is smaller and safer



--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 4
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Fri, 16 Jul 2021 03:36:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-07-15 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17645 )

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17645/4/tests/custom_cluster/test_restart_services.py
File tests/custom_cluster/test_restart_services.py:

http://gerrit.cloudera.org:8080/#/c/17645/4/tests/custom_cluster/test_restart_services.py@189
PS4, Line 189: # Wait enough time to synchronize the metadata
 : sleep(20)
> We should not depend on sleep. This is what I mean in the previous
 > comment: https://gerrit.cloudera.org/c/17645/1//COMMIT_MSG#17
 >
 > When AlterTable finishes, the metadata of this table should be
 > up-to-date. The current patch can't satify this.

I think of three solutions:
Solution 1. Directly trigger the full topic update, but you have to wait until 
the full update is completed before you can update the local cache, because the 
local cache has a higher version. But I did not find the api to trigger the 
full update, ImpalaServer::CatalogUpdateCallback is not applicable
Solution 2. When restart catalogs, delay DDL execution time. After the ddl is 
executed, determine whether the catalog service id of the returned result is 
the same as the local one. If it is different, wait for the local catalog 
service id to change, and then return success
Solution 3. When call exec_env_->frontend()->UpdateCatalogCache, if the catalog 
server id is different, the catalog object will be updated directly without 
comparing the versions



--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 4
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Fri, 16 Jul 2021 03:20:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-07-13 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, 
Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17645

to look at the new patch set (#4).

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..

IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

The change of catalogServiceId did not trigger a full topic update
request.

During the execution of DDL or DML, restarting Catalogd will cause
catalogServiceId to change. The DDL will execute successfully, and
then directly update the local cache (not through Statestored), modify
the local catalog with the new catalogServiceId, trigger an exception,
print the exception information, but did not submit full topic update
request. The version of Catalogd is lower than Impalad, so the metadata
information synchronized from Catalogd will be lost.

Under normal circumstances, if the catalogServiceId changes, Impalad
will submit full topic update request, Impalad will receive
TUpdateCatalogCacheRequest, and is_delta is false, that is full update.
Impalad will reset the local catalog cache, and then update, and the
version of CatalogD is greater than Impalad.

Testing

add TestRestart#test_restart_catalogd in
ests/custom_cluster/test_restart_services.py

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
M tests/custom_cluster/test_restart_services.py
2 files changed, 29 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/4
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 4
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-07-13 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17645 )

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..


Patch Set 3:

add unit test


--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 3
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Tue, 13 Jul 2021 11:52:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-07-13 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, 
Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17645

to look at the new patch set (#3).

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..

IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

The change of catalogServiceId did not trigger a full topic update
request.

During the execution of DDL or DML, restarting Catalogd will cause
catalogServiceId to change. The DDL will execute successfully, and
then directly update the local cache (not through Statestored), modify
the local catalog with the new catalogServiceId, trigger an exception,
print the exception information, but did not submit full topic update
request. The version of Catalogd is lower than Impalad, so the metadata
information synchronized from Catalogd will be lost.

Under normal circumstances, if the catalogServiceId changes, Impalad
will submit full topic update request, Impalad will receive
TUpdateCatalogCacheRequest, and is_delta is false, that is full update.
Impalad will reset the local catalog cache, and then update, and the
version of CatalogD is greater than Impalad.

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
M tests/custom_cluster/test_restart_services.py
2 files changed, 29 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/3
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 3
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-07-12 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17645 )

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17645/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17645/1//COMMIT_MSG@16
PS1, Line 16: but did not submit full topic update
: request
> Shouldn't we fix this instead? The new catalog objects in the DDL
 > response come from the new catalogd instance. Will they be able to
 > update the local catalog cache? Or are they just got ignored due to
 > versions too small?

we should fix this bug.The new catalog objects in the DDL
response come from the new catalogd instance, they just got ignored due to 
versions too small

 >
 > I think updating the local catalog cache immediately using the DDL
 > response aims to let the coordinator get latest catalog
 > immediately. After this patch, the coordinator will need to wait
 > for the next statestore topic update to get the latest catalog,
 > which doesn't match the existing functionality. Did I miss
 > anything?

This patch do not change the previous process. After the ddl is successfully 
executed, impalad will immediately modify the local catalog cache instead of 
waiting for the statestore notification.
When the ddl is executed successfully, it is reasonable for impalad to modify 
the local cache, but it is redundant to modify the service id every time, and 
it will cause the object version to be very high, causing the local catalog 
cache to be out of sync


http://gerrit.cloudera.org:8080/#/c/17645/1//COMMIT_MSG@24
PS1, Line 24: CatalogD
> CatalogD?
Done


http://gerrit.cloudera.org:8080/#/c/17645/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/17645/1/be/src/service/impala-server.cc@a2133
PS1, Line 2133:
  :
  :
  :
> Is a better solution to set the catalogServiceId_ only if firstRun is true
Modifying the server id here may be redundant. Is it necessary to do so?



--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 2
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Tue, 13 Jul 2021 03:30:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-07-12 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Thomas Tauber-Marshall, Vihang Karajgaonkar, 
Zoltan Borok-Nagy, Tim Armstrong, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17645

to look at the new patch set (#2).

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..

IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

The change of catalogServiceId did not trigger a full topic update
request.

During the execution of DDL or DML, restarting Catalogd will cause
catalogServiceId to change. The DDL will execute successfully, and
then directly update the local cache (not through Statestored), modify
the local catalog with the new catalogServiceId, trigger an exception,
print the exception information, but did not submit full topic update
request. The version of Catalogd is lower than Impalad, so the metadata
information synchronized from Catalogd will be lost.

Under normal circumstances, if the catalogServiceId changes, Impalad
will submit full topic update request, Impalad will receive
TUpdateCatalogCacheRequest, and is_delta is false, that is full update.
Impalad will reset the local catalog cache, and then update, and the
version of CatalogD is greater than Impalad.

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
1 file changed, 1 insertion(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/2
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 2
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-10766: Better selectivity for =,not distinct

2021-07-12 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17637 )

Change subject: IMPALA-10766: Better selectivity for =,not distinct
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17637/6/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java:

http://gerrit.cloudera.org:8080/#/c/17637/6/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@288
PS6, Line 288:  selectivity_ = selectivity_ * (double) (numRows - numNulls) / 
numRows +
 :   numNulls / numRows;
> nit. In this case, can we use selectivity = #nulls / #rows directly?
This calculation formula may be (1- 1/ndv)* (numRows - numNulls) / numRows + 
numNulls / numRows. Some non-null values satisfy "is distinct from not-null". 
ie. 'aa' is distinct from 'bb'



--
To view, visit http://gerrit.cloudera.org:8080/17637
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60
Gerrit-Change-Number: 17637
Gerrit-PatchSet: 7
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Tue, 13 Jul 2021 02:36:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10766: Better selectivity for =,not distinct

2021-07-12 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17637 )

Change subject: IMPALA-10766: Better selectivity for =,not distinct
..


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17637/5/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java:

http://gerrit.cloudera.org:8080/#/c/17637/5/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@250
PS5, Line 250: rChildIsNull
> nit. Do we need to worry about left child is null here, or not since the le
Considering this special situation, the selectivity will be more accurate.


http://gerrit.cloudera.org:8080/#/c/17637/5/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@280
PS5, Line 280: // For =, !=, "is distinct from null" and "is not distinct from 
non-null",
 :   // all null values are false.
 :   selectivity_ *= (double) (numRows - numNulls) / 
numRows;
> nit. may combine them into a single sentence:
Done


http://gerrit.cloudera.org:8080/#/c/17637/5/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@284
PS5, Line 284: stinct from null, only null values are
> Another case here is "is distinct from not-null".
Done



--
To view, visit http://gerrit.cloudera.org:8080/17637
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60
Gerrit-Change-Number: 17637
Gerrit-PatchSet: 6
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Mon, 12 Jul 2021 08:40:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10766: Better selectivity for =,not distinct

2021-07-12 Thread liuyao (Code Review)
Hello Aman Sinha, Qifan Chen, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17637

to look at the new patch set (#6).

Change subject: IMPALA-10766: Better selectivity for =,not distinct
..

IMPALA-10766: Better selectivity for =,not distinct

For = :
If the right side is null, then selectivity is 0.
If the left side is null, null should be excluded when calculating
selectivity.

For is not distinct from :
If the right side is null, non null should be excluded when calculating
selectivity, and only null should be included.
If the left side is null and the right side is not null, null should be
excluded when calculating selectivity, including part of non-null.

Testing :
Change the UT, modify the selectivity calculation error, add two new
cases column != null and column = null

Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60
---
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q44.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q48.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q52.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q55.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q56.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q60.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q61.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q66.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q75.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q80.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q81.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q85.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q91.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q94.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q95.test
32 files changed, 1,203 insertions(+), 1,186 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/17637/6
--
To view, visit http://gerrit.cloudera.org:8080/17637
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60
Gerrit-Change-Number: 17637
Gerrit-PatchSet: 6
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-10766: Better selectivity for =,not distinct

2021-07-09 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17637 )

Change subject: IMPALA-10766: Better selectivity for =,not distinct
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17637/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17637/4//COMMIT_MSG@20
PS4, Line 20: Testin
> Testing
Done


http://gerrit.cloudera.org:8080/#/c/17637/4/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java:

http://gerrit.cloudera.org:8080/#/c/17637/4/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@253
PS4, Line 253: !
> !=
Done


http://gerrit.cloudera.org:8080/#/c/17637/4/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@283
PS4, Line 283:  (numRows - numNulls) / numRows;
> nit: I think it's worth to extract it to a variable just like numRows
Done



--
To view, visit http://gerrit.cloudera.org:8080/17637
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60
Gerrit-Change-Number: 17637
Gerrit-PatchSet: 5
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Fri, 09 Jul 2021 09:52:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10766: Better selectivity for =,not distinct

2021-07-09 Thread liuyao (Code Review)
Hello Aman Sinha, Qifan Chen, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17637

to look at the new patch set (#5).

Change subject: IMPALA-10766: Better selectivity for =,not distinct
..

IMPALA-10766: Better selectivity for =,not distinct

For = :
If the right side is null, then selectivity is 0.
If the left side is null, null should be excluded when calculating
selectivity.

For is not distinct from :
If the right side is null, non null should be excluded when calculating
selectivity, and only null should be included.
If the left side is null and the right side is not null, null should be
excluded when calculating selectivity, including part of non-null.

Testing :
Change the UT, modify the selectivity calculation error, add two new
cases column != null and column = null

Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60
---
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q44.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q48.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q52.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q55.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q56.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q60.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q61.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q66.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q75.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q80.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q81.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q85.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q91.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q94.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q95.test
32 files changed, 1,198 insertions(+), 1,185 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/17637/5
--
To view, visit http://gerrit.cloudera.org:8080/17637
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60
Gerrit-Change-Number: 17637
Gerrit-PatchSet: 5
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-07-06 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17645 )

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..


Patch Set 1:

When updating the local cache, catalogServiceId should not be changed here, 
because the full update will not be triggered here


--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 1
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Tue, 06 Jul 2021 11:24:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-07-04 Thread liuyao (Code Review)
liuyao has removed Tim Armstrong from this change.  ( 
http://gerrit.cloudera.org:8080/17645 )

Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..


Removed reviewer Tim Armstrong.
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 1
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

2021-07-03 Thread liuyao (Code Review)
liuyao has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17645


Change subject: IMPALA-5476: Fix Catalogd restart bring about metadata is out 
of sync
..

IMPALA-5476: Fix Catalogd restart bring about metadata is out of sync

The change of catalogServiceId did not trigger a full topic update
request.

During the execution of DDL or DML, restarting Catalogd will cause
catalogServiceId to change. The DDL will execute successfully, and
then directly update the local cache (not through Statestored), modify
the local catalog with the new catalogServiceId, trigger an exception,
print the exception information, but did not submit full topic update
request. The version of Catalogd is lower than Impalad, so the metadata
information synchronized from Catalogd will be lost.

Under normal circumstances, if the catalogServiceId changes, Impalad
will submit full topic update request, Impalad will receive
TUpdateCatalogCacheRequest, and is_delta is false, that is full update.
Impalad will reset the local catalog cache, and then update, and the
version of DatalogD is greater than Impalad.

Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
---
M be/src/service/impala-server.cc
1 file changed, 1 insertion(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17645/1
--
To view, visit http://gerrit.cloudera.org:8080/17645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9fe25f5a2a42fb432e306ef08ae35750c8f3c50c
Gerrit-Change-Number: 17645
Gerrit-PatchSet: 1
Gerrit-Owner: liuyao 


[Impala-ASF-CR] IMPALA-10766: Better selectivity for =,not distinct

2021-06-28 Thread liuyao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17637

to look at the new patch set (#4).

Change subject: IMPALA-10766: Better selectivity for =,not distinct
..

IMPALA-10766: Better selectivity for =,not distinct

For = :
If the right side is null, then selectivity is 0.
If the left side is null, null should be excluded when calculating
selectivity.

For is not distinct from :
If the right side is null, non null should be excluded when calculating
selectivity, and only null should be included.
If the left side is null and the right side is not null, null should be
excluded when calculating selectivity, including part of non-null.

Tesing :
Change the UT, modify the selectivity calculation error, add two new
cases column != null and column = null

Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60
---
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q44.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q48.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q52.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q55.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q56.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q60.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q61.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q66.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q75.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q80.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q81.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q85.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q91.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q94.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q95.test
32 files changed, 1,198 insertions(+), 1,185 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/17637/4
--
To view, visit http://gerrit.cloudera.org:8080/17637
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60
Gerrit-Change-Number: 17637
Gerrit-PatchSet: 4
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10766: Better selectivity for =,not distinct

2021-06-28 Thread liuyao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17637

to look at the new patch set (#3).

Change subject: IMPALA-10766: Better selectivity for =,not distinct
..

IMPALA-10766: Better selectivity for =,not distinct

For = :
If the right side is null, then selectivity is 0.
If the left side is null, null should be excluded when calculating
selectivity.

For is not distinct from :
If the right side is null, non null should be excluded when calculating
selectivity, and only null should be included.
If the left side is null and the right side is not null, null should be
excluded when calculating selectivity, including part of non-null.

Tesing :
Change the UT, modify the selectivity calculation error, add two new
cases column != null and column = null

Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60
---
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q44.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q48.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q52.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q55.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q56.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q60.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q61.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q66.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q75.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q80.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q81.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q85.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q91.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q94.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q95.test
32 files changed, 1,188 insertions(+), 1,173 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/17637/3
--
To view, visit http://gerrit.cloudera.org:8080/17637
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60
Gerrit-Change-Number: 17637
Gerrit-PatchSet: 3
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10766: Better selectivity for =,not distinct

2021-06-25 Thread liuyao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17637

to look at the new patch set (#2).

Change subject: IMPALA-10766: Better selectivity for =,not distinct
..

IMPALA-10766: Better selectivity for =,not distinct

For = :
If the right side is null, then selectivity is 0.
If the left side is null, null should be excluded when calculating
selectivity.

For is not distinct from :
If the right side is null, non null should be excluded when calculating
selectivity, and only null should be included.
If the left side is null and the right side is not null, null should be
excluded when calculating selectivity, including part of non-null.

Tesing :
Change the UT, modify the selectivity calculation error, add two new
cases column != null and column = null

Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60
---
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
6 files changed, 70 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/17637/2
--
To view, visit http://gerrit.cloudera.org:8080/17637
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60
Gerrit-Change-Number: 17637
Gerrit-PatchSet: 2
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10766: Better selectivity for =,not distinct

2021-06-25 Thread liuyao (Code Review)
liuyao has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17637


Change subject: IMPALA-10766: Better selectivity for =,not distinct
..

IMPALA-10766: Better selectivity for =,not distinct

For = :
If the right side is null, then selectivity is 0.
If the left side is null, null should be excluded when calculating
selectivity.

For is not distinct from :
If the right side is null, non null should be excluded when calculating
selectivity, and only null should be included.
If the left side is null and the right side is not null, null should be
excluded when calculating selectivity, including part of non-null.

Tesing :
Change the UT, modify the selectivity calculation error, add two new
cases column != null and column = null

Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60
---
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/card-inner-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
6 files changed, 70 insertions(+), 55 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/17637/1
--
To view, visit http://gerrit.cloudera.org:8080/17637
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib8ec62f2355a7036125cc0d261b790644b9f4b60
Gerrit-Change-Number: 17637
Gerrit-PatchSet: 1
Gerrit-Owner: liuyao 


[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception

2021-06-24 Thread liuyao (Code Review)
Hello Quanlong Huang, Aman Sinha, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17604

to look at the new patch set (#10).

Change subject: IMPALA-9763: builtin functions throw unknown exception
..

IMPALA-9763: builtin functions throw unknown exception

There is a read-write contention in FunctionName#analyze()
when a FunctionName object is shared to two threads: check if there
is a builtin function in wrong database, when 'db_' is temporarily
modified by the other thread.

This patch fix this issue. For analyze FunctionName, if no database is
specified, check builtinDb first; if FunctionName is not in builtinDb,
set the database to defaultDb.Avoid setting the database of
FunctionName to a temporary value

Detailed description:

This problem is caused by shallow copies. In the function
FromClause::analyze(Analyzer analyzer), if the table is FeView,
the analyzer will clone SelectStmt from the view's MetaData,
which should be a deep copy. But in the copy constructor of
FunctionCallExpr, fnName_ = other.fnName_, this is a shallow copy,
which makes the fnName_ of multiple queries refer to the same block of
memory. When the query is analyzed, the database of fnName_ will be
set.But this will change the fnName_ of other queries.

For example:

create view1 as select coalesce(col1, 2) from table1;

User A and User B check the same view1 at the same time. When
user A’s query is analyzed, it runs to db_ = analyzer.getDefaultDb() of
FunctionName::analyze. User B’s query is also analyzed and runs to
FunctionCallExpr::analyzeImpl, and just completed FunctionName::analyze
at this time, the step of user A's db_ = analyzer.getDefaultDb() will
change the db_ attribute of user B's fnName_ to defaultDb, and then
user B will continue to check whether the function is defaultDb,
Throw an exception: coalesce() unknown for database ...

Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a
---
M fe/src/main/java/org/apache/impala/analysis/FunctionName.java
1 file changed, 2 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/17604/10
--
To view, visit http://gerrit.cloudera.org:8080/17604
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a
Gerrit-Change-Number: 17604
Gerrit-PatchSet: 10
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception

2021-06-24 Thread liuyao (Code Review)
Hello Quanlong Huang, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17604

to look at the new patch set (#8).

Change subject: IMPALA-9763: builtin functions throw unknown exception
..

IMPALA-9763: builtin functions throw unknown exception

There is a read-write contention in FunctionName#analyze()
when a FunctionName object is shared to two threads: check if there
is a builtin function in wrong database, when 'db_' is temporarily
modified by the other thread.

This problem is caused by shallow copies. In the function
FromClause::analyze(Analyzer analyzer), if the table is FeView,
the analyzer will clone SelectStmt from the view's MetaData,
which should be a deep copy. But in the copy constructor of
FunctionCallExpr, fnName_ = other.fnName_, this is a shallow copy,
which makes the fnName_ of multiple queries refer to the same block of
memory. When the query is analyzed, the database of fnName_ will be set.
But this will change the fnName_ of other queries.

For example:

create view1 as select coalesce(col1, 2) from table1;

User A and User B check the same view1 at the same time. When
user A’s query is analyzed, it runs to db_ = analyzer.getDefaultDb() of
FunctionName::analyze. User B’s query is also analyzed and runs to
FunctionCallExpr::analyzeImpl, and just completed FunctionName::analyze;
at this time, the step of user A's db_ = analyzer.getDefaultDb() will
change the db_ attribute of user B's fnName_ to defaultDb, and then
user B will continue to check whether the function is defaultDb,
Throw an exception: coalesce() unknown for database ...

Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a
---
M fe/src/main/java/org/apache/impala/analysis/FunctionName.java
1 file changed, 2 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/17604/8
--
To view, visit http://gerrit.cloudera.org:8080/17604
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a
Gerrit-Change-Number: 17604
Gerrit-PatchSet: 8
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception

2021-06-24 Thread liuyao (Code Review)
Hello Quanlong Huang, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17604

to look at the new patch set (#7).

Change subject: IMPALA-9763: builtin functions throw unknown exception
..

IMPALA-9763: builtin functions throw unknown exception

There is a read-write contention in FunctionName#analyze()
when a FunctionName object is shared to two threads: check if there
is a builtin function in wrong database, when 'db_' is temporarily
modified by the other thread.

This problem is caused by shallow copies. In the function
FromClause::analyze(Analyzer analyzer), if the table is FeView,
the analyzer will clone SelectStmt from the view's MetaData,
which should be a deep copy. But in the copy constructor of
FunctionCallExpr, fnName_ = other.fnName_, this is a shallow copy,
which makes the fnName_ of multiple queries refer to the same block of
memory. When the query is analyzed, the database of fnName_ will be set.
But this will change the fnName_ of other queries.

For example:

create view1 as select coalesce(col1, 2) from table1;

User A and User B check the same view1 at the same time. When
user A’s query is analyzed, it runs to db_ = analyzer.getDefaultDb() of
FunctionName::analyze. User B’s query is also analyzed and runs to
FunctionCallExpr::analyzeImpl, and just completed FunctionName::analyze;
at this time, the step of user A's db_ = analyzer.getDefaultDb() will
change the db_ attribute of user B's fnName_ to defaultDb, and then
user B will continue to check whether the function is defaultDb,
Throw an exception: coalesce() unknown for database ...

Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a
---
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionName.java
2 files changed, 3 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/17604/7
--
To view, visit http://gerrit.cloudera.org:8080/17604
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a
Gerrit-Change-Number: 17604
Gerrit-PatchSet: 7
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception

2021-06-23 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17604 )

Change subject: IMPALA-9763: builtin functions throw unknown exception
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17604/6/fe/src/main/java/org/apache/impala/analysis/FunctionName.java
File fe/src/main/java/org/apache/impala/analysis/FunctionName.java:

http://gerrit.cloudera.org:8080/#/c/17604/6/fe/src/main/java/org/apache/impala/analysis/FunctionName.java@144
PS6, Line 144: if (!isFullyQualified()) {
 :   db_ = analyzer.getDefaultDb();
 :   if (preferBuiltinsDb && builtinDb.containsFunction(fn_)) {
 : db_ = BuiltinsDb.NAME;
 :   }
 : }
 : Preconditions.checkNotNull(db_);
 : isBuiltin_ = db_.equals(BuiltinsDb.NAME) &&
 : builtinDb.containsFunction(fn_);
> I think a simpler solution is resolving the read-write contention here, i.e
This is indeed a simple and conservative method. However, the risk of shallow 
copy still exists, and this can be resolved later. If the database is not 
specified, check builtDb first, if not, set it to defaultDb. I'll modify it 
like this first, and then test it?



--
To view, visit http://gerrit.cloudera.org:8080/17604
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a
Gerrit-Change-Number: 17604
Gerrit-PatchSet: 6
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Thu, 24 Jun 2021 02:50:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception

2021-06-23 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17604 )

Change subject: IMPALA-9763: builtin functions throw unknown exception
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17604/5/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java:

http://gerrit.cloudera.org:8080/#/c/17604/5/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@274
PS5, Line 274:   orderByElement.getExpr().collectAll(
 :   Predicates.instanceOf(FunctionCallExpr.class), 
fnExprs);
 : }
 : substituteOrdinalsAndAliases(orderingExprs, "ORDER BY", 
analyzer);
 :
 : for (FunctionCallExpr fn: fnExprs) 
fn.getFnName().analyze(analyzer);
> I'm not clear about this. You are collecting fnExprs from the
 > original 'orderByElements_' but not the cloned ones inside
 > 'orderingExprs'. And then analyze these original exprs.
 >
 > Do we require all FunctionName objects being analyzed now? Just
 > concerning that it will introduce a new requirement that may be
 > broken in many other places.

Previously, the code here relied on shallow copy. When executing 
substituteOrdinalsAndAliases(orderingExprs, "ORDER BY", analyzer), the 
FunctionName in orderingExprs was analyzed and the FunctionName in 
orderByElements_ was also analyzed.
After changing to a deep copy, if the FunctionName in orderByElements_ is not 
analyzed here, then NormalizeCountStarRule#apply will report an exception: 
NullPointorException, because the FunctionName in orderByElements_ has not been 
analyzed, origExpr.getFnName().getFunction() returns Is null(line 41).



--
To view, visit http://gerrit.cloudera.org:8080/17604
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a
Gerrit-Change-Number: 17604
Gerrit-PatchSet: 6
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Wed, 23 Jun 2021 09:37:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception

2021-06-22 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17604 )

Change subject: IMPALA-9763: builtin functions throw unknown exception
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17604/4/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

http://gerrit.cloudera.org:8080/#/c/17604/4/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@173
PS4, Line 173: Preconditions.checkState(other.fnName_ != null);
> nit: Could you add a Preconditions check that other.fnName_ != null?
Done


http://gerrit.cloudera.org:8080/#/c/17604/3/fe/src/main/java/org/apache/impala/analysis/FunctionName.java
File fe/src/main/java/org/apache/impala/analysis/FunctionName.java:

http://gerrit.cloudera.org:8080/#/c/17604/3/fe/src/main/java/org/apache/impala/analysis/FunctionName.java@64
PS3, Line 64: fnNamePath_ = (other.getFnNamePath() != null) ?
> Sorry, I'm still not clear why we need to clone a field that is
 > never modified... Did I miss anything?

Only fnNamePath_ is saved in StmtTableCache, db_, fn_ are not saved, isBuiltin_ 
and isAnalyzed_ are both false


http://gerrit.cloudera.org:8080/#/c/17604/3/fe/src/main/java/org/apache/impala/analysis/FunctionName.java@66
PS3, Line 66: db_ = other.getDb();
: fn_ = other.getFunction();
: isBuiltin_ = other.isBuiltin();
: isAnalyzed_ = other.isFnAnalyzed();
> > I tried to do this, but an error occurred. Because
 > FunctionName::fnNamePath_ is null.
 > >
 > >use functional;
 > >Select count(1) from alltypesagg;
 > >
 > >I0621 19:03:29.787499 46677 jni-util.cc:286] 
 > >0649db5f6bc675d6:6b92f3b6]
 > java.lang.NullPointerException
 > >at 
 > >org.apache.impala.analysis.FunctionCallExpr.isConstantImpl(FunctionCallExpr.java:372)
 > >at org.apache.impala.analysis.Expr.isConstant(Expr.java:1460)
 > >at org.apache.impala.analysis.Expr.computeNumDistinctValues(Expr.java:575)
 > >at org.apache.impala.analysis.Expr.analyze(Expr.java:500)
 > >..
 >
 > hmm, did you reset fnNamePath_ to null so it cause the
 > NullPointerException? In reset() we only reset the analysis results
 > which are db_, fn_, isBuiltin_ and isAnalyzed_. The fnNamePath_
 > comes from the parser so should not be reset.
 >
 > > I modified the copy constructor of FunctionCallExpr again. I
 > clone QueryStmt in the constructor of InlineViewRef. At this time,
 > db_ and fn_ are both null, and isBuiltin_ and isAnalyzed_ are both
 > false, because only fnNamePath_ is stored in the stmtTableCache,
 > and no other values are stored.
 >
 > OK, this answers my question that cloning the analysis results are
 > safe, because the source objects are in the StmtTableCache and they
 > are never analyzed. All code paths go here will see 'other' has
 > null db_ , fn_ etc.
 > However, this depends on the caller implementation far away from
 > this class. I think adding a reset() method to reset db_, fn_,
 > isBuiltin_ and isAnalyzed_ help to prevent future bugs.

I added the reset() function, but I didn’t call it in 
FunctionCallExpr#resetAnalysisState for the following reasons:
1. FunctionName is created in some places, only fn_ is set, fnNamePath_ is not 
set, if reset is called, both fn_ and fnNamePath_ are null, for example, 
AggregateInfo#createCountDistinctAggExprParam,NormalizeCountStarRule#apply
2. In some places, fnNamePath_ is set, but if reset is called, FunctionName 
will continue to be used without analysis.



--
To view, visit http://gerrit.cloudera.org:8080/17604
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a
Gerrit-Change-Number: 17604
Gerrit-PatchSet: 5
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Tue, 22 Jun 2021 07:44:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception

2021-06-22 Thread liuyao (Code Review)
Hello Quanlong Huang, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17604

to look at the new patch set (#5).

Change subject: IMPALA-9763: builtin functions throw unknown exception
..

IMPALA-9763: builtin functions throw unknown exception

There is a read-write contention in FunctionName#analyze()
when a FunctionName object is shared to two threads: check if there
is a builtin function in wrong database, when 'db_' is temporarily
modified by the other thread.

This problem is caused by shallow copies. In the function
FromClause::analyze(Analyzer analyzer), if the table is FeView,
the analyzer will clone SelectStmt from the view's MetaData,
which should be a deep copy. But in the copy constructor of
FunctionCallExpr, fnName_ = other.fnName_, this is a shallow copy,
which makes the fnName_ of multiple queries refer to the same block of
memory. When the query is analyzed, the database of fnName_ will be set.
But this will change the fnName_ of other queries.

For example:

create view1 as select coalesce(col1, 2) from table1;

User A and User B check the same view1 at the same time. When
user A’s query is analyzed, it runs to db_ = analyzer.getDefaultDb() of
FunctionName::analyze. User B’s query is also analyzed and runs to
FunctionCallExpr::analyzeImpl, and just completed FunctionName::analyze;
at this time, the step of user A's db_ = analyzer.getDefaultDb() will
change the db_ attribute of user B's fnName_ to defaultDb, and then
user B will continue to check whether the function is defaultDb,
Throw an exception: coalesce() unknown for database ...

Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a
---
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionName.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
3 files changed, 28 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/17604/5
--
To view, visit http://gerrit.cloudera.org:8080/17604
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a
Gerrit-Change-Number: 17604
Gerrit-PatchSet: 5
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-7560: Set selectivity of Not-equal

2021-06-21 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17344 )

Change subject: IMPALA-7560: Set selectivity of Not-equal
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17344/6/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java:

http://gerrit.cloudera.org:8080/#/c/17344/6/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@256
PS6, Line 256: selectivity_ = 1.0 / distinctValues;
> I think if we have null stats, the selectivity for EQ case should be comput
Maybe I submit another patch to modify the selectivity of EQ to be better. This 
patch only modifies the selectivity of NE


http://gerrit.cloudera.org:8080/#/c/17344/6/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@258
PS6, Line 258: //
> nit. May add a comment here: For case  IS DISTINCT FROM NULL
Done



--
To view, visit http://gerrit.cloudera.org:8080/17344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
Gerrit-Change-Number: 17344
Gerrit-PatchSet: 7
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Tue, 22 Jun 2021 02:44:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7560: Set selectivity of Not-equal

2021-06-21 Thread liuyao (Code Review)
Hello Aman Sinha, Qifan Chen, Zoltan Borok-Nagy, Tim Armstrong, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17344

to look at the new patch set (#7).

Change subject: IMPALA-7560: Set selectivity of Not-equal
..

IMPALA-7560: Set selectivity of Not-equal

Calculate binary predicate selectivity if one of the children is
a slotref and the other children are all constant.
eg. something like "col != 5", but not "2 * col != 10"

selectivity = 1 - 1/ndv

Testing:
Modify the function testNeSelectivity() of the
ExprCardinalityTest.java, change -1 to the correct value.

Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
---
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
13 files changed, 110 insertions(+), 100 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17344/7
--
To view, visit http://gerrit.cloudera.org:8080/17344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
Gerrit-Change-Number: 17344
Gerrit-PatchSet: 7
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception

2021-06-21 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17604 )

Change subject: IMPALA-9763: builtin functions throw unknown exception
..


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17604/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17604/3//COMMIT_MSG@27
PS3, Line 27: me time.
> nit: defaultDb
Done


http://gerrit.cloudera.org:8080/#/c/17604/3//COMMIT_MSG@28
PS3, Line 28: getDefault
> builtin?
Done


http://gerrit.cloudera.org:8080/#/c/17604/3//COMMIT_MSG@30
PS3, Line 30: FunctionCallExpr::analyzeImpl, and just completed 
FunctionName::analyze;
> I think you can add a summary that there is a read-write contention in Func
Done


http://gerrit.cloudera.org:8080/#/c/17604/3/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

http://gerrit.cloudera.org:8080/#/c/17604/3/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@173
PS3, Line 173: Name_.clone();
> nit: can we change this to isFnAnalyzed()? It's more readable for me.
Yes, the readability here is not good. But I think it might be better to drop 
the shallow copy and modify the code that relies on the shallow copy.


http://gerrit.cloudera.org:8080/#/c/17604/3/fe/src/main/java/org/apache/impala/analysis/FunctionName.java
File fe/src/main/java/org/apache/impala/analysis/FunctionName.java:

http://gerrit.cloudera.org:8080/#/c/17604/3/fe/src/main/java/org/apache/impala/analysis/FunctionName.java@64
PS3, Line 64: fnNamePath_ = (other.getFnNamePath() != null) ?
> I think we don't need to clone this since it's private and never modified.
I removed the "shallow copy" in the copy constructor of FunctionCallExpr, so I 
need to copy fnNamePath_


http://gerrit.cloudera.org:8080/#/c/17604/3/fe/src/main/java/org/apache/impala/analysis/FunctionName.java@66
PS3, Line 66: db_ = other.getDb();
: fn_ = other.getFunction();
: isBuiltin_ = other.isBuiltin();
: isAnalyzed_ = other.isFnAnalyzed();
> Is it safe to clone the analysis results? E.g. if two dbs both have a foo()
I tried to do this, but an error occurred. Because FunctionName::fnNamePath_ is 
null.

use functional;
Select count(1) from alltypesagg;

I0621 19:03:29.787499 46677 jni-util.cc:286] 0649db5f6bc675d6:6b92f3b6] 
java.lang.NullPointerException
at 
org.apache.impala.analysis.FunctionCallExpr.isConstantImpl(FunctionCallExpr.java:372)
at org.apache.impala.analysis.Expr.isConstant(Expr.java:1460)
at org.apache.impala.analysis.Expr.computeNumDistinctValues(Expr.java:575)
at org.apache.impala.analysis.Expr.analyze(Expr.java:500)
at 
org.apache.impala.analysis.SelectStmt$SelectAnalyzer.analyzeSelectClause(SelectStmt.java:341)
at 
org.apache.impala.analysis.SelectStmt$SelectAnalyzer.analyze(SelectStmt.java:280)
at 
org.apache.impala.analysis.SelectStmt$SelectAnalyzer.access$100(SelectStmt.java:265)
at org.apache.impala.analysis.SelectStmt.analyze(SelectStmt.java:258)
at 
org.apache.impala.analysis.AnalysisContext.reAnalyze(AnalysisContext.java:575)
at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:549)
at 
org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:445)
at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1669)
at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1635)
at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1605)
at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:163)


I modified the copy constructor of FunctionCallExpr again. I clone QueryStmt in 
the constructor of InlineViewRef. At this time, db_ and fn_ are both null, and 
isBuiltin_ and isAnalyzed_ are both false, because only fnNamePath_ is stored 
in the stmtTableCache, and no other values are stored.


http://gerrit.cloudera.org:8080/#/c/17604/3/fe/src/main/java/org/apache/impala/analysis/FunctionName.java@72
PS3, Line 72:   @Override
> nit: Could you add an "@Override" annotation?
Done



--
To view, visit http://gerrit.cloudera.org:8080/17604
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a
Gerrit-Change-Number: 17604
Gerrit-PatchSet: 4
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Mon, 21 Jun 2021 11:53:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception

2021-06-21 Thread liuyao (Code Review)
Hello Quanlong Huang, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17604

to look at the new patch set (#4).

Change subject: IMPALA-9763: builtin functions throw unknown exception
..

IMPALA-9763: builtin functions throw unknown exception

There is a read-write contention in FunctionName#analyze()
when a FunctionName object is shared to two threads: check if there
is a builtin function in wrong database, when 'db_' is temporarily
modified by the other thread.

This problem is caused by shallow copies. In the function
FromClause::analyze(Analyzer analyzer), if the table is FeView,
the analyzer will clone SelectStmt from the view's MetaData,
which should be a deep copy. But in the copy constructor of
FunctionCallExpr, fnName_ = other.fnName_, this is a shallow copy,
which makes the fnName_ of multiple queries refer to the same block of
memory. When the query is analyzed, the database of fnName_ will be set.
But this will change the fnName_ of other queries.

For example:

create view1 as select coalesce(col1, 2) from table1;

User A and User B check the same view1 at the same time. When
user A’s query is analyzed, it runs to db_ = analyzer.getDefaultDb() of
FunctionName::analyze. User B’s query is also analyzed and runs to
FunctionCallExpr::analyzeImpl, and just completed FunctionName::analyze;
at this time, the step of user A's db_ = analyzer.getDefaultDb() will
change the db_ attribute of user B's fnName_ to defaultDb, and then
user B will continue to check whether the function is defaultDb,
Throw an exception: coalesce() unknown for database ...

Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a
---
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionName.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
3 files changed, 20 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/17604/4
--
To view, visit http://gerrit.cloudera.org:8080/17604
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a
Gerrit-Change-Number: 17604
Gerrit-PatchSet: 4
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception

2021-06-19 Thread liuyao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17604

to look at the new patch set (#3).

Change subject: IMPALA-9763: builtin functions throw unknown exception
..

IMPALA-9763: builtin functions throw unknown exception

This problem is caused by shallow copies. In the function
FromClause::analyze(Analyzer analyzer), if the table is FeView,
the analyzer will clone SelectStmt from the view's MetaData,
which should be a deep copy. But in the copy constructor of
FunctionCallExpr, fnName_ = other.fnName_;, this is a shallow copy,
which makes the fnName_ of multiple queries refer to the same block of
memory. When the query is analyzed, the database of fnName_ will be set.
But this will change the fnName_ of other queries.

For example:

create view1 as select coalesce(col1, 2) from table1;

User A and User B check the same view1 at the same time. When
user A’s query is analyzed, it runs to db_ = analyzer.getDefaultDb() of
FunctionName::analyze. User B’s query is also analyzed and runs to
FunctionCallExpr::analyzeImpl. And just completed FunctionName::analyze;
at this time, the step of user A's db_ = analyzer.getDefaultDb() will
change the db_ attribute of user B's fnName_ to defultdb, and then
user B will continue to check whether the function is pure time ,
Throw an exception: coalesce() unknown for database ...

Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a
---
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionName.java
2 files changed, 18 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/17604/3
--
To view, visit http://gerrit.cloudera.org:8080/17604
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a
Gerrit-Change-Number: 17604
Gerrit-PatchSet: 3
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception

2021-06-19 Thread liuyao (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17604

to look at the new patch set (#2).

Change subject: IMPALA-9763: builtin functions throw unknown exception
..

IMPALA-9763: builtin functions throw unknown exception

This problem is caused by shallow copies. In the function
FromClause::analyze(Analyzer analyzer), if the table is FeView,
the analyzer will clone SelectStmt from the view's MetaData,
which should be a deep copy. But in the copy constructor of
FunctionCallExpr, fnName_ = other.fnName_;, this is a shallow copy,
which makes the fnName_ of multiple queries refer to the same block of
memory. When the query is analyzed, the database of fnName_ will be set.
But this will change the fnName_ of other queries.

For example:

create view1 as select coalesce(col1, 2) from table1;

User A and User B check the same view1 at the same time. When
user A’s query is analyzed, it runs to db_ = analyzer.getDefaultDb() of
FunctionName::analyze. User B’s query is also analyzed and runs to
FunctionCallExpr::analyzeImpl. And just completed FunctionName::analyze;
at this time, the step of user A's db_ = analyzer.getDefaultDb() will
change the db_ attribute of user B's fnName_ to defultdb, and then
user B will continue to check whether the function is pure time ,
Throw an exception: coalesce() unknown for database ...

Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a
---
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionName.java
2 files changed, 14 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/17604/2
--
To view, visit http://gerrit.cloudera.org:8080/17604
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a
Gerrit-Change-Number: 17604
Gerrit-PatchSet: 2
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9763: builtin functions throw unknown exception

2021-06-19 Thread liuyao (Code Review)
liuyao has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17604


Change subject: IMPALA-9763: builtin functions throw unknown exception
..

IMPALA-9763: builtin functions throw unknown exception

This problem is caused by shallow copies. In the function
FromClause::analyze(Analyzer analyzer), if the table is FeView,
the analyzer will clone SelectStmt from the view's MetaData,
which should be a deep copy. But in the copy constructor of
FunctionCallExpr, fnName_ = other.fnName_;, this is a shallow copy,
which makes the fnName_ of multiple queries refer to the same block of
memory. When the query is analyzed, the database of fnName_ will be set.
But this will change the fnName_ of other queries.

For example:

create view1 as select coalesce(col1, 2) from table1;

User A and User B check the same view1 at the same time. When
user A’s query is analyzed, it runs to db_ = analyzer.getDefaultDb() of
FunctionName::analyze. User B’s query is also analyzed and runs to
FunctionCallExpr::analyzeImpl. And just completed FunctionName::analyze;
at this time, the step of user A's db_ = analyzer.getDefaultDb() will
change the db_ attribute of user B's fnName_ to defultdb, and then
user B will continue to check whether the function is pure time ,
Throw an exception: coalesce() unknown for database ...

Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a
---
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/17604/1
--
To view, visit http://gerrit.cloudera.org:8080/17604
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ifd77f5a7cd9a674340770233fe26eb5dc26e666a
Gerrit-Change-Number: 17604
Gerrit-PatchSet: 1
Gerrit-Owner: liuyao 


[Impala-ASF-CR] IMPALA-7560: Set selectivity of Not-equal

2021-06-18 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17344 )

Change subject: IMPALA-7560: Set selectivity of Not-equal
..


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17344/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17344/4//COMMIT_MSG@7
PS4, Line 7: IMPALA-7560:
> You might want to use IMPALA-7560 as this was the original Jira and it cont
Done


http://gerrit.cloudera.org:8080/#/c/17344/4//COMMIT_MSG@11
PS4, Line 11: col != 5", but not "2 * col !=
> nit: wouldn't be better to use != in the example?
Done


http://gerrit.cloudera.org:8080/#/c/17344/3/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java:

http://gerrit.cloudera.org:8080/#/c/17344/3/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@257
PS3, Line 257:  || op_ == Ope
> SlotRef retrieves the NDV from the SlotDescriptor's ColumnStats in SlotRef.
Done


http://gerrit.cloudera.org:8080/#/c/17344/4/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java:

http://gerrit.cloudera.org:8080/#/c/17344/4/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@a372
PS4, Line 372:
> That seems wrong looking at the data. I think it should be 1.
Done


http://gerrit.cloudera.org:8080/#/c/17344/4/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@344
PS4, Line 344:
> NULL values are not included in the NDV, also null count is stored separate
Done


http://gerrit.cloudera.org:8080/#/c/17344/4/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@355
PS4, Line 355: 0);
> I think the expected value is 0 here, as null_str is all nulls.
Done


http://gerrit.cloudera.org:8080/#/c/17344/4/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@359
PS4, Line 359: some_nulls is not distinct from 'foo'",
> I think it'd be worth to add a test case with
Done



--
To view, visit http://gerrit.cloudera.org:8080/17344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
Gerrit-Change-Number: 17344
Gerrit-PatchSet: 6
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Fri, 18 Jun 2021 07:30:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7560: Set selectivity of Not-equal

2021-06-18 Thread liuyao (Code Review)
Hello Aman Sinha, Qifan Chen, Zoltan Borok-Nagy, Tim Armstrong, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17344

to look at the new patch set (#6).

Change subject: IMPALA-7560: Set selectivity of Not-equal
..

IMPALA-7560: Set selectivity of Not-equal

Calculate binary predicate selectivity if one of the children is
a slotref and the other children are all constant.
eg. something like "col != 5", but not "2 * col != 10"

selectivity = 1 - 1/ndv

Testing:
Modify the function testNeSelectivity() of the
ExprCardinalityTest.java, change -1 to the correct value.

Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
---
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
13 files changed, 109 insertions(+), 100 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17344/6
--
To view, visit http://gerrit.cloudera.org:8080/17344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
Gerrit-Change-Number: 17344
Gerrit-PatchSet: 6
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-7560: Set selectivity of Not-equal

2021-06-18 Thread liuyao (Code Review)
Hello Aman Sinha, Qifan Chen, Zoltan Borok-Nagy, Tim Armstrong, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17344

to look at the new patch set (#5).

Change subject: IMPALA-7560: Set selectivity of Not-equal
..

IMPALA-7560: Set selectivity of Not-equal

Calculate binary predicate selectivity if one of the children is
a slotref and the other children are all constant.
eg. something like "col != 5", but not "2 * col != 10"

selectivity = 1 - 1/ndv

Testing:
Modify the function testNeSelectivity() of the
ExprCardinalityTest.java, change -1 to the correct value.

Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
---
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
13 files changed, 108 insertions(+), 100 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17344/5
--
To view, visit http://gerrit.cloudera.org:8080/17344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
Gerrit-Change-Number: 17344
Gerrit-PatchSet: 5
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-10677: Set selectivity of Not-equal

2021-06-03 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17344 )

Change subject: IMPALA-10677: Set selectivity of Not-equal
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17344/3/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java:

http://gerrit.cloudera.org:8080/#/c/17344/3/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@257
PS3, Line 257: distinctValues
> I wonder if distinctValues includes the null. If so, we may need to compute
If the column contains null, but not all null, then distinctValues does not 
contain null



--
To view, visit http://gerrit.cloudera.org:8080/17344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
Gerrit-Change-Number: 17344
Gerrit-PatchSet: 4
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Fri, 04 Jun 2021 03:40:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10677: Set selectivity of Not-equal

2021-06-03 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17344 )

Change subject: IMPALA-10677: Set selectivity of Not-equal
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17344/3/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java:

http://gerrit.cloudera.org:8080/#/c/17344/3/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@257
PS3, Line 257: distinctValues
> I wonder if distinctValues includes the null. If so, we may need to compute
If all values of the column are null, the distinctValues is 1; if not, the 
distinctValues does not includes null


http://gerrit.cloudera.org:8080/#/c/17344/3/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
File fe/src/test/java/org/apache/impala/planner/CardinalityTest.java:

http://gerrit.cloudera.org:8080/#/c/17344/3/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java@70
PS3, Line 70:  verifyCardinality(
: "SELECT id FROM functional.alltypes WHE
> Should we remove this comment, since the calculation is correct with the pa
Done


http://gerrit.cloudera.org:8080/#/c/17344/3/testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test
File testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test:

http://gerrit.cloudera.org:8080/#/c/17344/3/testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test@160
PS3, Line 160: from functional.alltypestiny
> Remove?
Done



--
To view, visit http://gerrit.cloudera.org:8080/17344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
Gerrit-Change-Number: 17344
Gerrit-PatchSet: 4
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Fri, 04 Jun 2021 03:37:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10677: Set selectivity of Not-equal

2021-06-03 Thread liuyao (Code Review)
Hello Qifan Chen, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17344

to look at the new patch set (#4).

Change subject: IMPALA-10677: Set selectivity of Not-equal
..

IMPALA-10677: Set selectivity of Not-equal

Calculate binary predicate selectivity if one of the children is
a slotref and the other children are all constant.
eg. something like "col = 5", but not "2 * col = 10"

selectivity = 1 - 1/ndv

Testing:
Modify the function testNeSelectivity() of the
ExprCardinalityTest.java, change -1 to the correct value.

Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
---
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
13 files changed, 91 insertions(+), 97 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17344/4
--
To view, visit http://gerrit.cloudera.org:8080/17344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
Gerrit-Change-Number: 17344
Gerrit-PatchSet: 4
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-10696: fix accuracy problem

2021-05-07 Thread liuyao (Code Review)
liuyao has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17411


Change subject: IMPALA-10696: fix accuracy problem
..

IMPALA-10696: fix accuracy problem

Table alltypes has no statistics, so the cardinality of alltypes
will be estimated based on the hdfs files and the avg row size.
Calling PrintUtils.printMetric, double will be divided by long. There
will be accuracy problems. In most cases, the number of lines
calculated is 17.91 K. But due to accuracy problems here, the
calculated value is 17.90K.

I modified line 221 of stats-extrapolation.test and used row_regex
to match, referring to the matching method of cardinality in line
224,in this case, their values are the same

Testing:
metadata/test_stats_extrapolation.py

Change-Id: I0a1a3809508c90217517705b2b188b2ccba6f23f
---
M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/17411/1
--
To view, visit http://gerrit.cloudera.org:8080/17411
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0a1a3809508c90217517705b2b188b2ccba6f23f
Gerrit-Change-Number: 17411
Gerrit-PatchSet: 1
Gerrit-Owner: liuyao 


[Impala-ASF-CR] IMPALA-10677: Set selectivity of Not-equal

2021-04-29 Thread liuyao (Code Review)
Hello Qifan Chen, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17344

to look at the new patch set (#3).

Change subject: IMPALA-10677: Set selectivity of Not-equal
..

IMPALA-10677: Set selectivity of Not-equal

Calculate binary predicate selectivity if one of the children is
a slotref and the other children are all constant.
eg. something like "col = 5", but not "2 * col = 10"

selectivity = 1 - 1/ndv

Testing:
Modify the function testNeSelectivity() of the
ExprCardinalityTest.java, change -1 to the correct value.

Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
---
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
13 files changed, 91 insertions(+), 93 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17344/3
--
To view, visit http://gerrit.cloudera.org:8080/17344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
Gerrit-Change-Number: 17344
Gerrit-PatchSet: 3
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats

2021-04-28 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17299 )

Change subject: IMPALA-10652: Optimize the checking of the size of incremental 
stats
..


Patch Set 7:

compare Patch Set 4 and Patch Set 7


--
To view, visit http://gerrit.cloudera.org:8080/17299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
Gerrit-Change-Number: 17299
Gerrit-PatchSet: 7
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Thu, 29 Apr 2021 04:36:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats

2021-04-28 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17299 )

Change subject: IMPALA-10652: Optimize the checking of the size of incremental 
stats
..


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@461
PS4, Line 461:   // incremental statistics size = Existing partition 
statistics
 :   // - Repeated calculation partition stats
 :   // + This time calculation partition stats
 :   for (FeFsPartition part: allPartitions) {
 : /
> In the commit message you have the following formula:
Done


http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@467
PS4, Line 467:   // - Repeated calculation partition stats"
> Please add comment that this is the "This time calculation partition stats"
Done


http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@471
PS4, Line 471:
> nit: stats
Done


http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1787
PS4, Line 1787: checkComputeStatsStmt(
  : "compute incremental stats functional.alltypes 
partition(year=2010, month=10)");
  :
> Please add a few more tests where you invoke compute incremental stats for
Done



--
To view, visit http://gerrit.cloudera.org:8080/17299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
Gerrit-Change-Number: 17299
Gerrit-PatchSet: 7
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Thu, 29 Apr 2021 02:23:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10677: Set selectivity of Not-equal

2021-04-28 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17344 )

Change subject: IMPALA-10677: Set selectivity of Not-equal
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17344/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java:

http://gerrit.cloudera.org:8080/#/c/17344/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@244
PS1, Line 244:
> This condition can be checked at the beginning together with !isSingleColum
Done


http://gerrit.cloudera.org:8080/#/c/17344/1/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@249
PS1, Line 249:  to selectivity is 0.
> Repetition in both then and else branch.
Done


http://gerrit.cloudera.org:8080/#/c/17344/1/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java:

http://gerrit.cloudera.org:8080/#/c/17344/1/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@316
PS1, Line 316: testNeSelectivity
> May add a couple test cases for an empty table.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
Gerrit-Change-Number: 17344
Gerrit-PatchSet: 2
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Wed, 28 Apr 2021 11:46:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10677: Set selectivity of Not-equal

2021-04-28 Thread liuyao (Code Review)
Hello Qifan Chen, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17344

to look at the new patch set (#2).

Change subject: IMPALA-10677: Set selectivity of Not-equal
..

IMPALA-10677: Set selectivity of Not-equal

Calculate binary predicate selectivity if one of the children is
a slotref and the other children are all constant.
eg. something like "col = 5", but not "2 * col = 10"

selectivity = 1 - 1/ndv

Testing:
Modify the function testNeSelectivity() of the
ExprCardinalityTest.java, change -1 to the correct value.

Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
---
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
12 files changed, 89 insertions(+), 91 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17344/2
--
To view, visit http://gerrit.cloudera.org:8080/17344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
Gerrit-Change-Number: 17344
Gerrit-PatchSet: 2
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 


[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats

2021-04-28 Thread liuyao (Code Review)
Hello Aman Sinha, Qifan Chen, Zoltan Borok-Nagy, Tim Armstrong, Bikramjeet Vig, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17299

to look at the new patch set (#7).

Change subject: IMPALA-10652: Optimize the checking of the size of incremental 
stats
..

IMPALA-10652: Optimize the checking of the size of incremental stats

Modify the estimation method of incremental statistics size:
incremental statistics size = Existing partition statistics
+ This time calculation partition stats
- Repeated calculation partition stats

Testing:
All partitions of a table have no incremental stats.
--Calculate the incremental stats of all partitions,
  the incremental stats size exceeds the threshold,
  an error is reported.
--Calculate the incremental stats of one partition,
  no error is reported.

Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
2 files changed, 66 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17299/7
-- 
To view, visit http://gerrit.cloudera.org:8080/17299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
Gerrit-Change-Number: 17299
Gerrit-PatchSet: 7
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats

2021-04-28 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17299 )

Change subject: IMPALA-10652: Optimize the checking of the size of incremental 
stats
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@461
PS4, Line 461:   // incremental statistics size = Existing partition 
statistics
 :   // - Repeated calculation partition stats
 :   // + This time calculation partition stats
 :   for (FeFsPartition part: allPartitions) {
 : /
> In the commit message you have the following formula:
Done


http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@467
PS4, Line 467:   // - Repeated calculation partition stats"
> Please add comment that this is the "This time calculation partition stats"
Done


http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@471
PS4, Line 471:
> nit: stats
Done


http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/17299/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1787
PS4, Line 1787: checkComputeStatsStmt(
  : "compute incremental stats functional.alltypes 
partition(year=2010, month=10)");
  :
> Please add a few more tests where you invoke compute incremental stats for
Done



--
To view, visit http://gerrit.cloudera.org:8080/17299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
Gerrit-Change-Number: 17299
Gerrit-PatchSet: 6
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Wed, 28 Apr 2021 06:39:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats

2021-04-28 Thread liuyao (Code Review)
Hello Aman Sinha, Qifan Chen, Zoltan Borok-Nagy, Tim Armstrong, Bikramjeet Vig, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17299

to look at the new patch set (#6).

Change subject: IMPALA-10652: Optimize the checking of the size of incremental 
stats
..

IMPALA-10652: Optimize the checking of the size of incremental stats

Modify the estimation method of incremental statistics size:
incremental statistics size = Existing partition statistics
+ This time calculation partition stats
- Repeated calculation partition stats

Testing:
All partitions of a table have no incremental stats.
--Calculate the incremental stats of all partitions,
  the incremental stats size exceeds the threshold,
  an error is reported.
--Calculate the incremental stats of one partition,
  no error is reported.

Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
2 files changed, 65 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17299/6
-- 
To view, visit http://gerrit.cloudera.org:8080/17299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
Gerrit-Change-Number: 17299
Gerrit-PatchSet: 6
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats

2021-04-28 Thread liuyao (Code Review)
Hello Aman Sinha, Qifan Chen, Zoltan Borok-Nagy, Tim Armstrong, Bikramjeet Vig, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17299

to look at the new patch set (#5).

Change subject: IMPALA-10652: Optimize the checking of the size of incremental 
stats
..

IMPALA-10652: Optimize the checking of the size of incremental stats

Modify the estimation method of incremental statistics size:
incremental statistics size = Existing partition statistics
+ This time calculation partition stats
- Repeated calculation partition stats

Testing:
All partitions of a table have no incremental stats.
--Calculate the incremental stats of all partitions,
  the incremental stats size exceeds the threshold,
  an error is reported.
--Calculate the incremental stats of one partition,
  no error is reported.

Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
2 files changed, 66 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17299/5
-- 
To view, visit http://gerrit.cloudera.org:8080/17299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
Gerrit-Change-Number: 17299
Gerrit-PatchSet: 5
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-10677: Set selectivity of Not-equal

2021-04-26 Thread liuyao (Code Review)
liuyao has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17344


Change subject: IMPALA-10677: Set selectivity of Not-equal
..

IMPALA-10677: Set selectivity of Not-equal

Calculate binary predicate selectivity if one of the children is
a slotref and the other children are all constant.
eg. something like "col = 5", but not "2 * col = 10"

selectivity = 1 - 1/ndv

Testing:
Modify the function testNeSelectivity() of the
ExprCardinalityTest.java, change -1 to the correct value.

Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
---
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/card-scan.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
12 files changed, 60 insertions(+), 57 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17344/1
--
To view, visit http://gerrit.cloudera.org:8080/17344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icd6f5945840ea2a8194d72aa440ddfa6915cbb3a
Gerrit-Change-Number: 17344
Gerrit-PatchSet: 1
Gerrit-Owner: liuyao 


[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats

2021-04-14 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17299 )

Change subject: IMPALA-10652: Optimize the checking of the size of incremental 
stats
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17299/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/17299/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1784
PS3, Line 1784: fu
> nit. table alltypes?
Done



--
To view, visit http://gerrit.cloudera.org:8080/17299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
Gerrit-Change-Number: 17299
Gerrit-PatchSet: 4
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Thu, 15 Apr 2021 02:27:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats

2021-04-14 Thread liuyao (Code Review)
Hello Aman Sinha, Qifan Chen, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17299

to look at the new patch set (#4).

Change subject: IMPALA-10652: Optimize the checking of the size of incremental 
stats
..

IMPALA-10652: Optimize the checking of the size of incremental stats

Modify the estimation method of incremental statistics size:
incremental statistics size = Existing partition statistics
+ This time calculation partition stats
- Repeated calculation partition stats

Testing:
All partitions of a table have no incremental stats.
--Calculate the incremental stats of all partitions,
  the incremental stats size exceeds the threshold,
  an error is reported.
--Calculate the incremental stats of one partition,
  no error is reported.

Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
2 files changed, 45 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17299/4
--
To view, visit http://gerrit.cloudera.org:8080/17299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
Gerrit-Change-Number: 17299
Gerrit-PatchSet: 4
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 


[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats

2021-04-14 Thread liuyao (Code Review)
liuyao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17299 )

Change subject: IMPALA-10652: Optimize the checking of the size of incremental 
stats
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17299/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/17299/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@448
PS2, Line 448: numOfAllIncStats
> May rename the variable to numOfAllIncStatsPartitions
Done


http://gerrit.cloudera.org:8080/#/c/17299/2/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@452
PS2, Line 452: if (partitionSet_ == null) {
 :   numOfAllIncStatsPartitions = allPartitio
> We may not need to verify the size limit when the partition set for increme
It is a pre-check on the size of incremental stats to prevent the incremental 
stats from occupying too much memory after calculation. If no partition is 
specified, all partitions are calculated. Need to check whether the incremental 
stats of all partitions exceeds the threshold after calculation.



--
To view, visit http://gerrit.cloudera.org:8080/17299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
Gerrit-Change-Number: 17299
Gerrit-PatchSet: 3
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: liuyao 
Gerrit-Comment-Date: Wed, 14 Apr 2021 06:18:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats

2021-04-14 Thread liuyao (Code Review)
Hello Aman Sinha, Qifan Chen, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17299

to look at the new patch set (#3).

Change subject: IMPALA-10652: Optimize the checking of the size of incremental 
stats
..

IMPALA-10652: Optimize the checking of the size of incremental stats

Modify the estimation method of incremental statistics size:
incremental statistics size = Existing partition statistics
+ This time calculation partition stats
- Repeated calculation partition stats

Testing:
All partitions of a table have no incremental stats.
--Calculate the incremental stats of all partitions,
  the incremental stats size exceeds the threshold,
  an error is reported.
--Calculate the incremental stats of one partition,
  no error is reported.

Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
2 files changed, 45 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17299/3
--
To view, visit http://gerrit.cloudera.org:8080/17299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
Gerrit-Change-Number: 17299
Gerrit-PatchSet: 3
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10652: Optimize the checking of the size of incremental stats

2021-04-10 Thread liuyao (Code Review)
Hello Aman Sinha, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17299

to look at the new patch set (#2).

Change subject: IMPALA-10652: Optimize the checking of the size of incremental 
stats
..

IMPALA-10652: Optimize the checking of the size of incremental stats

Modify the estimation method of incremental statistics size:
incremental statistics size = Existing partition statistics
+ This time calculation partition stats
- Repeated calculation partition stats

Testing:
All partitions of a table have no incremental stats.
--Calculate the incremental stats of all partitions,
  the incremental stats size exceeds the threshold,
  an error is reported.
--Calculate the incremental stats of one partition,
  no error is reported.

Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
2 files changed, 45 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17299/2
--
To view, visit http://gerrit.cloudera.org:8080/17299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4f35ea936445015a3b8b8102b1891db29751b5ee
Gerrit-Change-Number: 17299
Gerrit-PatchSet: 2
Gerrit-Owner: liuyao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


  1   2   >