from:"Qifan Chen \(Code Review\)"

[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-09-07 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. (
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..

Patch Set 9: Code-Review+1

(4 comments)

Looks great.

http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20379/9//COMMIT_MSG@21
PS9, Line 21: This fix restores limits on the aggregation node;
nit. This fix properly configures the aggregate node to avoid early completion,
via the existent flag 'fast_limit_check', when num_nodes=1. In a true
distributed plan, the fix does not alter the existing semantics.

http://gerrit.cloudera.org:8080/#/c/20379/9/be/src/exec/grouping-aggregator.h
File be/src/exec/grouping-aggregator.h:

http://gerrit.cloudera.org:8080/#/c/20379/9/be/src/exec/grouping-aggregator.h@a196
PS9, Line 196:
nit. noEarlyCompletion better describes the semantics.

http://gerrit.cloudera.org:8080/#/c/20379/9/be/src/exec/grouping-aggregator.h@a218
PS9, Line 218:
nit. Can we use setNoEarlyCompletion() instead?

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786:
> Joe asked why https://github.com/apache/impala/blob/master/be/src/exec/grou
Good finding Joe. I love the reworked fix Michael.

--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Smith
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Michael Smith
Gerrit-Reviewer: Qifan Chen
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Thu, 07 Sep 2023 13:24:07 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-28 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786:
> What's is_serial_plan? I don't understand what case that's supposed to cove
The original version of this method deals with only logical properties. The 
modified version adds physical properties, in particular the inclusion of the 
only distributed/parallel version of the aggregate operator.  As a result, the 
the serial plan version is excluded.

If we can test the serial plan version here, then we will not exclude a piece 
of functionality existed before.



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 28 Aug 2023 13:06:22 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-24 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 7:

(2 comments)

Looks good and I think we are converge toward +2.

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@777
PS4, Line 777: return useIntermediateTuple_ || endsMultiPhase_;
nit. May add a comment:  useIntermediateTuple_ is set to true for any non merge 
nodes and endsMultiPhase_ is true for a merge node.


http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786: && isMultiPhase()
> Unless I misunderstood your example, we can't sort on partition keys
Okay. The test on useIntermediateTuple_ resolves my concern :-).  I think I am 
okay with the change for the distributed plans.

In a serial plan, per 
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/AggregationNode.java#L475,
 the aggregation node may lost the capability to bail out early. Can we mend it 
somehow as follows?

return isSingleClassAgg() && hasLimit() && hasGrouping() && (is_serial_plan || 
isMultiPhase())
   && !multiAggInfo_.hasAggregateExprs() && getConjuncts().isEmpty();



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 25 Aug 2023 00:35:10 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-24 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786: && isMultiPhase()
> isMultiPhase encompasses all nodes that are part of a chain of aggregation
I see.

Can you perform a simple performance test to see if this would negatively 
affect queries that a very small subset of non-merge aggregate nodes can 
provide the answer?

For example, let us partition table T on column a, b into 10 partitions and 
sorted on a, b. The query is
select distinct a, b from T limit 2.

Normally, such query can finish as soon as two smallest subsets of rows (on a, 
b) are read in.

By reading the code here, my understand is that with the change we can not 
complete early until on all read nodes (from 10 partitions) are done the work 
and we can complete early only at the very top merge node is active. True?



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 24 Aug 2023 20:52:01 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-22 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 4:

(4 comments)

The change looks great.

Just have one question on performance.

http://gerrit.cloudera.org:8080/#/c/20379/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@85
PS1, Line 85: endsMultiPhase_ = f
> Looked into this a bit more. Most of those paths don't use pushdown because
Done


http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@83
PS4, Line 83: finalize
nit node?


http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@172
PS4, Line 172: Preconditions.checkState(needsFinalize_);
&& !endsMultiPhase_


http://gerrit.cloudera.org:8080/#/c/20379/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@786
PS4, Line 786: && isMultiPhase()
I wonder if this effectively requires all non-root aggregation nodes (i.e., 
their endsMultiPhase_ is false) to complete their work first, prior to the 
merge aggregate node. If so, this may cause performance degradation.

Is it possible to fix this bug in backend by counting number of values x coming 
up at the merge aggregate and bailing out of the loop when x >= n?



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 22 Aug 2023 18:11:14 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12383: Fix SingleNodePlanner aggregation limits

2023-08-19 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20379 )

Change subject: IMPALA-12383: Fix SingleNodePlanner aggregation limits
..


Patch Set 1:

(4 comments)

Looks good!

http://gerrit.cloudera.org:8080/#/c/20379/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20379/1//COMMIT_MSG@11
PS1, Line 11: For
: example
nit. As a result,


http://gerrit.cloudera.org:8080/#/c/20379/1//COMMIT_MSG@12
PS1, Line 12:
would incorrectly return


http://gerrit.cloudera.org:8080/#/c/20379/1//COMMIT_MSG@17
PS1, Line 17: Identifies
nit. This fix identifies


http://gerrit.cloudera.org:8080/#/c/20379/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/20379/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@85
PS1, Line 85: hasStreamingPreagg_
I wonder if this new variable is logically equivalent to the following

needsFinalize_ && useStreamingPreagg_



--
To view, visit http://gerrit.cloudera.org:8080/20379
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5eec1190e8e182152aa954897b79cc3f219c816
Gerrit-Change-Number: 20379
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sat, 19 Aug 2023 16:23:49 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12051: Propagate analytic tuple predicates of outer-joined InlineView

2023-06-02 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19768 )

Change subject: IMPALA-12051: Propagate analytic tuple predicates of 
outer-joined InlineView
..


Patch Set 7: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/19768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6c209b2a64bad37d893ba8b520342bf1f9a7513
Gerrit-Change-Number: 19768
Gerrit-PatchSet: 7
Gerrit-Owner: Minghui Zhu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Minghui Zhu 
Gerrit-Reviewer: Qifan Chen (767)
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Xianqing He 
Gerrit-Comment-Date: Fri, 02 Jun 2023 19:02:49 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12051: Propagate analytic tuple predicates of outer-joined InlineView

2023-06-01 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19768 )

Change subject: IMPALA-12051: Propagate analytic tuple predicates of 
outer-joined InlineView
..


Patch Set 6: Code-Review+1

(1 comment)

Thanks for the update!

Since I have some issues access my original GitHub account with Impala 
committer privilege, I can only +1 with this temporary account.

http://gerrit.cloudera.org:8080/#/c/19768/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19768/6//COMMIT_MSG@79
PS6, Line 79:
nit. Is it possible to copy a section of the bad plan for the above query to 
show the missing predicates?

Since this bug is quite severe, it is a good idea to show how to spot the bad 
plan in general.



--
To view, visit http://gerrit.cloudera.org:8080/19768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6c209b2a64bad37d893ba8b520342bf1f9a7513
Gerrit-Change-Number: 19768
Gerrit-PatchSet: 6
Gerrit-Owner: Minghui Zhu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Minghui Zhu 
Gerrit-Reviewer: Qifan Chen (767)
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Xianqing He 
Gerrit-Comment-Date: Thu, 01 Jun 2023 13:04:09 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11536: fix invalid predicates propagate for outer join simplification

2023-04-22 Thread Qifan Chen (Code Review)

Qifan Chen has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/19116 )

Change subject: IMPALA-11536: fix invalid predicates propagate for outer join 
simplification
..

IMPALA-11536: fix invalid predicates propagate for outer join simplification

When set ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION = true, the planner
will simplify outer joins if the WHERE clause contains at least one
null rejecting condition and then remove the outer-joined tuple id
from the map of GlobalState#outerJoinedTupleIds.
However, there may be false removals for right join simplification or
full join simplification. This may lead to incorrect results since it
is incorrect to propagate a non null-rejecting predicate into a plan
subtree that is on the nullable side of an outer join.
GlobalState#outerJoinedTupleIds indicates whether a table is on the
nullable side of an outer join.

E.g.
SELECT COUNT(*)
FROM functional.nullrows t1
  FULL JOIN functional.nullrows t2 ON t1.id = t2.id
  FULL JOIN functional.nullrows t3 ON coalesce(t1.id, t2.id) = t3.id
WHERE t1.group_str = 'a'
  AND coalesce(t2.group_str, 'f') = 'f'
The predicate coalesce(t2.group_str, 'f') = 'f' will propagate into t2
if we remove t2 from GlobalState#outerJoinedTupleIds.

Testing:
- Add new plan tests in outer-to-inner-joins.test
- Add new query tests to verify the correctness on transformation

Change-Id: I6565c5bff0d2f24f30118ba47a2583383e83fff7
Reviewed-on: http://gerrit.cloudera.org:8080/19116
Reviewed-by: Qifan Chen 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test
M 
testdata/workloads/functional-query/queries/QueryTest/outer-to-inner-joins.test
3 files changed, 307 insertions(+), 17 deletions(-)

Approvals:
  Qifan Chen: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/19116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I6565c5bff0d2f24f30118ba47a2583383e83fff7
Gerrit-Change-Number: 19116
Gerrit-PatchSet: 10
Gerrit-Owner: Xianqing He 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Xianqing He

[Impala-ASF-CR] IMPALA-7942 (part 2): Add query hints for predicate selectivities

2023-04-20 Thread Qifan Chen (Code Review)

Qifan Chen has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/18023 )

Change subject: IMPALA-7942 (part 2): Add query hints for predicate 
selectivities
..

IMPALA-7942 (part 2): Add query hints for predicate selectivities

Currently, Impala only uses simple estimation to compute selectivity.
For some predicates, this may lead to worse query plan due to CBO.

This patch adds a new query hint: 'SELECTIVITY' to help specify a
selectivity value for a predicate.

The parser will interpret expressions wrapped in () followed by a
C-style comment /*  */ as a predicate hint. The
predicate hint currently can be in the form of +SELECTIVITY(f) where
'f' is a positive floating point number, in the range of (0, 1], to
use as the selectivity for the preceding expression.

Single predicate example:

  select col from t where (a=1) /* +SELECTIVITY(0.5) */;

Compound predicate example:

  select col from t where (a=1 or b=2) /* +SELECTIVITY(0.5) */;

As a limitation of this path, the selectivity hints for 'AND' compound
predicates, either in the original SQL query or internally generated,
are ignored. We may supported this in the near future.

Testing:
- Added new fe tests in 'PlannerTest'
- Added new fe tests in 'AnalyzeStmtsTest' for negative cases

Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b
Reviewed-on: http://gerrit.cloudera.org:8080/18023
Tested-by: Impala Public Jenkins 
Reviewed-by: Qifan Chen 
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M fe/src/main/java/org/apache/impala/analysis/IsNullPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-selectivity-hint.test
10 files changed, 414 insertions(+), 5 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Qifan Chen: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/18023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b
Gerrit-Change-Number: 18023
Gerrit-PatchSet: 36
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xiang Yang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng

[Impala-ASF-CR] IMPALA-7942 (part 2): Add query hints for predicate selectivities

2023-04-20 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18023 )

Change subject: IMPALA-7942 (part 2): Add query hints for predicate 
selectivities
..


Patch Set 34: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b
Gerrit-Change-Number: 18023
Gerrit-PatchSet: 34
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xiang Yang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 20 Apr 2023 10:13:27 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7942 (part 2): Add query hints for predicate selectivities

2023-04-18 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18023 )

Change subject: IMPALA-7942 (part 2): Add query hints for predicate 
selectivities
..


Patch Set 33:

(7 comments)

Looks very good!

http://gerrit.cloudera.org:8080/#/c/18023/33//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18023/33//COMMIT_MSG@12
PS33, Line 12: , we can use this
 : hint
nit. remove and replace with "to help specify"


http://gerrit.cloudera.org:8080/#/c/18023/33//COMMIT_MSG@18
PS33, Line 18:
, in the range of [0, 1],


http://gerrit.cloudera.org:8080/#/c/18023/33//COMMIT_MSG@29
PS33, Line 29: But pay attention, selectivity hint is invalid for 'AND' compound
 : predicates and between predicates in this patch
As a limitation of this path, the selectivity hints for 'AND' compound 
predicates, either in the original SQL query or internally generated, are 
ignored.


http://gerrit.cloudera.org:8080/#/c/18023/33//COMMIT_MSG@31
PS33, Line 31: this in the near future.
May file a new JIRA and mention the JIRA number here.


http://gerrit.cloudera.org:8080/#/c/18023/33/fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
File fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java:

http://gerrit.cloudera.org:8080/#/c/18023/33/fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java@153
PS33, Line 153: // 'AND' compound predicates will replaced by children 
in Expr#getConjuncts, so
nit be


http://gerrit.cloudera.org:8080/#/c/18023/33/fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java@154
PS33, Line 154: // selectivity hint will missing, we add a warning here.
nit be


http://gerrit.cloudera.org:8080/#/c/18023/33/fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java@155
PS33, Line 155: analyzer.addWarning("Selectivity hint is invalid for 
'AND' compound predicates.");
> Currently, this warning will not appear for BETWEEN predicate, since I remo
There is one case in CaseExpr.java where the translation to AND predicate takes 
place.

I think we can change the warning message to something like below to cover all 
such cases.

"Selectivity hints are ignored for 'AND' compound predicates, either in the SQL 
query or internally generated."



--
To view, visit http://gerrit.cloudera.org:8080/18023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b
Gerrit-Change-Number: 18023
Gerrit-PatchSet: 33
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xiang Yang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 18 Apr 2023 22:54:33 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7942 (part 2): Add query hints for predicate selectivities

2023-04-11 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. (
http://gerrit.cloudera.org:8080/18023 )

Change subject: IMPALA-7942 (part 2): Add query hints for predicate
selectivities
..

Patch Set 29:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18023/29/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/18023/29/fe/src/main/java/org/apache/impala/analysis/Expr.java@1067
PS29, Line 1067: // Selectivity hint cannot be set. If Predicate been set
selectivity hint, we return
: // itself directly, such as CompoundPredicate.
Otherwise, hint value will missing.
> Hi Qifan. What do you mean 'Note a Predicate contains the selectivityHint_
Option 1) probably will exclude lots of common predicates from utilizing this
useful feature. But without some study, we may not get the right answer. So my
vote for this patch is 1).

Option 2) sounds like a good starting point to address 1). But we need to find
out how the selectivity at AND predicate is computed from child(0) and
child(1). From that we may be able to back fit the selectivity hint from the
AND predicate down. This will be for the general cases where SELECT_HINT(AND)
!= -1.

For the special case: SELECT_HINT(AND) = -1, then just store child(0) and
child(1) in the list without changing their selectivity hint (even being -1) at
all.

--
To view, visit http://gerrit.cloudera.org:8080/18023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b
Gerrit-Change-Number: 18023
Gerrit-PatchSet: 29
Gerrit-Owner: wangsheng
Gerrit-Reviewer: Amogh Margoor
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Kurt Deschler
Gerrit-Reviewer: Qifan Chen
Gerrit-Reviewer: Quanlong Huang
Gerrit-Reviewer: Xiang Yang
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Reviewer: wangsheng
Gerrit-Comment-Date: Tue, 11 Apr 2023 20:03:31 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7942 (part 2): Add query hints for predicate selectivities

2023-04-11 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. (
http://gerrit.cloudera.org:8080/18023 )

Change subject: IMPALA-7942 (part 2): Add query hints for predicate
selectivities
..

Patch Set 29:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18023/29/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/18023/29/fe/src/main/java/org/apache/impala/analysis/Expr.java@1067
PS29, Line 1067: // Selectivity hint cannot be set. If Predicate been set
selectivity hint, we return
: // itself directly, such as CompoundPredicate.
Otherwise, hint value will missing.
This comment is not accurate and should be removed. The original comment from
Base version should be restored here.

It looks to me the original form of getConjuncts() should work fine even when
selectivity hints exist in the tree anchored at . This is because both
child(0) and child(1) should be a when is a
CompoundPredicate. Note a Predicate contains the selectivityHint_ as the new
data member.

If the above is true, then we do not need getLocalConjuncts() at all.

IMHO, when selectivity hints are supplied, we should let them flow to the rest
of the compilation phases as a single piece of data to represent the moment of
truth. Allowing two or more representations can introduce unnecessary
complexity in the down stream.

--
To view, visit http://gerrit.cloudera.org:8080/18023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

[Impala-ASF-CR] IMPALA-7942 (part 2): Add query hints for predicate selectivities

2023-04-04 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18023 )

Change subject: IMPALA-7942 (part 2): Add query hints for predicate 
selectivities
..


Patch Set 28: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b
Gerrit-Change-Number: 18023
Gerrit-PatchSet: 28
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xiang Yang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 04 Apr 2023 14:10:44 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7942 (part 2): Add query hints for predicate selectivities

2023-04-02 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18023 )

Change subject: IMPALA-7942 (part 2): Add query hints for predicate 
selectivities
..


Patch Set 27: Code-Review+1

(8 comments)

Looks great.  Just have some minor comments on the commit message.

http://gerrit.cloudera.org:8080/#/c/18023/27//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18023/27//COMMIT_MSG@9
PS27, Line 9: Currently, Impala only uses simple estimation to compute 
selectivity
nit. stop the sentence here (.)


http://gerrit.cloudera.org:8080/#/c/18023/27//COMMIT_MSG@10
PS27, Line 10: for some predicates, and this may lead
nit.

For some predicates, this may lead to ...


http://gerrit.cloudera.org:8080/#/c/18023/27//COMMIT_MSG@11
PS27, Line 11: Hence, we add new hints to reduce such errors.
nit. The next sentence at line 13 mentions the specific work done in this 
patch. I think we can remove this sentence.


http://gerrit.cloudera.org:8080/#/c/18023/27//COMMIT_MSG@13
PS27, Line 13: s
nit. hint.


http://gerrit.cloudera.org:8080/#/c/18023/27//COMMIT_MSG@14
PS27, Line 14: original selectivity computing
nit. specify a selectivity value for a predicate.


http://gerrit.cloudera.org:8080/#/c/18023/27//COMMIT_MSG@17
PS27, Line 17: C-style /* comment */
nit. C-style comment /*  */


http://gerrit.cloudera.org:8080/#/c/18023/27//COMMIT_MSG@17
PS27, Line 17: currently
 : supports
nit. can be in the form of


http://gerrit.cloudera.org:8080/#/c/18023/27//COMMIT_MSG@25
PS27, Line 25: Predicate Example
nit. predicate example



--
To view, visit http://gerrit.cloudera.org:8080/18023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b
Gerrit-Change-Number: 18023
Gerrit-PatchSet: 27
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Xiang Yang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Sun, 02 Apr 2023 13:29:08 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11536: fix invalid predicates propagate for outer join simplification

2023-04-02 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19116 )

Change subject: IMPALA-11536: fix invalid predicates propagate for outer join 
simplification
..


Patch Set 9: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/19116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6565c5bff0d2f24f30118ba47a2583383e83fff7
Gerrit-Change-Number: 19116
Gerrit-PatchSet: 9
Gerrit-Owner: Xianqing He 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Xianqing He 
Gerrit-Comment-Date: Sun, 02 Apr 2023 13:18:54 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-03-06 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 58: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 58
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 07 Mar 2023 02:38:03 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-03-06 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 57: Code-Review+2

(2 comments)

+2 to include WenZhe's +1.

http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java@30
PS54, Line 30:   long cardinality, float exprsCost, float 
materializationCost) {
> Agree that row width might factor in the PC for some operator. Is fact, I a
For VARCHAR, we can use some kind of average width stats, if available.  For 
fixed width columns, we just use the width. In both cases, the unit should be 
in bytes, at least in first draft.

The idea of including a width in costing is to make the outcome as precise and 
less error-prone as possible.

I am okay with making the change in next iteration. Since being very important, 
maybe creating a new JIRA and referring to it in the commit message.


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/ProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java@270
PS52, Line 270:
> Let say,
I see.  Thanks for the examples.  I agree the use of > is fine.



--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 57
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 06 Mar 2023 14:20:40 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7942 (part 2): Add query hints for predicate selectivities

2023-03-06 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18023 )

Change subject: IMPALA-7942 (part 2): Add query hints for predicate 
selectivities
..


Patch Set 14: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b
Gerrit-Change-Number: 18023
Gerrit-PatchSet: 14
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Mon, 06 Mar 2023 13:37:09 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-03-03 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 54:

(8 comments)

Looks great!

Thanks a lot for the changes, some of them are significant.

http://gerrit.cloudera.org:8080/#/c/19033/54//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/54//COMMIT_MSG@295
PS54, Line 295: .
Need to mention the default value. Maybe set it to MT_DOP?

Should we also mention PROCESSING_COST_MAX_THREADS which can be default to max 
# of cores?


http://gerrit.cloudera.org:8080/#/c/19033/54//COMMIT_MSG@328
PS54, Line 328: Q3
  :   CoreCount={total=12 trace=F00:12}
  :
  : Q12
  :   CoreCount={total=12 trace=F00:12}
  :
  : Q15
  :   CoreCount={total=15 trace=N07:3+F00:12}
indentation


http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java@30
PS54, Line 30:   long cardinality, float exprsCost, float 
materializationCost) {
I think we should factor in the row width in computing PC, as row width can 
vary a lot. Without considering the width, the computed PC may not be right.

PC = cardinality * width * (expr cost + materialization cost).


http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/CoreCount.java
File fe/src/main/java/org/apache/impala/planner/CoreCount.java:

http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/CoreCount.java@31
PS54, Line 31: requirement
CPU cores, computed from the CPU cost,


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/ProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java@270
PS52, Line 270:
> I think '>" is the correct sign here?
Not sure I follow.  Maybe an example?


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java
File fe/src/main/java/org/apache/impala/planner/SegmentCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@78
PS52, Line 78:
> When I first work on this, I intentionally made ProcessingCost to not accep
Yeah a revisit will help. It sounds like we need to deal with the unknown cost 
(-1 in cardinality) as well.  Maybe we do not adjust in such situations?


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@102
PS52, Line 102:
> I decide to remove traverseBlockingAwareCost() method.
I see.

Just wonder if the algorithm can be modified to not check the state of the 
children. In other words, each child can supply an expected core count which is 
available when the parent node is being processed.


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@118
PS52, Line 118:
> In traverseBlockingAwareCores(), I changed the loop to iterate all children
Sounds like a conservative strategy. Done.



--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 54
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 03 Mar 2023 14:54:41 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7942 (part 2): Add query hints for predicate selectivities

2023-02-28 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18023 )

Change subject: IMPALA-7942 (part 2): Add query hints for predicate 
selectivities
..


Patch Set 13: Code-Review+1

Thanks!


--
To view, visit http://gerrit.cloudera.org:8080/18023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b
Gerrit-Change-Number: 18023
Gerrit-PatchSet: 13
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 28 Feb 2023 14:32:35 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-27 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. (
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

Patch Set 52:

(25 comments)

The implementation of the algorithm looks good!

If the specific checks on JOIN and UNION can be generalized at the PlanNode
level, it will be great.

I also wonder if the caching of costs/cores used to compute the total is
necessary.

The producer consumer ratio based adjustment is fine and could be improved
further at the guidance of a plan node in the future.

http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@171
PS51, Line 171:
: In bottom-up direction, there exist four segments in F03:
: Blocking segment 1: (11:EXCHANGE, 12:AGGREGATE)
: Blocking segment 2: 06
> Done. I rather keep calling segment 4 as non-blocking segment since only Jo
Done

http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@205
PS51, Line 205:
> No, SegmentCost is modelled as tree.
Done

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/MultipleProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/MultipleProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/MultipleProcessingCost.java@24
PS52, Line 24: MultipleProcessingCost
nit. Maybe ScaledProcessingCost? MultipleProcessingCost sounds like a list of
processing costs to me.

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/MultipleProcessingCost.java@32
PS52, Line 32: > 0
>=0.

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/MultipleProcessingCost.java@61
PS52, Line 61: MultCost
ScaledCost()?

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@151
PS52, Line 151: ProcessingCost, List
nit. May need to add a comment for the first and the second. I also wonder if
we need to cache the list, once we have the total cost (the sum over the
segment subtree).

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@155
PS52, Line 155: CoreCount, List
same as above.

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/ProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java@254
PS52, Line 254: consProdRatio
consumerProducerRatio

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java@270
PS52, Line 270: >
>=

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java
File fe/src/main/java/org/apache/impala/planner/SegmentCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@48
PS52, Line 48: SegmentCost
Maybe CostingSegment?

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@54
PS52, Line 54: // The ProcessingCost of this fragment segment.
nit. which is the sum of the processing cost of all nodes in nodes_.

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@78
PS52, Line 78: 1
I wonder why 0 is being excluded. A node consuming or producing 0 rows is
perfectly fine.

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@79
PS52, Line 79: 1
same as above.

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@83
PS52, Line 83: appendSinkCost
appendSink()

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@102
PS52, Line 102: subtreeCostBuilder
I wonder if we need to remember the costs used to compute the max cost for the
tree rooted at the current segment. Once the post-order traversal is done, the
sum is known. These sums can be used to compute the # of cores.
Maybe I missed something?

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@103
PS52, Line 103: segmentCost
nit. maxCost

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@105
PS52, Line 105: .
gather the costs of the children first and find the max cost among them.

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-23 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 51:

(26 comments)

Looks very good!

I will look at the code corresponding to section II, III and IV this weekend.

Can you please also confirm that segments are still modeled as a list within a 
fragment?  How hard is it to model as a tree?  Personally I think it is very 
important that all operators can participate in EDoP adjustment to make this 
feature sound.

http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@151
PS51, Line 151: fragment plan
nit plan fragment, which is blocking since it has 3 blocking PlanNode:
12:AGGREGATE, 06:SORT, and 08:TOP-N.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@171
PS51, Line 171: 1. (11:EXCHANGE, 12:AGGREGATE)
  : 2. 06:SORT
  : 3. (07:ANALYTIC, 08:TOP-N)
  : 4. DataStreamSink of F03
indentation and with some additional info as follows.

Blocking segment 1:  (11:EXCHANGE, 12:AGGREGATE)
Blocking segment 2: 06:SORT
Blocking segment 3:  (07:ANALYTIC, 08:TOP-N)
Non-blocking segment 4: DataStreamSink of F03

I also wonder if segment 4 should be a blocking one since by the definition 
above any DataSink is blocking.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@177
PS51, Line 177: PC(segment 1) = 426337+34548320
  : PC(segment 2) = 2159270
  : PC(segment 3) = 23751970+900
  : PC(segment 4) = 22
indentation


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@182
PS51, Line 182: These per-segment costs stored in SegmentCost tree rooted at
nit a


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@183
PS51, Line 183: . In this example, post-order traversal of
  : rootSegment_ will show their associated cost as:
  : [34974657, 2159270, 23752870, 22]
nit. , and are [34974657, 2159270, 23752870, 22] respectively after the 
post-order traversal.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@186
PS51, Line 186: F03 is also a blocking fragment since it has 3 blocking 
PlanNode:
  : 12:AGGREGATE, 06:SORT, and 08:TOP-N.
remove, as the info is described above.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@189
PS51, Line 189: A rootSegment_ is also called an Output ProcessingCost
I think "Output ProcessingCost" should be really called "Total Processing 
cost", since it takes some cost for a fragment to output rows (not cost).


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@196
PS51, Line 196: effective parallelism
nit. effective degree of parallelism (EDoP). We can use EDoP in the rest of the 
text.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@199
PS51, Line 199: algorithms will
  : try to adjust
the costing algorithm attempts to adjust the number of


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@201
PS51, Line 201: Output
see the previous comment.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@205
PS51, Line 205: .
Assume that segments are modeled as a list in a plan fragment (true?) in this 
patch, we should append the following here:

, since segments are modeled as a list in a plan fragment .


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@209
PS51, Line 209: the effective parallelism
EDoP


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@261
PS51, Line 261: several
nit suggest to remove since a query plan with a sink node, which is blocking 
node,  at the root maps to one blocking subtree.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@263
PS51, Line 263: leaves. All other fragments in the subtree are
  : non-blocking
the intermediate or leaf nodes.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@268
PS51, Line 268: CoreRequirement
By reading this para, it seems CoreCount is a better name.  Usually a 
requirement in SQL compiler means something not solid, such as ANY, NOT SINGLE, 
etc.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@269
PS51, Line 269: certain
nit remove


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@286
PS51, Line 286: effective parallelism
EDoP


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@288
PS51, Line 288: executor group to determine if it fits to run in that executor 
group set
set


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@288
PS51, Line 288: executor group
nit. remove


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@292
PS51, Line 292: this CPU costing algorithm
the entire computation of EDoP.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@300
PS51, Line 300: reducing
computing


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@301
PS51, Line 301: Currently, there is no option to

[Impala-ASF-CR] IMPALA-7942 (part 2): Add query hints for predicate selectivities

2023-02-20 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18023 )

Change subject: IMPALA-7942 (part 2): Add query hints for predicate 
selectivities
..


Patch Set 12:

Can you please provide feedback on all comments?

For example, for the  following comment, a feedback can be DONE, a reply etc.

Commit Message
Line 14:
TABLE_NUM_ROWS?

The feedbacks are useful to judge the rework. Thanks!


--
To view, visit http://gerrit.cloudera.org:8080/18023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2776b9bbd878b8a21d9c866b400140a454f59e1b
Gerrit-Change-Number: 18023
Gerrit-PatchSet: 12
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Mon, 20 Feb 2023 13:22:52 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-14 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 48:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@138
PS47, Line 138: The costing algorithm splits a query fragment into several 
segments
  : divided by blocking PlanNode/DataSink boundary. Each fragment 
segment is
  : a subtree of PlanNodes/DataSink in the fragment with a DataSink 
or
> I will rearrange this paragraph and clarify a bit more about "segment" ment
A list implies the linear structure among blocking segments. Not sure it can 
model the graph relationship among them well.


http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@154
PS47, Line 154: 100]
> To me, it should be:
Yes, AGGREGATE of 12 is blocking.

A sink is like the bottom part of an exchange. Rows flow into it. Therefore it 
should not be a block segment by itself. Its cost can be included into that of 
the tree sending data to it.


http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@258
PS47, Line 258:  |
> F06 and F05 has JoinBuildSink (a special kind of DataSink that is blocking)
Done



--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 48
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 14 Feb 2023 19:43:19 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-14 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 47:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/19033/46//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/46//COMMIT_MSG@138
PS46, Line 138: a list of ProcessingCost
> The MergeJoin scenario you mention sounds possible. I believe such case can
Yeah I agree that the feature in IV that can look at the connected fragments is 
a plus. However, the same idea can be implemented here too.

Thanks a lot for check the MJ case, which is used to illustrate the importance 
of operator-driven based decision on DoP for blocking operator. Hopefully this 
can be implemented in step II, III and IV based implementation.


http://gerrit.cloudera.org:8080/#/c/19033/46//COMMIT_MSG@258
PS46, Line 258: 03, F04].
  :
  : A CoreRequirement
> The sequential nature of build-probe is resolved through the CoreRequiremen
The problem is with the sum((F05+F02), (F06+F01)) used for HJ04, as it assumes 
the HJ will build and probe as the same time.

Looks like step IV should look into query node (same idea as collapsing into PC 
computation).


http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@138
PS47, Line 138: A query fragment has a processingCosts_ field, a list of 
ProcessingCost
  : associated with the fragment. Each element of processingCosts_ 
may
  : solely come from a single PlanNode or sum of adjacent PlanNode 
costs.
This para probably should be reworded as follows and moved after the para 
ending 149, as it is better to introduce the concept of blocking segments first 
and then define some properties for it.

The two new data members are introduced for the query fragment as follows.
1. totalFragmentProcessingCost_ - the total fragment processing cost;
2. blockingSegmentProcessingCosts_ - a map of processing costs for each block 
segment, keyed on the root node of the blocking segment;

Thus for a non-blocking segment, blockingSegmentProcessingCosts_ is empty. For 
a blocking segment, it has an entry in the map, summarizing the total 
processing cost for the blocking segment.


http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@142
PS47, Line 142: the
nit a query


http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@149
PS47, Line 149: further in step IV.
A fragment without any blocking nodes is called a non-blocking fragment.


http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@154
PS47, Line 154: 34550429, 2159270, 23752870, 1
There exist two blocking segments in F03:

1. 08(07)
2. 06(12(11))

Therefore we have
PC(blocking segment 1) = 2159270 + 34548320 + 2109
PC(blocking segment 2) = 900 + 23751970 + 1


http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@258
PS47, Line 258: (F05, F02), (F06, F01)
These two do not follow the definition of a blocking subtree in that both root 
fragment 05 and 06 are not blocking.


http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@269
PS47, Line 269: successor
not clear. You mean previous?



--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 47
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 14 Feb 2023 16:33:35 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-13 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 46:

(7 comments)

Thanks a lot Riza for the update to the commit message, in particular the 
examples which helps me understanding the step 2, 3 and 4 better.

I provided an example to emphasize the importance of integrating the logic in 
step 2, 3 and 4 into the computation of PC(node). Please let me know if this 
will fly.

http://gerrit.cloudera.org:8080/#/c/19033/46//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/46//COMMIT_MSG@138
PS46, Line 138: a list of ProcessingCost
I think a single processingCost (instead of a list of processingCost) may be 
sufficient for a fragment, to compute effective DoP.  As we walk bottom-up, we 
compute that for each blocking segment from its children and finally that for 
the entire fragment.

Take exchange(sort(join(sort(scan T1), sort(scan T2))) as example, we can 
compute
1. PC(sort(scan T1)) = (x, 10) - cost of x, 10 cores
2. PC(sort(scan T2)) = (y, 16) - cost of y, 16 cores
3. PC(join) = (x, 10) + (y, 16) = (x+y, 26), if we want the two sort nodes to 
work in parallel for a merge join. For hash join, we should take the max on # 
cores to arrive at (z, 16), since the build side is done first, followed by the 
probing side.
4. PC(sort(join)) =  PC(join) since this top sort can not proceed unless the 
join below produces all the results. That is, there is no need to allocate 
extra cores for the this top sort.
5. The # of cores is available: 26 for MJ, 16 for HJ for the entire fragment.

Note that in this example, the computation of # of cores takes care of blocking 
nodes and more importantly the semantics of query nodes. As a bonus, we may be 
able to get rid of step II, III and IV all together.

To make it work efficiently, PC(node) can be extended into (C, N, blocking), 
where the boolean flag  =
 1. true when the subtree rooted at node contains a blocking node
 2. false otherwise.


http://gerrit.cloudera.org:8080/#/c/19033/46//COMMIT_MSG@149
PS46, Line 149: F03:PLAN FRAGMENT [HASH(i_class)] hosts=3 instances=3
  : fragment-costs=[34550429, 2159270, 23752870, 1]
  : 08:TOP-N [LIMIT=100]
  : |  cost=900
  : |
  : 07:ANALYTIC
  : |  cost=23751970
  : |
  : 06:SORT
  : |  cost=2159270
  : |
  : 12:AGGREGATE [FINALIZE]
  : |  cost=34548320
  : |
  : 11:EXCHANGE [HASH(i_class)]
  :cost=2109
nit. Indentation can help readability.


http://gerrit.cloudera.org:8080/#/c/19033/46//COMMIT_MSG@185
PS46, Line 185: the query plan tree visiting
  : PlanFragment from the leaf and going up to the root
nit. ,bottom-up, all PlanFragments in the query plan tree.


http://gerrit.cloudera.org:8080/#/c/19033/46//COMMIT_MSG@190
PS46, Line 190: its adjacent blocking segments (elements in processingCosts_
  : list)
See my comment on the removal of the list of processingCost above.


http://gerrit.cloudera.org:8080/#/c/19033/46//COMMIT_MSG@206
PS46, Line 206: F04:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
  : PLAN-ROOT SINK
  : |
  : 13:MERGING-EXCHANGE [UNPARTITIONED]
  : |
  : F03:PLAN FRAGMENT [HASH(i_class)] hosts=3 instances=3 (adjusted 
from 12)
  : 08:TOP-N [LIMIT=100]
  : |
  : 07:ANALYTIC
  : |
  : 06:SORT
  : |
  : 12:AGGREGATE [FINALIZE]
  : |
  : 11:EXCHANGE [HASH(i_class)]
  : |
  : F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=12
  : 05:AGGREGATE [STREAMING]
  : |
  : 04:HASH JOIN [INNER JOIN, BROADCAST]
  : |
  : |--F05:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
  : |  JOIN BUILD
  : |  |
  : |  10:EXCHANGE [BROADCAST]
  : |  |
  : |  F02:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
  : |  02:SCAN HDFS [tpcds10_parquet.date_dim, RANDOM]
  : |
  : 03:HASH JOIN [INNER JOIN, BROADCAST]
  : |
  : |--F06:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
  : |  JOIN BUILD
  : |  |
  : |  09:EXCHANGE [BROADCAST]
  : |  |
  : |  F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
  : |  01:SCAN HDFS [tpcds10_parquet.item, RANDOM]
  : |
  : 00:SCAN HDFS [tpcds10_parquet.web_sales, RANDOM]
nit indentation.


http://gerrit.cloudera.org:8080/#/c/19033/46//COMMIT_MSG@247
PS46, Line 247: A blocking fragment is a fragment that has a blocking plan 
node. The
  : costing

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-10 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. (
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

Patch Set 45:

(13 comments)

Many thanks for the rework.

Added comments for the commit message in this review session, mainly to help me
understand the adjustment algorithm better. Plan to do another round review of
the adjustment algorithm itself once I get the feedback.

http://gerrit.cloudera.org:8080/#/c/19033/44//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/44//COMMIT_MSG@144
PS44, Line 144: same segment will have their Process
Each blocking segment is a subtree of the PlanNodes in the fragment with a
blocking root and blocking/non-blocking leaves. All other nodes in the subtree
are non-blocking.

http://gerrit.cloudera.org:8080/#/c/19033/44//COMMIT_MSG@146
PS44, Line 146: called Input ProcessingCost, while the last cost in
nit the

http://gerrit.cloudera.org:8080/#/c/19033/44//COMMIT_MSG@154
PS44, Line 154: t.co
nit missing 's'

http://gerrit.cloudera.org:8080/#/c/19033/44//COMMIT_MSG@158
PS44, Line 158:
May need to further explain, say with the concept of a blocking fragment. It
is not clear to me which fragment is adjusted.

http://gerrit.cloudera.org:8080/#/c/19033/44//COMMIT_MSG@161
PS44, Line 161: just the
nit decides

http://gerrit.cloudera.org:8080/#/c/19033/44//COMMIT_MSG@177
PS44, Line 177: meaning that
nit "At the blocking fragment" is better here as it 'blocking fragment' is well
defined.

http://gerrit.cloudera.org:8080/#/c/19033/44//COMMIT_MSG@177
PS44, Line 177: s can run in
same. use of blocking segment is better.

http://gerrit.cloudera.org:8080/#/c/19033/44//COMMIT_MSG@178
PS44, Line 178: core per fragme
It is better to define CoreRequirement first before use. It is not clear
whether it is for a fragment or for a node.

http://gerrit.cloudera.org:8080/#/c/19033/44//COMMIT_MSG@188
PS44, Line 188: pared against the total CPU
Old name.

http://gerrit.cloudera.org:8080/#/c/19033/44//COMMIT_MSG@195
PS44, Line 195:Co
query_cpu_requirement_scale or its new form should be listed here as well.

http://gerrit.cloudera.org:8080/#/c/19033/44//COMMIT_MSG@200
PS44, Line 200: m i
Do we need a min_threads control? If not, please add a comment here to explain
the adjustment is always increasing.

http://gerrit.cloudera.org:8080/#/c/19033/44//COMMIT_MSG@212
PS44, Line 212:
:
for completeness, better to list them here as like the three FE ones.

http://gerrit.cloudera.org:8080/#/c/19033/44/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
File
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test:

http://gerrit.cloudera.org:8080/#/c/19033/44/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test@75
PS44, Line 75: ss_sold_date_sk =
> Done. Add max-parallelism and fragment-costs as well.
Nice. Thanks!

--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 45
Gerrit-Owner: Qifan Chen
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Kurt Deschler
Gerrit-Reviewer: Qifan Chen
Gerrit-Reviewer: Riza Suminto
Gerrit-Reviewer: Wenzhe Zhou
Gerrit-Comment-Date: Fri, 10 Feb 2023 16:24:56 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-08 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. (
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

Patch Set 44:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/19033/44//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/44//COMMIT_MSG@16
PS44, Line 16: blocking-operator nature
May need to explain why blocking operators are considered in the context of DoP
adjustment.

Conceptually, blocking operators can be divided into multiple tiers with those
in the 1st are close to the leaf nodes and can run right away. Those in the 2nd
tier can run when all their dependent children (including the blocking
operators in 1st tier) can provide data. I did not see a linkage to increase
DoP for blocking operators.

http://gerrit.cloudera.org:8080/#/c/19033/44//COMMIT_MSG@19
PS44, Line 19: explained
nit found

http://gerrit.cloudera.org:8080/#/c/19033/37/be/src/util/backend-gflag-util.cc
File be/src/util/backend-gflag-util.cc:

http://gerrit.cloudera.org:8080/#/c/19033/37/be/src/util/backend-gflag-util.cc@201
PS37, Line 201: 0.5
> My intention is to associate the scaling flag to CPU requirement of the que
The scaling factor as defined is less intuitive, since one has to inverse it to
understand its semantics.

I think you can define the true scaling factor to be = 1 / .

http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@147
PS37, Line 147: have an adjusted number of instance based on
: // Processin
nit. A positive value implies the instance count has been adjusted.

It is also nice to provide an example here.

http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@864
PS37, Line 864: elism, numNode
Better renamed as getMaxParallelismByTotalWorkSize().

http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@883
PS37, Line 883:
: protected boolean hasAdjustedInstanceCount() { return
adjustedInstanceCount_ > 0; }
:
: protected void setFixedInstanceCount(int count) {
: isFixedParallelism_ = true;
: setAdjustedInstanceCount(count);
: }
Repeated use from line 869. Can be refactored.

http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@957
PS37, Line 957: processingCosts_.get(index).getNumInstanceExpected());
Should add a comment.

http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@972
PS37, Line 972:
: // Compute exchanging child parallelism first.
I wonder if the computation can be improved here e.g. by the size of the work.
Making it to the max # of nodes can overuse the system resource.

In general, I wonder if this logic tries to fix some bugs in DoP computation.
Adjusting DoP specifically for plans with blocking operators seems odd. See my
comment to the commit message.

http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1098
PS37, Line 1098: if (hasBlockingNode()) {
add a comment should be helpful.

http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1113
PS37, Line 1113:
does not sound right.

http://gerrit.cloudera.org:8080/#/c/19033/44/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test@75
PS44, Line 75: cardinality=3.04K
Is it possible to show the new processing cost here too? It will be wonderful.

--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 44
Gerrit-Owner: Qifan Chen
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Kurt Deschler
Gerrit-Reviewer: Qifan Chen
Gerrit-Reviewer: Riza Suminto
Gerrit-Reviewer: Wenzhe Zhou
Gerrit-Comment-Date: Wed, 08 Feb 2023 15:54:25 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11809: Support non unique primary key for Kudu

2023-02-02 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19383 )

Change subject: IMPALA-11809: Support non unique primary key for Kudu
..


Patch Set 16: Code-Review+1

(2 comments)

Thanks for the rework!

http://gerrit.cloudera.org:8080/#/c/19383/15/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

http://gerrit.cloudera.org:8080/#/c/19383/15/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@107
PS15, Line 107:
  :   // Set to true if primary key is unique.
  :   private boolean isPrimaryKeyUnique_ = true;
  :
  :   // Set to true if the table has auto-incrementing column.
  :   private boolean hasAutoIncrementingColumn_ = false;
> KuduTable extends from Table, LocalKuduTable extends from LocalTable. They 
Sounds like a good plan for some future work. Done


http://gerrit.cloudera.org:8080/#/c/19383/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/19383/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2365
PS10, Line 2365: partition by hash
> Already added similar test case with range partitions in end-to-end kudu_cr
Done



--
To view, visit http://gerrit.cloudera.org:8080/19383
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d7882bf3d01a3492cc9827c072d1f3200d9eebd
Gerrit-Change-Number: 19383
Gerrit-PatchSet: 16
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Abhishek Chennaka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Marton Greber 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 02 Feb 2023 23:09:27 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11809: Support non unique primary key for Kudu

2023-02-02 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19383 )

Change subject: IMPALA-11809: Support non unique primary key for Kudu
..


Patch Set 15:

(7 comments)

Looks great!

http://gerrit.cloudera.org:8080/#/c/19383/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19383/15//COMMIT_MSG@23
PS15, Line 23: in a tablet-server.
nit within a continuous region of the table served by a tablet-server.


http://gerrit.cloudera.org:8080/#/c/19383/15//COMMIT_MSG@25
PS15, Line 25: u
nit also


http://gerrit.cloudera.org:8080/#/c/19383/15//COMMIT_MSG@47
PS15, Line 47:   AS SELECT id, string_col FROM functional.alltypes WHERE id = 
10;
It will be great to include a DDL for a range partitioned kudu table here, 
assume the auto column is supported for such a table.

Including a DDL for a non-partition table here is also nice.


http://gerrit.cloudera.org:8080/#/c/19383/15//COMMIT_MSG@57
PS15, Line 57: t
Kudu


http://gerrit.cloudera.org:8080/#/c/19383/15//COMMIT_MSG@59
PS15, Line 59: Kudu
nit. for these tables


http://gerrit.cloudera.org:8080/#/c/19383/15/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

http://gerrit.cloudera.org:8080/#/c/19383/15/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@107
PS15, Line 107:
  :   // Set to true if primary key is unique.
  :   private boolean isPrimaryKeyUnique_ = true;
  :
  :   // Set to true if the table has auto-incrementing column.
  :   private boolean hasAutoIncrementingColumn_ = false;
These two new members could be put into a new class and reused between 
KuduTable.java and LocalKuduTable.java.


http://gerrit.cloudera.org:8080/#/c/19383/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/19383/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2365
PS10, Line 2365: partition by hash
> Added test case in kudu-scan-node.test with unpartitioned Kudu table. All r
To observe auto column works with a range partitioned kudu table such as

CREATE TABLE t1 (id STRING PRIMARY KEY, s STRING)
  PARTITION BY RANGE (PARTITION 'a' <= VALUES < '{', PARTITION 'A' <= VALUES < 
'[', PARTITION VALUES = '0')
  STORED AS KUDU;

if it is feasible.



--
To view, visit http://gerrit.cloudera.org:8080/19383
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d7882bf3d01a3492cc9827c072d1f3200d9eebd
Gerrit-Change-Number: 19383
Gerrit-PatchSet: 15
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Abhishek Chennaka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Marton Greber 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 02 Feb 2023 16:28:42 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11809: Support non unique primary key for Kudu

2023-02-01 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19383 )

Change subject: IMPALA-11809: Support non unique primary key for Kudu
..


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19383/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/19383/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2365
PS10, Line 2365: partition by hash
> In this case, there is only one partition for a table so the table could no
Yeah, agree that with the tablet concept.

However,  for a range partitioned ( such as by months in a year) kudu table 
test, we can still benefit from the unique-ness within each partition  (aka a 
kudu tablet). For example, to answer the question of chronicle order of row 
insertions.

Would it be possible to add such a test case?


http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test
File testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test:

http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test@204
PS10, Line 204: order by id
> It's unique per Kudu tablet-server. Clarified in the commit message. They d
Done



--
To view, visit http://gerrit.cloudera.org:8080/19383
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d7882bf3d01a3492cc9827c072d1f3200d9eebd
Gerrit-Change-Number: 19383
Gerrit-PatchSet: 13
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Abhishek Chennaka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Marton Greber 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 01 Feb 2023 15:59:09 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-01-31 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 37:

(14 comments)

Looks good!

http://gerrit.cloudera.org:8080/#/c/19033/37/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/19033/37/be/src/scheduling/scheduler.cc@266
PS37, Line 266: effective_instance_count
nit. is positive


http://gerrit.cloudera.org:8080/#/c/19033/37/be/src/scheduling/scheduler.cc@272
PS37, Line 272:   if (effective_instance_count > 0) {
Is it possible to exclude the checking for scan fragment and certain table sink 
fragment here?


http://gerrit.cloudera.org:8080/#/c/19033/37/be/src/util/backend-gflag-util.cc
File be/src/util/backend-gflag-util.cc:

http://gerrit.cloudera.org:8080/#/c/19033/37/be/src/util/backend-gflag-util.cc@201
PS37, Line 201: 0.5
Just wonder why not directly use a value of 2 here.


http://gerrit.cloudera.org:8080/#/c/19033/37/common/thrift/Planner.thrift
File common/thrift/Planner.thrift:

http://gerrit.cloudera.org:8080/#/c/19033/37/common/thrift/Planner.thrift@87
PS37, Line 87: 14:
Miss the comment here


http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
File fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@364
PS37, Line 364: 'partitionExprs_' is excluded
This does not match with the code at line 368: 
ExprUtil.computeExprCost(partitionByEq_)


http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@365
PS37, Line 365: sorted within each
This does not match with the code at line 368 
ExprUtil.computeExprCost(orderByEq_)


http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/BroadcastProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/BroadcastProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/BroadcastProcessingCost.java@33
PS37, Line 33: cost_;
May use the name childProcessingCost_  or add a comment to indicate this is the 
cost from the child.


http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/BroadcastProcessingCost.java@47
PS37, Line 47: multiple_.get() > 0, "BroadcastProcessingCost: multiple must be 
greater than 0!");
Code duplication with line at 40.


http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/CoreRequirement.java
File fe/src/main/java/org/apache/impala/planner/CoreRequirement.java:

http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/CoreRequirement.java@29
PS37, Line 29:
Comment is missing.


http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/CoreRequirement.java@31
PS37, Line 31:  private final List ids_;
 :   private final List counts_;
 :   private int total_ = -1;
A comment for each data member.


http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/DataSink.java
File fe/src/main/java/org/apache/impala/planner/DataSink.java:

http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/DataSink.java@49
PS37, Line 49: Gets set correctly
nit Set in computeProcessingCost() for a meaningful value.


http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/DataSink.java@131
PS37, Line 131: into
nit. data fields in


http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/DataSink.java@141
PS37, Line 141: setNumInstanceExpected
This is missing in the comment at line 131 above.


http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/MultipleProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/MultipleProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/37/fe/src/main/java/org/apache/impala/planner/MultipleProcessingCost.java@26
PS37, Line 26: multiple_
multiplier_ is better.



--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 37
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 31 Jan 2023 17:29:32 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-01-30 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 37:

Will review when I have a chance this week.


--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 37
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 30 Jan 2023 23:59:38 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11809: Support non unique primary key for Kudu

2023-01-30 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19383 )

Change subject: IMPALA-11809: Support non unique primary key for Kudu
..


Patch Set 11: Code-Review+1

(3 comments)

Looks good!

http://gerrit.cloudera.org:8080/#/c/19383/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19383/11//COMMIT_MSG@45
PS11, Line 45: When creating a Kudu table, specifying PRIMARY KEY is optional.
 : If there is no primary key attribute specified, the partition key
 : columes will be promoted as non unique primary key if those 
columns
 : are the beginning columns of the table.
 : New column "key_unique" is added to the output of 'describe' 
table
 : command for Kudu table.
Suggest to move this para between line 25 and 26.


http://gerrit.cloudera.org:8080/#/c/19383/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/19383/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2365
PS10, Line 2365: partition by hash
> Added two more test cases as suggested.
I see.  It may be a good feature to have!


http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test
File testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test:

http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test@204
PS10, Line 204: order by id
> Yes, we can order by auto_incrementing_id. But auto_incrementing_id is not
When is it unique?  Maybe we can clarify in the commit message.



--
To view, visit http://gerrit.cloudera.org:8080/19383
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d7882bf3d01a3492cc9827c072d1f3200d9eebd
Gerrit-Change-Number: 19383
Gerrit-PatchSet: 11
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Abhishek Chennaka 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Marton Greber 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 30 Jan 2023 23:58:46 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-11809: Support non unique primary key for Kudu

2023-01-17 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19383 )

Change subject: WIP IMPALA-11809: Support non unique primary key for Kudu
..


Patch Set 10:

(13 comments)

Added some comments on testing and the commit message.

http://gerrit.cloudera.org:8080/#/c/19383/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19383/10//COMMIT_MSG@30
PS10, Line 30: Kudu assign values for auto-incrementing column automatically
 : when inserting rows so insertion statements don't need to specify
 : values for auto-incrementing column.
Kudu recently enables the auto-incrementing column feature.  The feature works 
by appending a system generated column to the primary key columns to guarantee 
the uniqueness on primary key, when the primary key columns can be non-unique.

This auto column is of type big int and the assignment to it during insertion 
is automatic.


http://gerrit.cloudera.org:8080/#/c/19383/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/19383/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2365
PS10, Line 2365: partition by hash
may add some more test cases with the following.

1. No partition by clause
2. Non unique primary key includes all selected columns

It seems even when primary key list is empty, the auto incrementing column can 
still  function as the primary key.  Do we allow such a case?


http://gerrit.cloudera.org:8080/#/c/19383/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/19383/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java@88
PS10, Line 88: AnalyzesOk("create table tab (x int non unique primary key) 
partition by hash(x) " +
 : "partitions 8 stored as kudu", isExternalPurgeTbl);
 : AnalyzesOk("create table tab (x int, non unique primary 
key(x)) " +
 : "partition by hash(x) partitions 8 stored as kudu", 
isExternalPurgeTbl);
 : AnalyzesOk("create table tab (x int, y int, non unique 
primary key (x, y)) " +
 : "partition by hash(x, y) partitions 8 stored as kudu", 
isExternalPurgeTbl);
 : AnalyzesOk("create table tab (x int, y int, non unique 
primary key (x)) " +
 : "partition by hash(x) partitions 8 stored as kudu", 
isExternalPurgeTbl);
 : AnalyzesOk("create table tab (x int, y int, non unique 
primary key(x, y)) " +
 : "partition by hash(y) partitions 8 stored as kudu", 
isExternalPurgeTbl);
 : AnalyzesOk("create table tab (x timestamp, y timestamp, non 
unique primary key(x)) "+
 : "partition by hash(x) partitions 8 stored as kudu", 
isExternalPurgeTbl);
 : // Promote all partition columns as non unique primary key 
columns if primary keys
 : // are not declared, but partition columns must be the first 
columns in the table.
 : AnalyzesOk("create table tab (x int, y int) partition by 
hash(x) partitions 8 " +
 : "stored as kudu", isExternalPurgeTbl);
 : AnalyzesOk("create table tab (x int, y int) partition by 
hash(x, y) partitions 8 " +
 : "stored as kudu", isExternalPurgeTbl);
 : AnalysisError("create table tab (x int, y int) partition by 
hash(y) partitions 8 " +
 : "stored as kudu", "Specify primary key or non unique 
primary key for the Kudu " +
 : "table, or create partitions with the beginning columns 
of the table.",
 : isExternalPurgeTbl);
In some of the new tests, may try partition by range and no partition to cover 
more cases.  See https://kudu.apache.org/docs/kudu_impala_integration.html.


http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test
File testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test:

http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test@204
PS10, Line 204: order by id
Wonder if we can order DESC by auto_incrementing_id which can be useful to get 
the latest rows.


http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test:

http://gerrit.cloudera.org:8080/#/c/19383/10/testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test@709
PS10, Line 709: auto-incrementing column auto_incrementing_id
nit. The column name appeared twice.

[Impala-ASF-CR] WIP IMPALA-11809: Support non unique primary key for Kudu

2023-01-15 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19383 )

Change subject: WIP IMPALA-11809: Support non unique primary key for Kudu
..


Patch Set 9:

(16 comments)

Looks good!

Will review the test part of the patch next.

http://gerrit.cloudera.org:8080/#/c/19383/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19383/9//COMMIT_MSG@9
PS9, Line 9: Kudu engine adds support for non unique primary key. It adds
nit recently


http://gerrit.cloudera.org:8080/#/c/19383/9/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/19383/9/common/thrift/CatalogObjects.thrift@580
PS9, Line 580: add
nit adds


http://gerrit.cloudera.org:8080/#/c/19383/9/common/thrift/CatalogObjects.thrift@581
PS9, Line 581: .
nit. ", in this case, this field is set to false"


http://gerrit.cloudera.org:8080/#/c/19383/9/fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java
File fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java:

http://gerrit.cloudera.org:8080/#/c/19383/9/fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java@99
PS9, Line 99: c.isPrimaryKeyUnique() ? "" : "non unique ", c.toString()
This statement can be integrated into ColumnDef.toString().


http://gerrit.cloudera.org:8080/#/c/19383/9/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/19383/9/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@227
PS9, Line 227: createStmt_.isPrimaryKeyUnique(), 
createStmt_.getPrimaryKeyColumnDefs(),
nit. It may be more logic to switch the two arguments, as the new argument 
further refines the primary key columns.


http://gerrit.cloudera.org:8080/#/c/19383/9/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

http://gerrit.cloudera.org:8080/#/c/19383/9/fe/src/main/java/org/apache/impala/analysis/TableDef.java@87
PS9, Line 87: An auto-incrementing column will be added
:   // automatically by Kudu engine as key column if primary key is 
not unique.
nit. If not, an auto-incrementing column will be added
automatically by Kudu engine. This extra key column helps produce a unique 
composite primary key (primary keys + auto-incrementing construct).


http://gerrit.cloudera.org:8080/#/c/19383/9/fe/src/main/java/org/apache/impala/analysis/TableDef.java@537
PS9, Line 537: only the columns, on which the
 :   // partitions are being created,
nit. the partition columns have to be first in the primary key columns.


http://gerrit.cloudera.org:8080/#/c/19383/9/fe/src/main/java/org/apache/impala/analysis/TableDef.java@566
PS9, Line 566:   throw new AnalysisException(String.format("Multiple 
%sprimary keys specified. " +
 :   "Composite %sprimary keys can be specified using the " 
+
 :   "%sPRIMARY KEY (col1, col2, ...) syntax at the end of 
the column definition.",
 :   isPrimaryKeyUnique() ? "" : "non unique ",
 :   isPrimaryKeyUnique() ? "" : "non unique ",
 :   isPrimaryKeyUnique() ? "" : "NON UNIQUE "));
nit. This looks similar to the lines starting at 520.


http://gerrit.cloudera.org:8080/#/c/19383/9/fe/src/main/java/org/apache/impala/catalog/Db.java
File fe/src/main/java/org/apache/impala/catalog/Db.java:

http://gerrit.cloudera.org:8080/#/c/19383/9/fe/src/main/java/org/apache/impala/catalog/Db.java@242
PS9, Line 242: boolean isPrimaryKeyUnique, List primaryKeyColumnDefs,
As noted in another comment, these two arguments probably should be swapped.


http://gerrit.cloudera.org:8080/#/c/19383/9/fe/src/main/java/org/apache/impala/catalog/KuduColumn.java
File fe/src/main/java/org/apache/impala/catalog/KuduColumn.java:

http://gerrit.cloudera.org:8080/#/c/19383/9/fe/src/main/java/org/apache/impala/catalog/KuduColumn.java@134
PS9, Line 134: position
nit. 'position'.


http://gerrit.cloudera.org:8080/#/c/19383/9/fe/src/main/java/org/apache/impala/catalog/KuduColumn.java@135
PS9, Line 135: if
nit. when


http://gerrit.cloudera.org:8080/#/c/19383/9/fe/src/main/java/org/apache/impala/catalog/KuduColumn.java@144
PS9, Line 144: false
One may call this function even when column.isIs_primary_key_unique() is true.  
Suggest to replace false with column.isIs_primary_key_unique().


http://gerrit.cloudera.org:8080/#/c/19383/9/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/19383/9/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java@133
PS9, Line 133:   if (isKey && !isKeyUnique) {
 :

[Impala-ASF-CR] IMPALA-11829 - Fix bug in cardinality estimates related to TABLE NUM ROWS hint

2023-01-15 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19421 )

Change subject: IMPALA-11829 - Fix bug in cardinality estimates related to 
TABLE_NUM_ROWS hint
..


Patch Set 2: Code-Review+1

Looks good!


-- 
To view, visit http://gerrit.cloudera.org:8080/19421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia27745fd93abd5dec99bf82f16899bd15a2b88ae
Gerrit-Change-Number: 19421
Gerrit-PatchSet: 2
Gerrit-Owner: David Rorke 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Sun, 15 Jan 2023 18:03:21 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10986 (Addendum): Add and refactor some E2E tests

2023-01-15 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19394 )

Change subject: IMPALA-10986 (Addendum): Add and refactor some E2E tests
..


Patch Set 9: Code-Review+2

Looks good!


--
To view, visit http://gerrit.cloudera.org:8080/19394
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieb4f69934401a745da66a983528a7a3679279c28
Gerrit-Change-Number: 19394
Gerrit-PatchSet: 9
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sun, 15 Jan 2023 18:00:33 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11786: Preserve memory for codegen cache

2023-01-03 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19377 )

Change subject: IMPALA-11786: Preserve memory for codegen cache
..


Patch Set 3:

It may be a good idea to test a good default value  (say 10%) with a 
suitable workload such as tpcds. Basically, a minimal value that can help 
achieve the most benefit of the cache.

In the long run, it seems a good idea to implement a caching strategy (i.e. 
removal of old items) so that the cache works well even configured with less 
than optimal threshold value .


--
To view, visit http://gerrit.cloudera.org:8080/19377
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iebdc04ba1b91578d74684209a11c815225b8505a
Gerrit-Change-Number: 19377
Gerrit-PatchSet: 3
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Wed, 04 Jan 2023 00:08:05 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10986 (Addendum): Add and refactor some E2E tests

2023-01-03 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19394 )

Change subject: IMPALA-10986 (Addendum): Add and refactor some E2E tests
..


Patch Set 6: Code-Review+2

(1 comment)

Thanks for the improvement!

http://gerrit.cloudera.org:8080/#/c/19394/6/tests/authorization/test_ranger.py
File tests/authorization/test_ranger.py:

http://gerrit.cloudera.org:8080/#/c/19394/6/tests/authorization/test_ranger.py@1248
PS6, Line 1248: a user
nit 'test_user'



--
To view, visit http://gerrit.cloudera.org:8080/19394
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieb4f69934401a745da66a983528a7a3679279c28
Gerrit-Change-Number: 19394
Gerrit-PatchSet: 6
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 03 Jan 2023 19:38:30 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11812: Deduplicate column schema in hmsPartitions

2022-12-29 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19391 )

Change subject: IMPALA-11812: Deduplicate column schema in hmsPartitions
..


Patch Set 6:

(9 comments)

Looks like a very good optimization. Looks good to me.

http://gerrit.cloudera.org:8080/#/c/19391/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19391/6//COMMIT_MSG@15
PS6, Line 15: interned
nit. internalized


http://gerrit.cloudera.org:8080/#/c/19391/6//COMMIT_MSG@18
PS6, Line 18: In fact
nit. In reality.


http://gerrit.cloudera.org:8080/#/c/19391/6//COMMIT_MSG@21
PS6, Line 21: adds codes to replace
nit. replaces


http://gerrit.cloudera.org:8080/#/c/19391/6//COMMIT_MSG@22
PS6, Line 22: hmsPartitions with the table level column list. Also add some 
progress
nit. to remove the duplications.


http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1862
PS6, Line 1862: fetchPartitionsByName
nit. This method probably should be renamed to fetchPartitionsByTable() due to 
the argument change.


http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@5724
PS6, Line 5724:   LOG.info("Updated cache directive id for {}/{} 
partitions", numDone,
nit. may append "for table " for clarity.


http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
File fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java:

http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@208
PS6, Line 208: LOG.info("Fetched {}/{} partitions for table {}.{}", numDone, 
partNames.size(),
 :   msTbl.getDbName(), msTbl.getTableName());
This probably will log  number of progressive messages.  
In normal case, I think we can just log one message when  == 
numDone outside the loop. The progressive style logging is needed only for 
abnormal situation.

This applies to other places where progressive logging is added in this patch.


http://gerrit.cloudera.org:8080/#/c/19391/6/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@230
PS6, Line 230: respect
May need to clarify. Do you mean use?


http://gerrit.cloudera.org:8080/#/c/19391/6/tests/custom_cluster/test_wide_table_operations.py
File tests/custom_cluster/test_wide_table_operations.py:

http://gerrit.cloudera.org:8080/#/c/19391/6/tests/custom_cluster/test_wide_table_operations.py@71
PS6, Line 71: 10mins for 50k partitions
nit. Is this the time after the patch or before?



--
To view, visit http://gerrit.cloudera.org:8080/19391
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I511ecca0ace8bea4c24a19a54fb0a75390e50c4d
Gerrit-Change-Number: 19391
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 29 Dec 2022 15:24:56 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

2022-12-23 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
..


Patch Set 10: Code-Review+2

Thanks!


--
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 10
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 23 Dec 2022 15:32:47 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11786: Preserve memory for codegen cache

2022-12-22 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19377 )

Change subject: IMPALA-11786: Preserve memory for codegen cache
..


Patch Set 2: Code-Review+1

(1 comment)

Looks good!

http://gerrit.cloudera.org:8080/#/c/19377/1/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/19377/1/be/src/runtime/exec-env.cc@337
PS1, Line 337: is_percent
> If the is_percent is true, the codegen_cache_capacity should be 0 because r
I see. Maybe add a DCHECK(!is_percent) here.



--
To view, visit http://gerrit.cloudera.org:8080/19377
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iebdc04ba1b91578d74684209a11c815225b8505a
Gerrit-Change-Number: 19377
Gerrit-PatchSet: 2
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 22 Dec 2022 15:29:41 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

2022-12-20 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
..


Patch Set 9: Code-Review+1

Hi Wangsheng,

Sure. To separate the integrity check into another JIRA sounds good to me.


--
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 9
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 20 Dec 2022 14:55:03 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11786: Preserve memory for codegen cache

2022-12-20 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19377 )

Change subject: IMPALA-11786: Preserve memory for codegen cache
..


Patch Set 1:

(6 comments)

Looks good to me!

http://gerrit.cloudera.org:8080/#/c/19377/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19377/1//COMMIT_MSG@15
PS1, Line 15: admit
nit. admission


http://gerrit.cloudera.org:8080/#/c/19377/1//COMMIT_MSG@16
PS1, Line 16: preserve
nit. preserving


http://gerrit.cloudera.org:8080/#/c/19377/1//COMMIT_MSG@17
PS1, Line 17: and treat
nit. by treating


http://gerrit.cloudera.org:8080/#/c/19377/1//COMMIT_MSG@17
PS1, Line 17: since
nit. from


http://gerrit.cloudera.org:8080/#/c/19377/1/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/19377/1/be/src/runtime/exec-env.cc@337
PS1, Line 337: is_percent
Do we need to check the value of is_percent after the call?


http://gerrit.cloudera.org:8080/#/c/19377/1/tests/custom_cluster/test_executor_groups.py
File tests/custom_cluster/test_executor_groups.py:

http://gerrit.cloudera.org:8080/#/c/19377/1/tests/custom_cluster/test_executor_groups.py@591
PS1, Line 591: 20%
It would be better to calculate the 3.2GB from 4Gb and 20%, so that one can 
specify 10% or 30% to fine tune the size of the codegen cache for a workload.



--
To view, visit http://gerrit.cloudera.org:8080/19377
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iebdc04ba1b91578d74684209a11c815225b8505a
Gerrit-Change-Number: 19377
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Tue, 20 Dec 2022 14:50:52 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9487: Add erasure coding policy to SHOW, DESCRIBE

2022-12-13 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19268 )

Change subject: IMPALA-9487: Add erasure coding policy to SHOW, DESCRIBE
..


Patch Set 10: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/19268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd95f2d18b3980581788c92993b6d2f53504b5e0
Gerrit-Change-Number: 19268
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Wed, 14 Dec 2022 02:18:12 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9487: Add erasure coding policy to SHOW, DESCRIBE

2022-12-13 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19268 )

Change subject: IMPALA-9487: Add erasure coding policy to SHOW, DESCRIBE
..


Patch Set 10:

(1 comment)

Thanks for the quick reply!

http://gerrit.cloudera.org:8080/#/c/19268/9/testdata/workloads/functional-query/queries/QueryTest/show.test
File testdata/workloads/functional-query/queries/QueryTest/show.test:

http://gerrit.cloudera.org:8080/#/c/19268/9/testdata/workloads/functional-query/queries/QueryTest/show.test@277
PS9, Line 277: $ERASURECODE_POLICY
> These run as both EC and non-EC. In non-EC runs the value will be NONE. Thi
OK. Make sense now.

May add a comment in the commit message on $ERASURECODE_POLICY (shown through 
the change set) and its semantics.



--
To view, visit http://gerrit.cloudera.org:8080/19268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd95f2d18b3980581788c92993b6d2f53504b5e0
Gerrit-Change-Number: 19268
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Wed, 14 Dec 2022 02:08:10 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9487: Add erasure coding policy to SHOW, DESCRIBE

2022-12-13 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19268 )

Change subject: IMPALA-9487: Add erasure coding policy to SHOW, DESCRIBE
..


Patch Set 9:

(2 comments)

Looks good to me!

http://gerrit.cloudera.org:8080/#/c/19268/9/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/19268/9/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@208
PS9, Line 208: Unable to retrieve erasure coding policy
nit. "Unable to retrieve erasure coding policy for distributed file system 
. Return NONE".


http://gerrit.cloudera.org:8080/#/c/19268/9/testdata/workloads/functional-query/queries/QueryTest/show.test
File testdata/workloads/functional-query/queries/QueryTest/show.test:

http://gerrit.cloudera.org:8080/#/c/19268/9/testdata/workloads/functional-query/queries/QueryTest/show.test@277
PS9, Line 277: $ERASURECODE_POLICY
I notice this line

bin/impala-config.sh:export HDFS_ERASURECODE_POLICY="RS-3-2-1024k"

and just wonder if we should display the real policy"RS-3-2-1024k" instead.  
$ERASURECODE_POLICY seems very generic.



--
To view, visit http://gerrit.cloudera.org:8080/19268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd95f2d18b3980581788c92993b6d2f53504b5e0
Gerrit-Change-Number: 19268
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Wed, 14 Dec 2022 01:12:17 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10986: Require the SELECT privilege to execute a UDF

2022-12-12 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19194 )

Change subject: IMPALA-10986: Require the SELECT privilege to execute a UDF
..


Patch Set 15:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/19194/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19194/15//COMMIT_MSG@7
PS15, Line 7: SELECT
> Thanks Qifan!
Got it. The Hive model on a UDF execution privilege is represented by SELECT 
and INSERT or REFRESH.

I am fine to your current design of following the Hive mode.


http://gerrit.cloudera.org:8080/#/c/19194/15/fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java
File fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java:

http://gerrit.cloudera.org:8080/#/c/19194/15/fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java@347
PS15, Line 347: dbName_ = path.get(0);
  :   functionName_.setDb(path.get(0));
  :   functionName_.setFunction(path.get(1));
> Hm, I am a bit confused here.
Good point. Sorry I misread the || operator.


http://gerrit.cloudera.org:8080/#/c/19194/15/fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java@358
PS15, Line 358: // Need to set up 'dbName_', which in turn is used to set up 
'db_name' of the
  : // TPrivilege in createTPrivilege()
> Thanks Qifan!
Done


http://gerrit.cloudera.org:8080/#/c/19194/15/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java
File 
fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java:

http://gerrit.cloudera.org:8080/#/c/19194/15/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java@579
PS15, Line 579:   .error(accessError("functional"),
  :   onUdf("functional", "f", TPrivilegeLevel.SELECT))
> Thanks Qifan!
Got it. Thanks!



--
To view, visit http://gerrit.cloudera.org:8080/19194
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5e58ba30545ce169786aac279b00c8f6e09ae740
Gerrit-Change-Number: 19194
Gerrit-PatchSet: 15
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 12 Dec 2022 13:24:28 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10986: Require the SELECT privilege to execute a UDF

2022-12-09 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19194 )

Change subject: IMPALA-10986: Require the SELECT privilege to execute a UDF
..


Patch Set 15:

(6 comments)

Just some minor comments.

The main concern is whether it is feasible to add a new privilege keyword 
EXECUTE or USE instead of reusing SELECT.

http://gerrit.cloudera.org:8080/#/c/19194/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19194/15//COMMIT_MSG@7
PS15, Line 7: SELECT
It seems the proper keyword should be EXECUTE or USE.  SELECT may imply the UDF 
can't insert a row.


http://gerrit.cloudera.org:8080/#/c/19194/15/fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java
File fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java:

http://gerrit.cloudera.org:8080/#/c/19194/15/fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java@347
PS15, Line 347: dbName_ = path.get(0);
  :   functionName_.setDb(path.get(0));
  :   functionName_.setFunction(path.get(1));
nit. May just use "*" at RHS or in arguments for readability.


http://gerrit.cloudera.org:8080/#/c/19194/15/fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java@358
PS15, Line 358: // Need to set up 'dbName_', which in turn is used to set up 
'db_name' of the
  : // TPrivilege in createTPrivilege()
nit. Suggest to move to line 343 to cover both *.* and non *.* cases.


http://gerrit.cloudera.org:8080/#/c/19194/15/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java
File 
fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java:

http://gerrit.cloudera.org:8080/#/c/19194/15/fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java@579
PS15, Line 579:   .error(accessError("functional"),
  :   onUdf("functional", "f", TPrivilegeLevel.SELECT))
should error() be ok() here, since functional.f is automatically SELECTable 
privilege-wise?


http://gerrit.cloudera.org:8080/#/c/19194/15/tests/authorization/test_ranger.py
File tests/authorization/test_ranger.py:

http://gerrit.cloudera.org:8080/#/c/19194/15/tests/authorization/test_ranger.py@482
PS15, Line 482: a wildcard for functional name
nit. functional name in wildcard


http://gerrit.cloudera.org:8080/#/c/19194/15/tests/authorization/test_ranger.py@521
PS15, Line 521: # Revoke the granted privilege and verify.
  :   admin_client.execute("revoke select on user_defined_fn 
{0}.{1} from {2} {3}"
  :.format(unique_database, udf, kw, 
id))
  :   result = self.client.execute("show grant {0} {1} on 
user_defined_fn {2}.{3}"
  :.format(kw, id, 
unique_database, udf))
  :   TestRanger._check_privileges(result, [])
Repetition of code  from line 483 to 526.

May refactor these tests to use the pattern from line 522 to 526.



--
To view, visit http://gerrit.cloudera.org:8080/19194
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5e58ba30545ce169786aac279b00c8f6e09ae740
Gerrit-Change-Number: 19194
Gerrit-PatchSet: 15
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 09 Dec 2022 17:46:19 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11470: Add Cache For Codegen Functions

2022-12-05 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19181 )

Change subject: IMPALA-11470: Add Cache For Codegen Functions
..


Patch Set 16: Code-Review+1

(3 comments)

Thanks a lot!

http://gerrit.cloudera.org:8080/#/c/19181/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19181/14//COMMIT_MSG@61
PS14, Line 61: e stored to the cac
> I think it is a good idea, and changed the optimal mode to use hash code pl
Nice!


http://gerrit.cloudera.org:8080/#/c/19181/14//COMMIT_MSG@113
PS14, Line 113: From the result, it shows
> I think the case mentioning here is to compare codegen disabled vs codegen
Make sense. Yeah maybe in the future we can amortize the cost of LLVM codegen 
over queries or make the cache content persistent.


http://gerrit.cloudera.org:8080/#/c/19181/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19181/16//COMMIT_MSG@139
PS16, Line 139: 1382.6
Yeah, the cache key size in NORMAL case is relatively large.  Maybe we could 
optimize it a bit in next around of work.  This also calls for a policy to 
eject less useful entries if not done yet.



--
To view, visit http://gerrit.cloudera.org:8080/19181
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If42c78a7f51fd582e5fe331fead494dadf544eb1
Gerrit-Change-Number: 19181
Gerrit-PatchSet: 16
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Mon, 05 Dec 2022 14:46:08 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11744: Table mask view should preserve the original column order in Hive

2022-12-05 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19279 )

Change subject: IMPALA-11744: Table mask view should preserve the original 
column order in Hive
..


Patch Set 4: Code-Review+2

+2 to carry Fang-Yu's and Daniel's +1.


--
To view, visit http://gerrit.cloudera.org:8080/19279
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic83d78312b19fa2c5ab88ac4f359bfabaeaabce6
Gerrit-Change-Number: 19279
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 05 Dec 2022 14:31:03 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11470: Add Cache For Codegen Functions

2022-12-01 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19181 )

Change subject: IMPALA-11470: Add Cache For Codegen Functions
..


Patch Set 14: Code-Review+1

(15 comments)

Some additional comments to the commit message.

http://gerrit.cloudera.org:8080/#/c/19181/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19181/14//COMMIT_MSG@13
PS14, Line 13: save
nit avoid repeated


http://gerrit.cloudera.org:8080/#/c/19181/14//COMMIT_MSG@14
PS14, Line 14: .
It is also useful to mention whether the cache is per process or per backend.


http://gerrit.cloudera.org:8080/#/c/19181/14//COMMIT_MSG@16
PS14, Line 16: in a fragment-level way
nit at the fragment level.


http://gerrit.cloudera.org:8080/#/c/19181/14//COMMIT_MSG@18
PS14, Line 18: one
nit you mean another?


http://gerrit.cloudera.org:8080/#/c/19181/14//COMMIT_MSG@34
PS14, Line 34: Codegen cache is disabled if a fragment uses a native udf, 
because
nit automatically


http://gerrit.cloudera.org:8080/#/c/19181/14//COMMIT_MSG@41
PS14, Line 41: .
May file a JIRA and mention the JARA number here.


http://gerrit.cloudera.org:8080/#/c/19181/14//COMMIT_MSG@43
PS14, Line 43: flags for start and query
 : options
nit query options for feature configuration and operation purposes.


http://gerrit.cloudera.org:8080/#/c/19181/14//COMMIT_MSG@45
PS14, Line 45: start option
nit Configuration.


http://gerrit.cloudera.org:8080/#/c/19181/14//COMMIT_MSG@48
PS14, Line 48:
 : query option
nit Operations


http://gerrit.cloudera.org:8080/#/c/19181/14//COMMIT_MSG@50
PS14, Line 50: - disable_codegen_cache
 : Codegen cache will be disabled when it is set to true.
nit suggest to indent as follows for readability.

query option:
   - disable_codegen_cache: codegen cache will be disabled when it is set to 
true.


http://gerrit.cloudera.org:8080/#/c/19181/14//COMMIT_MSG@61
PS14, Line 61: store the hash code
I wonder if we can also store the length of the full key so that we can reject 
a good portion of collision cases when lengths differ.


http://gerrit.cloudera.org:8080/#/c/19181/14//COMMIT_MSG@70
PS14, Line 70: impala.codegen-cache.misses
 : impala.codegen-cache.entries-in-use
 : impala.codegen-cache.entries-in-use-bytes
 : impala.codegen-cache.entries-evicted
 : impala.codegen-cache.hits
 : impala.codegen-cache.entry-sizes
nit. indent


http://gerrit.cloudera.org:8080/#/c/19181/14//COMMIT_MSG@78
PS14, Line 78: CodegenCacheLookupTime
 : CodegenCacheSaveTime
 : ModuleBitcodeGenTime
 : NumCachedFunctions
nit indent


http://gerrit.cloudera.org:8080/#/c/19181/14//COMMIT_MSG@86
PS14, Line 86: Delta(Avg)
It might be informational to add a column showing the cache overhead in bytes.


http://gerrit.cloudera.org:8080/#/c/19181/14//COMMIT_MSG@113
PS14, Line 113: cache is not always faster
When codegen is disabled, how can codegen cache be still useful?



--
To view, visit http://gerrit.cloudera.org:8080/19181
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If42c78a7f51fd582e5fe331fead494dadf544eb1
Gerrit-Change-Number: 19181
Gerrit-PatchSet: 14
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 01 Dec 2022 17:09:02 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11470: Add Cache For Codegen Functions

2022-11-29 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19181 )

Change subject: IMPALA-11470: Add Cache For Codegen Functions
..


Patch Set 11: Code-Review+1

(1 comment)

Thanks!

http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen-cache.cc
File be/src/codegen/llvm-codegen-cache.cc:

http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen-cache.cc@162
PS8, Line 162: alue(), codegen->function_n
> Thanks Qifan.
Done



--
To view, visit http://gerrit.cloudera.org:8080/19181
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If42c78a7f51fd582e5fe331fead494dadf544eb1
Gerrit-Change-Number: 19181
Gerrit-PatchSet: 11
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Tue, 29 Nov 2022 15:46:35 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11744: Table mask view should reserve the original column order in Hive

2022-11-27 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19279 )

Change subject: IMPALA-11744: Table mask view should reserve the original 
column order in Hive
..


Patch Set 2: Code-Review+1

(1 comment)

Looks great!

http://gerrit.cloudera.org:8080/#/c/19279/1/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
File fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java:

http://gerrit.cloudera.org:8080/#/c/19279/1/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java@560
PS1, Line 560: Hive
> Sure, this is an override. Copied the comments from FeTable#getColumnsInHiv
Done



--
To view, visit http://gerrit.cloudera.org:8080/19279
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic83d78312b19fa2c5ab88ac4f359bfabaeaabce6
Gerrit-Change-Number: 19279
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sun, 27 Nov 2022 17:01:46 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11744: Table mask view should reserve the original column order in Hive

2022-11-25 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. (
http://gerrit.cloudera.org:8080/19279 )

Change subject: IMPALA-11744: Table mask view should reserve the original
column order in Hive
..

Patch Set 1:

(4 comments)

Looks very good. Just have some minor comments.

I also wonder if the scope of the problem is with column masking views only or
not. Can you please double check?

http://gerrit.cloudera.org:8080/#/c/19279/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19279/1//COMMIT_MSG@21
PS1, Line 21: tbl
nit. tbl;

http://gerrit.cloudera.org:8080/#/c/19279/1//COMMIT_MSG@22
PS1, Line 22: to be
nit. with

Ideally Impala should replace the TableRef of 'tbl' with a table mask view as:

http://gerrit.cloudera.org:8080/#/c/19279/1//COMMIT_MSG@31
PS1, Line 31: makes STAR be expanded
nit. This incorrectly expands the STAR as "b, a, c" ...

http://gerrit.cloudera.org:8080/#/c/19279/1/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
File fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java:

http://gerrit.cloudera.org:8080/#/c/19279/1/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java@560
PS1, Line 560: Hive
nit. May add a comment here to indicate the Hive order is the logically defined
order.

--
To view, visit http://gerrit.cloudera.org:8080/19279
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic83d78312b19fa2c5ab88ac4f359bfabaeaabce6
Gerrit-Change-Number: 19279
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Qifan Chen
Gerrit-Comment-Date: Sat, 26 Nov 2022 03:27:00 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11470: Add Cache For Codegen Functions

2022-11-25 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19181 )

Change subject: IMPALA-11470: Add Cache For Codegen Functions
..


Patch Set 10:

(2 comments)

Thanks a lot for the rework!

http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen-cache.cc
File be/src/codegen/llvm-codegen-cache.cc:

http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen-cache.cc@162
PS8, Line 162:
> Actually, we don't expect a different content for a specific key. It is sor
IMHO Impala should never allow wrong results at all.

One way to do so would be to clear the entry in the cache when there is a 
collision, if the reuse of the entry would lead to wrong results.


http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/runtime/krpc-data-stream-sender.h
File be/src/runtime/krpc-data-stream-sender.h:

http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/runtime/krpc-data-stream-sender.h@176
PS9, Line 176: uint64_t seed
> There were some discussions in patch 3: https://gerrit.cloudera.org/#/c/191
Done



--
To view, visit http://gerrit.cloudera.org:8080/19181
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If42c78a7f51fd582e5fe331fead494dadf544eb1
Gerrit-Change-Number: 19181
Gerrit-PatchSet: 10
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 25 Nov 2022 23:37:51 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11707: Fix global runtime IN-list filter of numeric types are AlwaysFalse

2022-11-22 Thread Qifan Chen (Code Review)

Qifan Chen has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/19220 )

Change subject: IMPALA-11707: Fix global runtime IN-list filter of numeric 
types are AlwaysFalse
..

IMPALA-11707: Fix global runtime IN-list filter of numeric types are AlwaysFalse

Global runtime filters are published to the coordinator and then
distributed to all executors that need it. The filter is serialized and
deserialized using protobuf. While deserializing a global runtime filter
of numeric type from protobuf, the InsertBatch() method forgot to update
the total_entries_ counter. The filter is then considered as an empty
list, which will reject any files/rows.

This patch adds the missing update of total_entries_. Some DCHECKs are
added to make sure total_entries_ is consistent with the actual size of
the value set. This patch also fixes a type error (long_val -> int_val)
in ToProtobuf() of Date type IN-list filter.

Tests:
- Added BE tests to verify the filter cloned from protobuf has the same
  behavior as the original one.
- Added e2e regression tests
- Run TestInListFilters 200 times.

Change-Id: Ie90b2bce5e5ec6f6906ce9d2090b0ab19d48cc78
Reviewed-on: http://gerrit.cloudera.org:8080/19220
Tested-by: Impala Public Jenkins 
Reviewed-by: Qifan Chen 
---
M be/src/runtime/runtime-filter-bank.h
M be/src/util/in-list-filter-ir.cc
M be/src/util/in-list-filter-test.cc
M be/src/util/in-list-filter.cc
M be/src/util/in-list-filter.h
M testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test
6 files changed, 250 insertions(+), 78 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Qifan Chen: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/19220
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie90b2bce5e5ec6f6906ce9d2090b0ab19d48cc78
Gerrit-Change-Number: 19220
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang

[Impala-ASF-CR] IMPALA-11707: Fix global runtime IN-list filter of numeric types are AlwaysFalse

2022-11-22 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19220 )

Change subject: IMPALA-11707: Fix global runtime IN-list filter of numeric 
types are AlwaysFalse
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/19220
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie90b2bce5e5ec6f6906ce9d2090b0ab19d48cc78
Gerrit-Change-Number: 19220
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 22 Nov 2022 15:21:06 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11470: Add Cache For Codegen Functions

2022-11-18 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19181 )

Change subject: IMPALA-11470: Add Cache For Codegen Functions
..


Patch Set 9:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen-cache.h
File be/src/codegen/llvm-codegen-cache.h:

http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen-cache.h@38
PS9, Line 38: hash code
of sizeof(struct HashCode) (or two uint64_t)


http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen-cache.h
File be/src/codegen/llvm-codegen-cache.h:

http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen-cache.h@69
PS8, Line 69: d as the has
> Added comments. The structure is used for customized unordered set. It is a
Yeah, this is one way to customize a data structure. You may keep it but IMHO 
it is a complex way.


http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen-cache.h@76
PS8, Line 76:
:   /// Used as the comparator function in unordered_set for struct 
HashCode.
:   struct HashCodeEqual {
: bool operator()(const HashCode& lhs, const HashCode& rhs) 
const {
:
> Added comments. Same as above.
The operator() here does the same as HashCode::operator==().  Code duplication 
smell.

Suggest to remove this struct and use the HashCode::operator==() instead.


http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen-cache.cc
File be/src/codegen/llvm-codegen-cache.cc:

http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen-cache.cc@162
PS8, Line 162: override the existing entry
> For normal procedure, the caller would call a Lookup() before StoreInternal
It seems that there are two cases here when a hash collision takes place.

1). The two sources are not the same;
2). The two sources are the same.

Since we can not avoid collision, my comment above is for case 1).

To distinguish case 1 from case 2, we need to compare the original keys. 
Comparing hash code of the original keys may not be sufficient.


http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen-cache.cc@171
PS8, Line 171: codegen_cache_entries_in_use_->Increment(1);
 :   codegen_cache_entries_in_use_bytes_->Increment(mem_charge);
> When the old entry with the same key is replaced, it will be evicted, so th
Done


http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen-cache.cc@185
PS8, Line 185: ler to avoid multiple handles fo
> Added comments. I think one case is that when we run two same queries at th
I see. Thanks for the example.

It may also be useful to compare the size of the entries to minimize the chance 
of hash collision due to different entries.


http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen.cc@1130
PS9, Line 1130: suc
nit. entry_exist


http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen.cc@1156
PS9, Line 1156: teste
nit typo?


http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen.cc@1191
PS9, Line 1191: result << function->getName().data();
We may need to beef up the formation of function name list here since the 
following two cases would produce same list.

a) foobar()
b)  foo() and bar()

One way to do so would be to prepend the # of functions at the beginning of the 
list and use ',' to separate function names.

1,foobar
2,foo,bar


http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/codegen/llvm-codegen.cc@1980
PS9, Line 1980: cache
nit. Store to codegen cache


http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/exprs/agg-fn.cc
File be/src/exprs/agg-fn.cc:

http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/exprs/agg-fn.cc@89
PS9, Line 89:  Codegen cache is disabled for udf functions.
nit. describe the reason here.


http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/runtime/exec-env.cc@459
PS9, Line 459: codegen_cache_capacity
nit. better to print like 2.0Gb, 250Mb. etc to make it simple.


http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/runtime/exec-env.cc@466
PS9, Line 466: initialed
nit . typo.


http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/runtime/fragment-state.h
File be/src/runtime/fragment-state.h:

http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/runtime/fragment-state.h@231
PS9, Line 231: should be
nit is


http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/runtime/krpc-data-stream-sender.h
File be/src/runtime/krpc-data-stream-sender.h:

http://gerrit.cloudera.org:8080/#/c/19181/9/be/src/runtime/krpc-data-stream-sender.h@176
PS9, Line 176: uint64_t seed
Why is this necessary WRT codegen cache?

[Impala-ASF-CR] IMPALA-11064 Optimizing Temporary File Structure for Batch Reading

2022-11-18 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18219 )

Change subject: IMPALA-11064 Optimizing Temporary File Structure for Batch 
Reading
..


Patch Set 10: Code-Review+1

(8 comments)

Thanks for the clarification!

http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG@52
PS9, Line 52: ew start option:
> Changed to remote_batch_read_block_size_level. Keep the level to indicate i
Done


http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG@63
PS9, Line 63: ew query options:
> The idea for the perfetch here is to set the step instead of the size to be
Done


http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG@74
PS9, Line 74:
> That is a good point. In fact remote_batch_buffer_file_limit limits the max
Done


http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.h
File be/src/exec/partitioned-hash-join-builder.h:

http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.h@82
PS9, Line 82: int64_t level, uint64_t default_id
> I think in that case, multiple instances but with the same partition id cou
Assume uniform data distributions, then each of the  instances has the same 
chance of spill, or generates equal amount of spilled data.

When a skew exists, then one of the  instances may have high amount of 
spilled data.

I agree that it may be useful to run some more tests and find out a good 
strategy for each case.


http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.h@84
PS9, Line 84: 16;
> If my understanding is correct, the partition number here relies on the PAR
okay. Thanks for pointing out that the partition is actually the fanout. May be 
add a comment in comment message to clarify?

See my comment about about more testing.


http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/io/local-file-writer.cc
File be/src/runtime/io/local-file-writer.cc:

http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/io/local-file-writer.cc@66
PS9, Line 66: WriteOne
> The name has been there for a while, and the rename requires changes in qui
Done


http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h
File be/src/runtime/tmp-file-mgr-internal.h:

http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h@427
PS9, Line 427: int
> Uses int because it leaves a room for negative value to indicate special me
Done


http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h@890
PS9, Line 890: gro
> changed to resource_val, because it may not be bytes.
Done



--
To view, visit http://gerrit.cloudera.org:8080/18219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If913785cac9e2dafa20013b6600c87fcaf3e2018
Gerrit-Change-Number: 18219
Gerrit-PatchSet: 10
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 18 Nov 2022 20:13:21 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11707: Fix global runtime IN-list filter of numeric types are AlwaysFalse

2022-11-18 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19220 )

Change subject: IMPALA-11707: Fix global runtime IN-list filter of numeric 
types are AlwaysFalse
..


Patch Set 3: Code-Review+2

(2 comments)

+2 to carry Daniel's +1.

http://gerrit.cloudera.org:8080/#/c/19220/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19220/2//COMMIT_MSG@21
PS2, Line 21: Tests
> Just the test section of the commit message
Done


http://gerrit.cloudera.org:8080/#/c/19220/2/be/src/util/in-list-filter.h
File be/src/util/in-list-filter.h:

http://gerrit.cloudera.org:8080/#/c/19220/2/be/src/util/in-list-filter.h@164
PS2, Line 164: nd(S
> We need to use the same method name as it's used in IN_LIST_FILTER_INSERT_B
Done



--
To view, visit http://gerrit.cloudera.org:8080/19220
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie90b2bce5e5ec6f6906ce9d2090b0ab19d48cc78
Gerrit-Change-Number: 19220
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 18 Nov 2022 19:49:05 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11470: Add Cache For Codegen Functions

2022-11-17 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19181 )

Change subject: IMPALA-11470: Add Cache For Codegen Functions
..


Patch Set 8:

(12 comments)

Looks very good and the performance numbers are nice.

Will find some time to finish review the rest of the code.

http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen-cache.h
File be/src/codegen/llvm-codegen-cache.h:

http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen-cache.h@69
PS8, Line 69: HashCodeHash
It seems this struct is used only for the operator() defined below. Can the 
operator() be defined in HashCode struct as getH1()?


http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen-cache.h@70
PS8, Line 70: size_t
Better to match the type for HashState.h1 (uint64_t).


http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen-cache.h@76
PS8, Line 76:  struct HashCodeEqual {
: bool operator()(const HashCode& lhs, const HashCode& rhs) 
const {
:   return lhs.hash_code.h1 == rhs.hash_code.h1 && 
lhs.hash_code.h2 == rhs.hash_code.h2;
: }
:   };
same as the above comment for struct HashCodeHash.


http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen-cache.cc
File be/src/codegen/llvm-codegen-cache.cc:

http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen-cache.cc@162
PS8, Line 162: override the existing entry
It is possible to use a strategy to retain a better entry than just taking the 
current one as the better one. An entry is better if it is more valuable in 
general.

Also, since we allow 128Mb of memory for the cache, the LRU may be a good 
strategy to keep most valuable entries in the cache.


http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen-cache.cc@171
PS8, Line 171: codegen_cache_entries_in_use_->Increment(1);
 :   codegen_cache_entries_in_use_bytes_->Increment(mem_charge);
This math is not correct if the old entry is replaced by a new one in hash 
collision case.


http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen-cache.cc@183
PS8, Line 183: to_insert_key_it
nit keys_to_insert_it


http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen-cache.cc@185
PS8, Line 185: avoid the simultaneous insertion
Can you please explain this logic maybe with some examples?


http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen-cache.cc@188
PS8, Line 188: to_insert_keys_
nit. keys_to_insert


http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen.h@194
PS8, Line 194: suc
nit. This may not sound like a good name.


http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen.h@715
PS8, Line 715: avoid
nit. reduce


http://gerrit.cloudera.org:8080/#/c/19181/8/be/src/codegen/llvm-codegen.h@766
PS8, Line 766:   /// the codegen cache.
May need to briefly describe how such a name list is obtained, in particular 
the order of these function names in the string.


http://gerrit.cloudera.org:8080/#/c/19181/8/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/19181/8/common/thrift/ImpalaService.thrift@747
PS8, Line 747:   // insert a full key for the cache and allow more statistics, 
otherwise, a hashcode
of mere 64 bytes



--
To view, visit http://gerrit.cloudera.org:8080/19181
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If42c78a7f51fd582e5fe331fead494dadf544eb1
Gerrit-Change-Number: 19181
Gerrit-PatchSet: 8
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 17 Nov 2022 19:27:56 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11707: Fix global runtime IN-list filter of numeric types are AlwaysFalse

2022-11-14 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19220 )

Change subject: IMPALA-11707: Fix global runtime IN-list filter of numeric 
types are AlwaysFalse
..


Patch Set 2:

(6 comments)

Looks good!

http://gerrit.cloudera.org:8080/#/c/19220/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19220/2//COMMIT_MSG@21
PS2, Line 21: Tests
nit. Core tests?


http://gerrit.cloudera.org:8080/#/c/19220/2/be/src/util/in-list-filter.h
File be/src/util/in-list-filter.h:

http://gerrit.cloudera.org:8080/#/c/19220/2/be/src/util/in-list-filter.h@143
PS2, Line 143: pair::iterator, bool>
nit may add a comment describing this pair.


http://gerrit.cloudera.org:8080/#/c/19220/2/be/src/util/in-list-filter.h@144
PS2, Line 144: insert
Seems it can be replaced by emplace() to a) maintain consistency with the 
insert() at line 149, and b) keep only the unique values in "values".


http://gerrit.cloudera.org:8080/#/c/19220/2/be/src/util/in-list-filter.h@164
PS2, Line 164: size
nit. numOfStrings().


http://gerrit.cloudera.org:8080/#/c/19220/2/be/src/util/in-list-filter.cc
File be/src/util/in-list-filter.cc:

http://gerrit.cloudera.org:8080/#/c/19220/2/be/src/util/in-list-filter.cc@193
PS2, Line 193: int_val
good catch.


http://gerrit.cloudera.org:8080/#/c/19220/2/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test
File testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test:

http://gerrit.cloudera.org:8080/#/c/19220/2/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test@179
PS2, Line 179:  RUNTIME_PROFILE
Looks like we could beef up the new tests a little bit to demonstrate that 
global filters are shipped to the scan and applied.



--
To view, visit http://gerrit.cloudera.org:8080/19220
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie90b2bce5e5ec6f6906ce9d2090b0ab19d48cc78
Gerrit-Change-Number: 19220
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 15 Nov 2022 01:43:19 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11720: Deflake FileMetadataLoaderTest due to FileSystem closed

2022-11-14 Thread Qifan Chen (Code Review)

Qifan Chen has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/19233 )

Change subject: IMPALA-11720: Deflake FileMetadataLoaderTest due to FileSystem 
closed
..

IMPALA-11720: Deflake FileMetadataLoaderTest due to FileSystem closed

IMPALA-11699 recently reverted the changes in FileMetadataLoaderTest
that it no longer extends to FrontendTestBase. This causes tests in
FileMetadataLoaderTest to become flaky if run individually.

This is a subtle bug in FileMetadataLoader#load(), where we first call
FileSystem#getFileSystem() to get a FileSystem object, and then call
methods of FileSystemUtil which will trigger class loading of it and
running its static code.

FileSystem#getFileSystem() will create a FileSystem object in the first
get and cache it for follow-up usage. Without extending
FrontendTestBase, BackendConfig.INSTANCE is not initialized when the
test is run. So the static code in FileSystemUtil lazily initializes
BackendConfig.INSTANCE, which initializes FeSupport and finally calls
JniUtil::InitLibhdfs(). In this method, a FileSystem object is get and
closed. This is exactly the FileSystem object created in
FileMetadataLoader#load(). So the following usage on it causes an
IOException of "Filesystem closed".

The purpose of JniUtil::InitLibhdfs() is to make a simple call on
libhdfs to make it initialize the JVM. This is crucial when launching
from C++ codes for impalad and catalogd to init the embedded JVM. So we
should keep it unchanged.

The fix for this bug is to maintain the state of BackendConfig.INSTANCE
in the static code of FileSystemUtil (i.e., keep un-initialized or
initialized), which avoids the subtle side effects.

Tests:
 - Verified tests in FileMetadataLoaderTest individually

Change-Id: Ib6f96950210c9a0124fe03696ef334cb00b057ab
Reviewed-on: http://gerrit.cloudera.org:8080/19233
Tested-by: Impala Public Jenkins 
Reviewed-by: Michael Smith 
Reviewed-by: Qifan Chen 
---
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java
2 files changed, 21 insertions(+), 14 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Michael Smith: Looks good to me, but someone else must approve
  Qifan Chen: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/19233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib6f96950210c9a0124fe03696ef334cb00b057ab
Gerrit-Change-Number: 19233
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang

[Impala-ASF-CR] IMPALA-11720: Deflake FileMetadataLoaderTest due to FileSystem closed

2022-11-14 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19233 )

Change subject: IMPALA-11720: Deflake FileMetadataLoaderTest due to FileSystem 
closed
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/19233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6f96950210c9a0124fe03696ef334cb00b057ab
Gerrit-Change-Number: 19233
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 15 Nov 2022 01:10:40 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11720: Deflake FileMetadataLoaderTest due to FileSystem closed

2022-11-13 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19233 )

Change subject: IMPALA-11720: Deflake FileMetadataLoaderTest due to FileSystem 
closed
..


Patch Set 2:

(5 comments)

Looks like a good fix!

http://gerrit.cloudera.org:8080/#/c/19233/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19233/2//COMMIT_MSG@13
PS2, Line 13: happens
nit.  remove.


http://gerrit.cloudera.org:8080/#/c/19233/2//COMMIT_MSG@24
PS2, Line 24: creates
nit. created?


http://gerrit.cloudera.org:8080/#/c/19233/2//COMMIT_MSG@30
PS2, Line 30: c
nit C++


http://gerrit.cloudera.org:8080/#/c/19233/2//COMMIT_MSG@33
PS2, Line 33: not initializing BackendConfig.INSTANCE in the
: static code of FileSystemUtil
nit. to maintain the state of BackendConfig.INSTANCE in the static code of 
FileSystemUtil (i.e., keep un-initialized or initialized).


http://gerrit.cloudera.org:8080/#/c/19233/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/19233/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@880
PS2, Line 880: BackendConfig
nit. use of BackendConfig.INSTANCE will make the message clear.



--
To view, visit http://gerrit.cloudera.org:8080/19233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6f96950210c9a0124fe03696ef334cb00b057ab
Gerrit-Change-Number: 19233
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sun, 13 Nov 2022 15:37:47 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11721: Impala query keep being retried over frequently updated iceberg table

2022-11-13 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19234 )

Change subject: IMPALA-11721: Impala query keep being retried over frequently 
updated iceberg table
..


Patch Set 1: Code-Review+1

(1 comment)

Looks good.

One question on getting the meta-data from meta-provider cache. What happen 
when the cache is updated during the use phase of the cached data. Any work 
needed on the client side?

http://gerrit.cloudera.org:8080/#/c/19234/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/19234/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@122
PS1, Line 122: warmupCache
nit. May name it as warmupMetaProviderCache() to make it clear.



--
To view, visit http://gerrit.cloudera.org:8080/19234
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iac28224b2b6d67725eeb17f3e9d813ba622edb43
Gerrit-Change-Number: 19234
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Sun, 13 Nov 2022 15:06:55 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11064 Optimizing Temporary File Structure for Batch Reading

2022-10-21 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18219 )

Change subject: IMPALA-11064 Optimizing Temporary File Structure for Batch 
Reading
..


Patch Set 9:

(29 comments)

Looks good!

It seems there is still room to simplify the new query options. That is the 
interface to the new temp file scheme.

That to be able to handle many many partitions is also a key.

http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG@12
PS9, Line 12: allocate
nit. allocating


http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG@12
PS9, Line 12: There will be two types of structures, one is the original, 
allocate
nit with the patch


http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG@52
PS9, Line 52: remote_batch_read_max_block_size_level
I wonder why not spell out the block size directly, as 
remote_batch_read_block_size. Simplicity is better.


http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG@63
PS9, Line 63: remote_batch_read_prefetch_step
A slight better name would be remote_batch_read_prefetch_size.

The option specifies the number of subsequent blocks to prefetch.


http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG@74
PS9, Line 74: base_spill_id_level
If I understand correctly, max number of temp files = 2^level times # of 
partitions. When there are many partitions, this number could still be large.

Would it be possible to provide an option on max number of temp files directly?


http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG@103
PS9, Line 103:
May need to say a few sentences with performance numbers here to indicate the 
new scheme is better.


http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.h
File be/src/exec/partitioned-hash-join-builder.h:

http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.h@82
PS9, Line 82: int64_t level, uint64_t default_id
I wonder if fragment instance should play a role here. In the mt_dop > 0 case, 
multiple instances can be running on a single node. In this case, it may be 
useful to place spills from fragment instances on different temp files.


http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.h@84
PS9, Line 84: 16;
This limits max # of partitions to be 2^16 - 1, which could be very limiting 
for tables with range partitions on weeks or days.

Would it be possible to use the following existing code directly?  The inputs 
are a sequence of uints: default_id, partition_id, and possibly fragment 
instance id. And then mod on the max number of temp files (see my comment in 
commit message).


 
 91   /// CrcHash() specialized for 8-byte data 
 92   static inline uint32_t CrcHash8(const void* v, uint32_t hash) {
 93 DCHECK(CpuInfo::IsSupported(CpuInfo::SSE4_2) || base::IsAarch64()); 
 
 94 const uint64_t* p = reinterpret_cast(v);
 95 hash = SSE4_crc32_u64(hash, *p);
 96 hash = (hash << 16) | (hash >> 16); 
   
 97 return hash;  
 98   }
util/hash-util.h


http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.h@86
PS9, Line 86:  DCHECK_GE(level, 0);
nit. this probably is nor necessary because of line 83.


http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.cc@194
PS9, Line 194: base_spill_id_ = PhjBuilderConfig::GenerateBaseSpillId(
 :   state->query_options().base_spill_id_level, 
(uint64_t)this);
nit. This line of code probably should be part of the member initialization 
section at line 180.


http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.cc@225
PS9, Line 225:  base_spill_id_ = PhjBuilderConfig::GenerateBaseSpillId(
 :   state->query_options().base_spill_id_level, 
(uint64_t)this);
nit. same comment as above.


http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-node.cc@123
PS9, Line 123: base_spill_id_ = PhjBuilderConfig::GenerateBaseSpillId(
 :   state->query_options().base_spill_id_level, 
(uint64_t)this);
nit. Move to member initialization section.


http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/io/local-file-writer.cc
File be/src/runtime/io/local-file-writer.cc:

[Impala-ASF-CR] IMPALA-11666: Modify the warning message when table statistics may be corrupt

2022-10-20 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19148 )

Change subject: IMPALA-11666: Modify the warning message when table statistics 
may be corrupt
..


Patch Set 4: Code-Review+1

(2 comments)

Looks good!

http://gerrit.cloudera.org:8080/#/c/19148/2/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/19148/2/fe/src/main/java/org/apache/impala/planner/Planner.java@335
PS2, Line 335: "The latter case does not necessarily imply the existence of 
corrupt \n" +
 :   "statistics when the corresponding tables are 
transactional
> > I wonder if it is possible to exclude transactional table names from the
I think you can file a new JIRA to improve the message on removing 
transactional tables if the current fix is urgently needed.


http://gerrit.cloudera.org:8080/#/c/19148/4/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/19148/4/fe/src/main/java/org/apache/impala/planner/Planner.java@332
PS4, Line 332: number of
nit. remove



--
To view, visit http://gerrit.cloudera.org:8080/19148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I404b00db97f1fa0f6e67995c6e85a124ccf242ef
Gerrit-Change-Number: 19148
Gerrit-PatchSet: 4
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 20 Oct 2022 18:15:33 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11666: Modify the warning message when table statistics may be corrupt

2022-10-18 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19148 )

Change subject: IMPALA-11666: Modify the warning message when table statistics 
may be corrupt
..


Patch Set 2:

(5 comments)

Looks good!

http://gerrit.cloudera.org:8080/#/c/19148/2/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/19148/2/fe/src/main/java/org/apache/impala/planner/Planner.java@332
PS2, Line 332: number of rows
nit row count


http://gerrit.cloudera.org:8080/#/c/19148/2/fe/src/main/java/org/apache/impala/planner/Planner.java@333
PS2, Line 333: ii)
nit b)


http://gerrit.cloudera.org:8080/#/c/19148/2/fe/src/main/java/org/apache/impala/planner/Planner.java@333
PS2, Line 333: i
nit a)


http://gerrit.cloudera.org:8080/#/c/19148/2/fe/src/main/java/org/apache/impala/planner/Planner.java@333
PS2, Line 333: corresponding
nit remove


http://gerrit.cloudera.org:8080/#/c/19148/2/fe/src/main/java/org/apache/impala/planner/Planner.java@335
PS2, Line 335: "The latter case does not necessarily imply the existence of 
corrupt \n" +
 :   "statistics when the corresponding tables are 
transactional
I wonder if it is possible to exclude transactional table names from the above 
list.

In this way, we can report the warning message for tables with corrupted stats 
only.



--
To view, visit http://gerrit.cloudera.org:8080/19148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I404b00db97f1fa0f6e67995c6e85a124ccf242ef
Gerrit-Change-Number: 19148
Gerrit-PatchSet: 2
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 18 Oct 2022 14:32:17 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11536: fix invalid predicates propagate for outer join simplification

2022-10-14 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19116 )

Change subject: IMPALA-11536: fix invalid predicates propagate for outer join 
simplification
..


Patch Set 6:

(12 comments)

Looks good!

http://gerrit.cloudera.org:8080/#/c/19116/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19116/6//COMMIT_MSG@11
PS6, Line 11: then
nit and then remove


http://gerrit.cloudera.org:8080/#/c/19116/6//COMMIT_MSG@14
PS6, Line 14: This may lead to incorrect results since it
: is incorrect to propagate predicates into a plan subtree that is 
on the
: nullable side of an outer join if the predicate is not 
null-rejecting
nit. Suggest to reword to:

This may lead to incorrect results since it
is incorrect to propagate a non null-rejecting predicate into a plan subtree 
that is on the nullable side of an outer join.


http://gerrit.cloudera.org:8080/#/c/19116/6//COMMIT_MSG@17
PS6, Line 17: and determine whether it is a nullable side of an outer join 
according
: to GlobalState#outerJoinedTupleIds
nit. Suggest to reword to

GlobalState#outerJoinedTupleIds indicates whether a table is on the nullable 
side of an outer join.


http://gerrit.cloudera.org:8080/#/c/19116/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/19116/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@4093
PS6, Line 4093: nonnullableTids
nit. Suggest to rename to nullRejectingTids to make the code more readable.

Maybe also a good idea to add a comment about this input argument.


http://gerrit.cloudera.org:8080/#/c/19116/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@4119
PS6, Line 4119: getNonnullableTids(ids, nonnullableTids);
May need to add a comment for this line of code like: find out all 
null-rejecting TupleIds in "ids".


http://gerrit.cloudera.org:8080/#/c/19116/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@4120
PS6, Line 4120: if (TupleId.intersect(ids, nonnullableTids) || 
hasNullRejectingConjucts(ids))
If getNonnullableTids(ids, nonnullableTids) at line 4119 does find a subset in 
'ids' that are null-rejecting, then this line becomes un-necessary.


http://gerrit.cloudera.org:8080/#/c/19116/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@4166
PS6, Line 4166: outer joined
nit. We probably should not say "outer joined" here since they are just 
arguments.


http://gerrit.cloudera.org:8080/#/c/19116/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@4169
PS6, Line 4169: nonnullableTids
nit. nullRejectingTids


http://gerrit.cloudera.org:8080/#/c/19116/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@4169
PS6, Line 4169: getNonnullableTids
nit. maybe gatherNullRejectingTids().


http://gerrit.cloudera.org:8080/#/c/19116/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@4190
PS6, Line 4190: LOG.warn("Skipping to check whether the conjunct is 
null-rejecting because"
  :   + "backend evaluation failed: " + e.toSql(), ex);
nit. The following probably is a more straightforward statement.

 LOG.warn("Fail to verify " + e.toSql + " being null-rejecting because of the 
backend evaluation failure", ex);


http://gerrit.cloudera.org:8080/#/c/19116/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test:

http://gerrit.cloudera.org:8080/#/c/19116/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test@1030
PS6, Line 1030: FROM functional.jointbl
  : FULL OUTER JOIN
  : functional.testtbl
  : ON jointbl.test_id = testtbl.id
  : FULL OUTER JOIN
These new tests with 3-way joins look good. I wonder if we can also add two 
more test queries that are 2-way out joins.


http://gerrit.cloudera.org:8080/#/c/19116/6/testdata/workloads/functional-planner/queries/PlannerTest/outer-to-inner-joins.test@1065
PS6, Line 1065: correcttly
typo. double 't'.



--
To view, visit http://gerrit.cloudera.org:8080/19116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6565c5bff0d2f24f30118ba47a2583383e83fff7
Gerrit-Change-Number: 19116
Gerrit-PatchSet: 6
Gerrit-Owner: Xianqing He 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Fri, 14 Oct 2022 15:05:56 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10715: test decimal min max filters failed in exhaustive run

2022-10-13 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19132 )

Change subject: IMPALA-10715: test decimal min max filters failed in exhaustive 
run
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19132/1/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

http://gerrit.cloudera.org:8080/#/c/19132/1/tests/query_test/test_runtime_filters.py@234
PS1, Line 234: if self.exploration_strategy() != 'exhaustive':
> Can we put this into the test vector like on for example L152?
Done



--
To view, visit http://gerrit.cloudera.org:8080/19132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I20da28f780a27c6fdd917116e7c14d46d2a5db0f
Gerrit-Change-Number: 19132
Gerrit-PatchSet: 3
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 13 Oct 2022 18:26:45 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10715: test decimal min max filters failed in exhaustive run

2022-10-13 Thread Qifan Chen (Code Review)

Qifan Chen has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/19132 )

Change subject: IMPALA-10715: test decimal min max filters failed in exhaustive 
run
..

IMPALA-10715: test decimal min max filters failed in exhaustive run

This patch enables only the min/max filters in decimal min/max filter
test so that some of the non-qualifying rows can be returned from the
kudu scanners. Previously, the tests allows bloom filters to filter
out rows at the kudu scanner level which prevents non-qualfying rows
to arrive at the hash join node. Such non-qualifying rows are required
by the test.

With the patch, the test passes in the exhaustive mode. The patch also
refactors the above logic for the entire TestMinMaxFilters test so that
each test case in it will only get the min/max filter.

Testing:
 - Unit test
 - Core test

Change-Id: I20da28f780a27c6fdd917116e7c14d46d2a5db0f
---
M tests/query_test/test_runtime_filters.py
1 file changed, 4 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19132/3
--
To view, visit http://gerrit.cloudera.org:8080/19132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I20da28f780a27c6fdd917116e7c14d46d2a5db0f
Gerrit-Change-Number: 19132
Gerrit-PatchSet: 3
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy

[Impala-ASF-CR] IMPALA-11617: Pool service should be made aware of processing cost limit

2022-10-12 Thread Qifan Chen (Code Review)

Qifan Chen has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/19121 )

Change subject: IMPALA-11617: Pool service should be made aware of processing 
cost limit
..

IMPALA-11617: Pool service should be made aware of processing cost limit

IMPALA-11604 enables the planner to compute CPU usage for certain
queries and to select suitable executor groups to run. The CPU usage is
expressed as the total amount of data to be processed per query.

This patch added the processing cost limit, which is the total amount of
data that each executor group can handle, to the pool service.

Testing:
 - Passed core run.
 - Verified that processing costs were shown on the admission and
   metrics pages of the Impala debug web server.

Change-Id: I9bd2a7284eda47a969ef91e4be19f96d2af53449
Reviewed-on: http://gerrit.cloudera.org:8080/19121
Reviewed-by: Qifan Chen 
Tested-by: Impala Public Jenkins 
---
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/metrics.json
M docs/topics/impala_admission_config.xml
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/RequestPoolService.java
M fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java
M fe/src/test/resources/llama-site-2-groups.xml
M fe/src/test/resources/llama-site-test.xml
M fe/src/test/resources/llama-site-test2.xml
M fe/src/test/resources/mem-limit-test-llama-site.xml
M fe/src/test/resources/minicluster-llama-site.xml
M tests/common/resource_pool_config.py
M tests/custom_cluster/test_executor_groups.py
M www/admission_controller.tmpl
16 files changed, 106 insertions(+), 10 deletions(-)

Approvals:
  Qifan Chen: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/19121
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9bd2a7284eda47a969ef91e4be19f96d2af53449
Gerrit-Change-Number: 19121
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou

[Impala-ASF-CR] IMPALA-11617: Pool service should be made aware of processing cost limit

2022-10-12 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19121 )

Change subject: IMPALA-11617: Pool service should be made aware of processing 
cost limit
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/19121
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bd2a7284eda47a969ef91e4be19f96d2af53449
Gerrit-Change-Number: 19121
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 12 Oct 2022 20:52:07 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10715: test decimal min max filters failed in exhaustive run

2022-10-12 Thread Qifan Chen (Code Review)

Qifan Chen has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/19132


Change subject: IMPALA-10715: test decimal min max filters failed in exhaustive 
run
..

IMPALA-10715: test decimal min max filters failed in exhaustive run

This patch disables the bloom filters in decimal min/max filter test
so that some of the non-qualifying rows can be returned from the kudu
scanners. The test specifically looks for these non-qualfying rows in
the hash join node

With the patch, the test passes in the exhaustive mode.

Testing:
 - Unit test
 - Core test

Change-Id: I20da28f780a27c6fdd917116e7c14d46d2a5db0f
---
M tests/query_test/test_runtime_filters.py
1 file changed, 4 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19132/1
--
To view, visit http://gerrit.cloudera.org:8080/19132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I20da28f780a27c6fdd917116e7c14d46d2a5db0f
Gerrit-Change-Number: 19132
Gerrit-PatchSet: 1
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zoltan Borok-Nagy

[Impala-ASF-CR] IMPALA-11617: Pool service should be made aware of processing cost limit

2022-10-11 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19121 )

Change subject: IMPALA-11617: Pool service should be made aware of processing 
cost limit
..


Patch Set 2:

(2 comments)

Looks very good!

http://gerrit.cloudera.org:8080/#/c/19121/2/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/19121/2/fe/src/main/java/org/apache/impala/service/Frontend.java@1840
PS2, Line 1840: x_query_mem_limit() : Long.MAX_VALUE);
  : //
A very good hint.


http://gerrit.cloudera.org:8080/#/c/19121/2/fe/src/main/java/org/apache/impala/util/RequestPoolService.java
File fe/src/main/java/org/apache/impala/util/RequestPoolService.java:

http://gerrit.cloudera.org:8080/#/c/19121/2/fe/src/main/java/org/apache/impala/util/RequestPoolService.java@396
PS2, Line 396: 0L
Can we set it to Long.MAX_VALUE by default?  In this way, a value of 0 can 
effectively shut down the executor group (say for maintenance etc).



--
To view, visit http://gerrit.cloudera.org:8080/19121
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bd2a7284eda47a969ef91e4be19f96d2af53449
Gerrit-Change-Number: 19121
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Tue, 11 Oct 2022 13:54:29 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10889: Allow extra 1MB for fragment cancellation

2022-10-10 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19092 )

Change subject: IMPALA-10889: Allow extra 1MB for fragment cancellation
..


Patch Set 4: Code-Review+2

+2 to carry Abhishek and Daniel's +1


--
To view, visit http://gerrit.cloudera.org:8080/19092
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaee557ad87d3926589b30d6dcdd850e9af9b3476
Gerrit-Change-Number: 19092
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 10 Oct 2022 19:13:23 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10889: Allow extra 1MB for fragment cancellation

2022-10-10 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. (
http://gerrit.cloudera.org:8080/19092 )

Change subject: IMPALA-10889: Allow extra 1MB for fragment cancellation
..

Patch Set 1: Code-Review+1

(2 comments)

Looks good!

http://gerrit.cloudera.org:8080/#/c/19092/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19092/1//COMMIT_MSG@11
PS1, Line 11: In this test case, the extra reserved memory after
: cancellation is small (<1MB), as seen in logs printing
: agg_mem_reserved. Previously the test and query limits were
exactly
: matched, so any extra reservation would prevent scheduling,
causing the
: test to frequently time out.
nit. Suggest to reword a bit to state the cause first and then the solution.

"The test and query limits were exactly matched, so any extra reservation would
prevent scheduling, causing the test to frequently time out. With the fix, a
1MB of extra memory is reserved to break the tie thus avoiding the time out.
The extra 1MB of memory can be seen in logs printing
agg_mem_reserved."

http://gerrit.cloudera.org:8080/#/c/19092/1/tests/custom_cluster/test_executor_groups.py
File tests/custom_cluster/test_executor_groups.py:

http://gerrit.cloudera.org:8080/#/c/19092/1/tests/custom_cluster/test_executor_groups.py@566
PS1, Line 566: 4gb mem_limit
nit. May update to reflect the change.

--
To view, visit http://gerrit.cloudera.org:8080/19092
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaee557ad87d3926589b30d6dcdd850e9af9b3476
Gerrit-Change-Number: 19092
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith
Gerrit-Reviewer: Abhishek Rawat
Gerrit-Reviewer: Daniel Becker
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Smith
Gerrit-Reviewer: Qifan Chen
Gerrit-Reviewer: Quanlong Huang
Gerrit-Comment-Date: Mon, 10 Oct 2022 13:50:16 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2022-10-07 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 29:

Added a new flavor of expression cost computation: 
computeExistingExprsTotalCost(), which is needed for the other join predicates 
in Hash and Nested join.


--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 29
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 07 Oct 2022 17:37:02 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2022-10-07 Thread Qifan Chen (Code Review)

Qifan Chen has uploaded a new patch set (#29). ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

IMPALA-11604 Planner changes for CPU usage

This patch augments IMPALA-10992 by establishing an infrastructure
to allow weighted total amount of data to process to be used as a
new factor in the definition and selection of an executor group. In
this patch the weight component is set to 1.

Two flavors of the weighted amount of data processed are enabled in
this patch: the minimal amount and the maximal amount. The former is
the processing cost along a path with the largest sum of processing
costs, reflecting the best level of parallelism among fragments.
The latter is the sum of that of every fragment, reflecting the worst
level of parallelism. The geometric mean of the minimal and the
maximal is used as the estimated processing cost of the query.

A fragment's total amount of data processed is the sum of that of
every node in the fragment. The weighted amount of data processed
is computed with a general formula as follows.

  Processing cost is a pair: PC(D, N), where D = I * C * W

  where D is the weighted amount of data processed
N is number of instances
I is input cardinality
C is expression evaluation cost per row, set to 1
W is average row size

A description of the computation for each kind of plan node is
given below.

1. Aggregation node:
C and W are the sum of the costs and partial row widths for each
AggregateInfo object.

2. AnalyticEval node:
C is sum of the evaluation costs for analytic functions, partition
by equal and order by equal predicate;

3. CardinalityCheck node:
Both C and I are 1;

4. DataSource scan node:
C is computed from a subset of the selection predicates excluding
data source accepted predicates;

5. EmptySet node:
I is 0;

6. Exchange node:
A modification of the general formula when in broadcast mode:
D = D * number of receivers;

7. Hash join node:
probe side = PC(I0 * C(equi-join predicate) * W,  N)  +
PC(output cardinality * C(other join predicate) * W, N)

build side = PC(I1 * C(equi-join predicate) * W, N)

8. Hbase scan node:
N is 1

9. Hdfs and Kudu scan node:
N is mt_dop when query option mt_dop >= 1, otherwise
N is number of nodes * max scan threads;

10. Nested loop join node:
When the right child is not a SingularRowSrc node:

  probe side = PC(I0 * C(equi-join predicate) * W, N)  +
  PC(output cardinality * C(other join predicate) * W, N)
  build side = PC(I1 * C(equi-join predicate) * W, N)

When the right child is a SingularRowSrc node:

  probe side = PC(I0 * W, N)
  build side = PC(I0 * I1 * W, N)

11. Select node:
Use the general formula;

12. SingularRowSrc node:
Since the node is involved once per input in nested loop join, the
contribution of this node is computed in nested loop join;

13. Sort node:
C is the evalation cost for the sort expression and W is the width
of the intermediate tuple being sorted;

14. Subplan node:
C is 1. I is the multiplication of the cardinality of the left and
the right child;

15. Union node:
C is the cost of materializing rows from all non pass-through
children. W is the width of all non pass-through children;

16. Unnest node:
I is the cardinality of the containing subplan node and C is 1.

The processing cost for the data sink of a fragment is also computed
and added to that of the fragment.

This patch also assumes that the number of instances of execution
overlaps each other when there is a discrepancy of instances among
nodes in a single fragment. For example when there are 6 scan thread
and 3 aggregation threads in a single fragment, 3 threads are used
for both scan and aggregate, and 3 other threads are used for the
scan.

As an example, the best and worst processing cost for TPCDS large
query q14a and tiny q19 are as follows.
Processing cost for query:

q14a:
  Best case: total=1271804127, numInstances=9, perInstance=141311569
  Worst case: total=5169193752, numInstances=12, perInstance=430766146

q19:
  Best case: total=1082950, numInstances=15, perInstance=72196
  Worst case: total=1082950, numInstances=15, perInstance=72196

Testing:
  1. Unit testing by examining the best and the worst processing cost
 computed for all plan nodes in all fragments for a small set of
 queries;
  2. Core tests.

Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
---
M common/thrift/Frontend.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2022-10-07 Thread Qifan Chen (Code Review)

Qifan Chen has uploaded a new patch set (#28). ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

IMPALA-11604 Planner changes for CPU usage

This patch augments IMPALA-10992 by establishing an infrastructure
to allow weighted total amount of data to process to be used as a
new factor in the definition and selection of an executor group. In
this patch the weight component is set to 1.

Two flavors of the weighted amount of data processed are enabled in
this patch: the minimal amount and the maximal amount. The former is
the processing cost along a path with the largest sum of processing
costs, reflecting the best level of parallelism among fragments.
The latter is the sum of that of every fragment, reflecting the worst
level of parallelism. The geometric mean of the minimal and the
maximal is used as the estimated processing cost of the query.

A fragment's total amount of data processed is the sum of that of
every node in the fragment. The weighted amount of data processed
is computed with a general formula as follows.

  Processing cost is a pair: PC(D, N), where D = I * C * W

  where D is the weighted amount of data processed
N is number of instances
I is input cardinality
C is expression evaluation cost per row, set to 1
W is average row size

A description of the computation for each kind of plan node is
given below.

1. Aggregation node:
C and W are the sum of the costs and partial row widths for each
AggregateInfo object.

2. AnalyticEval node:
C is sum of the evaluation costs for analytic functions, partition
by equal and order by equal predicate;

3. CardinalityCheck node:
Both C and I are 1;

4. DataSource scan node:
C is computed from a subset of the selection predicates excluding
data source accepted predicates;

5. EmptySet node:
I is 0;

6. Exchange node:
A modification of the general formula when in broadcast mode:
D = D * number of receivers;

7. Hash join node:
probe side = PC(I0 * C(equi-join predicate) * W,  N)  +
PC(output cardinality * C(other join predicate) * W, N)

build side = PC(I1 * C(equi-join predicate) * W, N)

8. Hbase scan node:
N is 1

9. Hdfs and Kudu scan node:
N is mt_dop when query option mt_dop >= 1, otherwise
N is number of nodes * max scan threads;

10. Nested loop join node:
When the right child is not a SingularRowSrc node:

  probe side = PC(I0 * C(equi-join predicate) * W, N)  +
  PC(output cardinality * C(other join predicate) * W, N)
  build side = PC(I1 * C(equi-join predicate) * W, N)

When the right child is a SingularRowSrc node:

  probe side = PC(I0 * W, N)
  build side = PC(I0 * I1 * W, N)

11. Select node:
Use the general formula;

12. SingularRowSrc node:
Since the node is involved once per input in nested loop join, the
contribution of this node is computed in nested loop join;

13. Sort node:
C is the evalation cost for the sort expression and W is the width
of the intermediate tuple being sorted;

14. Subplan node:
C is 1. I is the multiplication of the cardinality of the left and
the right child;

15. Union node:
C is the cost of materializing rows from all non pass-through
children. W is the width of all non pass-through children;

16. Unnest node:
I is the cardinality of the containing subplan node and C is 1.

The processing cost for the data sink of a fragment is also computed
and added to that of the fragment.

This patch also assumes that the number of instances of execution
overlaps each other when there is a discrepancy of instances among
nodes in a single fragment. For example when there are 6 scan thread
and 3 aggregation threads in a single fragment, 3 threads are used
for both scan and aggregate, and 3 other threads are used for the
scan.

As an example, the best and worst processing cost for TPCDS large
query q14a and tiny q19 are as follows.
Processing cost for query:

q14a:
  Best case: total=1271804127, numInstances=9, perInstance=141311569
  Worst case: total=5169193752, numInstances=12, perInstance=430766146

q19:
  Best case: total=1082950, numInstances=15, perInstance=72196
  Worst case: total=1082950, numInstances=15, perInstance=72196

Testing:
  1. Unit testing by examining the best and the worst processing cost
 computed for all plan nodes in all fragments for a small set of
 queries;
  2. Core tests.

Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
---
M common/thrift/Frontend.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2022-10-07 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 26:

The processing cost is part of the design. Seems the revised design is a new 
feature, which probably should be done in a separate JIRA.


--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 26
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 07 Oct 2022 14:12:52 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2022-10-07 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 26:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19033/26/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/19033/26/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@280
PS26, Line 280: totalProcessedCost
> rename the variable as totalProcessingCost to keep consistent.
Done



--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 26
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 07 Oct 2022 13:31:46 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2022-10-06 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 26:

Adjust the minimal processing cost to be the largest sum of the processing 
costs along a path from the root fragment to a leaf fragment.


--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 26
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 06 Oct 2022 17:11:56 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2022-10-06 Thread Qifan Chen (Code Review)

Qifan Chen has uploaded a new patch set (#26). ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

IMPALA-11604 Planner changes for CPU usage

This patch augments IMPALA-10992 by establishing an infrastructure
to allow weighted total amount of data to process to be used as a
new factor in the definition and selection of an executor group. In
this patch the weight component is set to 1.

Two flavors of the weighted amount of data processed are enabled in
this patch: the minimal amount and the maximal amount. The former is
the processing cost along a path with the largest sum of processing
costs, reflecting the best level of parallelism among fragments.
The latter is the sum of that of every fragment, reflecting the worst
level of parallelism. The geometric mean of the minimal and the
maximal is used as the estimated processing cost of the query.

A fragment's total amount of data processed is the sum of that of
every node in the fragment. The weighted amount of data processed
is computed with a general formula as follows.

  Processing cost is a pair: PC(D, N), where D = I * C * W

  where D is the weighted amount of data processed
N is number of instances
I is input cardinality
C is expression evaluation cost per row, set to 1
W is average row size

A description of the computation for each kind of plan node is
given below.

1. Aggregation node:
C and W are the sum of the costs and partial row widths for each
AggregateInfo object.

2. AnalyticEval node:
C is sum of the evaluation costs for analytic functions, partition
by equal and order by equal predicate;

3. CardinalityCheck node:
Both C and I are 1;

4. DataSource scan node:
C is computed from a subset of the selection predicates excluding
data source accepted predicates;

5. EmptySet node:
I is 0;

6. Exchange node:
A modification of the general formula when in broadcast mode:
D = D * number of receivers;

7. Hash join node:
probe side = PC(I0 * C(equi-join predicate) * W,  N)  +
PC(output cardinality * C(other join predicate) * W, N)

build side = PC(I1 * C(equi-join predicate) * W, N)

8. Hbase scan node:
N is 1

9. Hdfs and Kudu scan node:
N is mt_dop when query option mt_dop >= 1, otherwise
N is number of nodes * max scan threads;

10. Nested loop join node:
When the right child is not a SingularRowSrc node:

  probe side = PC(I0 * C(equi-join predicate) * W, N)  +
  PC(output cardinality * C(other join predicate) * W, N)
  build side = PC(I1 * C(equi-join predicate) * W, N)

When the right child is a SingularRowSrc node:

  probe side = PC(I0 * W, N)
  build side = PC(I0 * I1 * W, N)

11. Select node:
Use the general formula;

12. SingularRowSrc node:
Since the node is involved once per input in nested loop join, the
contribution of this node is computed in nested loop join;

13. Sort node:
C is the evalation cost for the sort expression and W is the width
of the intermediate tuple being sorted;

14. Subplan node:
C is 1. I is the multiplication of the cardinality of the left and
the right child;

15. Union node:
C is the cost of materializing rows from all non pass-through
children. W is the width of all non pass-through children;

16. Unnest node:
I is the cardinality of the containing subplan node and C is 1.

The processing cost for the data sink of a fragment is also computed
and added to that of the fragment.

This patch also assumes that the number of instances of execution
overlaps each other when there is a discrepancy of instances among
nodes in a single fragment. For example when there are 6 scan thread
and 3 aggregation threads in a single fragment, 3 threads are used
for both scan and aggregate, and 3 other threads are used for the
scan.

Testing:
  1. Unit testing by examining the best and the worst processing cost
 computed for all plan nodes in all fragments for a small set of
 queries;
  2. Core tests.

Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
---
M common/thrift/Frontend.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2022-10-05 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 25:

(2 comments)

Address review comments.

Also split the computation of processing cost for exchange into two parts:

1. In exchange node: only for the receiving part;
2. In DateStreamSink: only for the sending part.

This is because the sending part of the exchange node is embedded in the data 
stream sink of the bottom fragment; and the receiving part of the exchange is 
embedded in the top fragment.

http://gerrit.cloudera.org:8080/#/c/19033/23/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/19033/23/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@42
PS23, Line 42: import org.apache.impala.util.BitUtil;
 : import org.apache.impala.util.ExprUtil;
> nit: switch order of these two lines
Done


http://gerrit.cloudera.org:8080/#/c/19033/23/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/19033/23/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@29
PS23, Line 29: import org.apache.impala.util.AutoScaleUtil;
 : import org.apache.impala.util.ExprUtil;
> nit: switch order of these two lines
Done



--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 25
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 05 Oct 2022 22:29:17 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2022-10-05 Thread Qifan Chen (Code Review)

Qifan Chen has uploaded a new patch set (#25). ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

IMPALA-11604 Planner changes for CPU usage

This patch augments IMPALA-10992 by establishing an infrastructure
to allow weighted total amount of data to process to be used as a
new factor in the definition and selection of an executor group. In
this patch the weight component is set to 1.

Two flavors of the weighted amount of data processed are enabled in
this patch: the minimal amount and the maximal amount. The former is
the max of that of every fragment in the query reflecting the best
level of parallelism among fragments. The latter is the sum of that
of every fragment reflecting the worst level of parallelism. The
geometric mean of the minimal and the maximal is used as the
estimated processing cost of the query.

A fragment's total amount of data processed is the sum of that of
every node in the fragment. The weighted amount of data processed
is computed with a general formula as follows.

  Processing cost is a pair: PC(D, N), where D = I * C * W

  where D is the weighted amount of data processed
N is number of instances
I is input cardinality
C is expression evaluation cost per row, set to 1
W is average row size

A description of the computation for each kind of plan node is
given below.

1. Aggregation node:
C and W are the sum of the costs and partial row widths for each
AggregateInfo object.

2. AnalyticEval node:
C is sum of the evaluation costs for analytic functions, partition
by equal and order by equal predicate;

3. CardinalityCheck node:
Both C and I are 1;

4. DataSource scan node:
C is computed from a subset of the selection predicates excluding
data source accepted predicates;

5. EmptySet node:
I is 0;

6. Exchange node:
A modification of the general formula when in broadcast mode:
D = D * number of receivers;

7. Hash join node:
probe side = PC(I0 * C(equi-join predicate) * W,  N)  +
PC(output cardinality * C(other join predicate) * W, N)

build side = PC(I1 * C(equi-join predicate) * W, N)

8. Hbase scan node:
N is 1

9. Hdfs and Kudu scan node:
N is mt_dop when query option mt_dop >= 1, otherwise
N is number of nodes * max scan threads;

10. Nested loop join node:
When the right child is not a SingularRowSrc node:

  probe side = PC(I0 * C(equi-join predicate) * W, N)  +
  PC(output cardinality * C(other join predicate) * W, N)
  build side = PC(I1 * C(equi-join predicate) * W, N)

When the right child is a SingularRowSrc node:

  probe side = PC(I0 * W, N)
  build side = PC(I0 * I1 * W, N)

11. Select node:
Use the general formula;

12. SingularRowSrc node:
Since the node is involved once per input in nested loop join, the
contribution of this node is computed in nested loop join;

13. Sort node:
C is the evalation cost for the sort expression and W is the width
of the intermediate tuple being sorted;

14. Subplan node:
C is 1. I is the multiplication of the cardinality of the left and
the right child;

15. Union node:
C is the cost of materializing rows from all non pass-through
children. W is the width of all non pass-through children;

16. Unnest node:
I is the cardinality of the containing subplan node and C is 1.

The processing cost for the data sink of a fragment is also computed
and added to that of the fragment.

This patch also assumes that the number of instances of execution
overlaps each other when there is a discrepancy of instances among
nodes in a single fragment. For example when there are 6 scan thread
and 3 aggregation threads in a single fragment, 3 threads are used
for both scan and aggregate, and 3 other threads are used for the
scan.

Testing:
  1. Unit testing by examining the best and the worst processing cost
 computed for all plan nodes in all fragments for a small set of
 queries;
  2. Core tests.

Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
---
M common/thrift/Frontend.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2022-10-05 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 23:

(5 comments)

Address review comments and added the code to compute processing cost for 
different flavors of data sinks.

http://gerrit.cloudera.org:8080/#/c/19033/16/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

http://gerrit.cloudera.org:8080/#/c/19033/16/common/thrift/Frontend.thrift@740
PS16, Line 740: select
> Not updated
Sorry. Done this time :-).


http://gerrit.cloudera.org:8080/#/c/19033/16/common/thrift/Frontend.thrift@750
PS16, Line 750:  of data in by
> Yes. The comment is updated.
Removed 'per instance'. See class ProcessingCost.


http://gerrit.cloudera.org:8080/#/c/19033/22/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/19033/22/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@299
PS22, Line 299: InstanceResourceProfile_ =
> setProcessingCost() is not called for perInstanceResourceProfile_
Removed.


http://gerrit.cloudera.org:8080/#/c/19033/22/fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
File fe/src/main/java/org/apache/impala/planner/ResourceProfile.java:

http://gerrit.cloudera.org:8080/#/c/19033/22/fe/src/main/java/org/apache/impala/planner/ResourceProfile.java@199
PS22, Line 199: processingCost_.multiply(factor));
  :   }
  :
> getProcessingCost().multiply(factor)
Great catch.

Done.


http://gerrit.cloudera.org:8080/#/c/19033/16/fe/src/main/java/org/apache/impala/util/AutoScaleUtil.java
File fe/src/main/java/org/apache/impala/util/AutoScaleUtil.java:

http://gerrit.cloudera.org:8080/#/c/19033/16/fe/src/main/java/org/apache/impala/util/AutoScaleUtil.java@39
PS16, Line 39:
 :   public static ProcessingCost 
computeAndLogProcessingCost(String label, long cardinality,
 :   float exprsCost, float avgRowSize, int numInstances) {
 : ProcessingCost processingCost =
 : computeProcessingCost(cardinality, exprsCost, 
avgRowSize, numInstances);
> If computeProcessingCost() is called before calling getProcessingCostComput
Refactor a bit.

Done.



--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 23
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 05 Oct 2022 15:00:42 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2022-10-05 Thread Qifan Chen (Code Review)

Qifan Chen has uploaded a new patch set (#23). ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

IMPALA-11604 Planner changes for CPU usage

This patch augments IMPALA-10992 by establishing an infrastructure
to allow weighted total amount of data to process to be used as a
new factor in the definition and selection of an executor group. In
this patch the weight component is set to 1.

Two flavors of the weighted amount of data processed are enabled in
this patch: the minimal amount and the maximal amount. The former is
the max of that of every fragment in the query reflecting the best
level of parallelism among fragments. The latter is the sum of that
of every fragment reflecting the worst level of parallelism. The
geometric mean of the minimal and the maximal is used as the
estimated processing cost of the query.

A fragment's total amount of data processed is the sum of that of
every node in the fragment. The weighted amount of data processed
is computed with a general formula as follows.

  Processing cost is a pair: PC(D, N), where D = I * C * W

  where D is the weighted amount of data processed
N is number of instances
I is input cardinality
C is expression evaluation cost per row, set to 1
W is average row size

A description of the computation for each kind of plan node is
given below.

1. Aggregation node:
C and W are the sum of the costs and partial row widths for each
AggregateInfo object.

2. AnalyticEval node:
C is sum of the evaluation costs for analytic functions, partition
by equal and order by equal predicate;

3. CardinalityCheck node:
Both C and I are 1;

4. DataSource scan node:
C is computed from a subset of the selection predicates excluding
data source accepted predicates;

5. EmptySet node:
I is 0;

6. Exchange node:
A modification of the general formula when in broadcast mode:
D = D * number of receivers;

7. Hash join node:
probe side = PC(I0 * C(equi-join predicate) * W,  N)  +
PC(output cardinality * C(other join predicate) * W, N)

build side = PC(I1 * C(equi-join predicate) * W, N)

8. Hbase scan node:
N is 1

9. Hdfs and Kudu scan node:
N is mt_dop when query option mt_dop >= 1, otherwise
N is number of nodes * max scan threads;

10. Nested loop join node:
When the right child is not a SingularRowSrc node:

  probe side = PC(I0 * C(equi-join predicate) * W, N)  +
  PC(output cardinality * C(other join predicate) * W, N)
  build side = PC(I1 * C(equi-join predicate) * W, N)

When the right child is a SingularRowSrc node:

  probe side = PC(I0 * W, N)
  build side = PC(I0 * I1 * W, N)

11. Select node:
Use the general formula;

12. SingularRowSrc node:
Since the node is involved once per input in nested loop join, the
contribution of this node is computed in nested loop join;

13. Sort node:
C is the evalation cost for the sort expression and W is the width
of the intermediate tuple being sorted;

14. Subplan node:
C is 1. I is the multiplication of the cardinality of the left and
the right child;

15. Union node:
C is the cost of materializing rows from all non pass-through
children. W is the width of all non pass-through children;

16. Unnest node:
I is the cardinality of the containing subplan node and C is 1.

The processing cost for the data sink of a fragment is also computed
and added to that of the fragment.

This patch also assumes that the number of instances of execution
overlaps each other when there is a discrepancy of instances among
nodes in a single fragment. For example when there are 6 scan thread
and 3 aggregation threads in a single fragment, 3 threads are used
for both scan and aggregate, and 3 other threads are used for the
scan.

Testing:
  1. Unit testing by examining the best and the worst processing cost
 computed for all plan nodes in all fragments for a small set of
 queries;
  2. Core tests.

Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
---
M common/thrift/Frontend.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2022-10-04 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 22:

Fix two bugs: in compute the number of instances for scans, and the number of 
receivers for broadcast exchange.


--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 22
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 05 Oct 2022 00:35:19 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2022-10-04 Thread Qifan Chen (Code Review)

Qifan Chen has uploaded a new patch set (#22). ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

IMPALA-11604 Planner changes for CPU usage

This patch augments IMPALA-10992 by establishing an infrastructure
to allow weighted total amount of data to process to be used as a
new factor in the definition and selection of an executor group. In
this patch the weight component is set to 1.

Two flavors of the weighted amount of data processed are enabled in
this patch: the minimal amount and the maximal amount. The former is
the max of that of every fragment in the query reflecting the best
level of parallelism among fragments. The latter is the sum of that
of every fragment reflecting the worst level of parallelism. The
geometric mean of the minimal and the maximal is used as the
estimated processing cost of the query.

A fragment's total amount of data processed is the sum of that of
every node in the fragment. The weighted amount of data processed
is computed with a general formula as follows.

  Processing cost is a pair: PC(D, N), where D = I * C * W

  where D is the weighted amount of data processed
N is number of instances
I is input cardinality
C is expression evaluation cost per row, set to 1
W is average row size

A description of the computation for each kind of plan node is
given below.

1. Aggregation node:
C and W are the sum of the costs and partial row widths for each
AggregateInfo object.

2. AnalyticEval node:
C is sum of the evaluation costs for analytic functions, partition
by equal and order by equal predicate;

3. CardinalityCheck node:
Both C and I are 1;

4. DataSource scan node:
C is computed from a subset of the selection predicates excluding
data source accepted predicates;

5. EmptySet node:
I is 0;

6. Exchange node:
A modification of the general formula when in broadcast mode:
D = D * number of receivers;

7. Hash join node:
probe side = PC(I0 * C(equi-join predicate) * W,  N)  +
PC(output cardinality * C(other join predicate) * W, N)

build side = PC(I1 * C(equi-join predicate) * W, N)

8. Hbase scan node:
N is 1

9. Hdfs and Kudu scan node:
N is mt_dop when query option mt_dop >= 1, otherwise
N is number of nodes * max scan threads;

10. Nested loop join node:
When the right child is not a SingularRowSrc node:

  probe side = PC(I0 * C(equi-join predicate) * W, N)  +
  PC(output cardinality * C(other join predicate) * W, N)
  build side = PC(I1 * C(equi-join predicate) * W, N)

When the right child is a SingularRowSrc node:

  probe side = PC(I0 * W, N)
  build side = PC(I0 * I1 * W, N)

11. Select node:
Use the general formula;

12. SingularRowSrc node:
Since the node is involved once per input in nested loop join, the
contribution of this node is computed in nested loop join;

13. Sort node:
C is the evalation cost for the sort expression and W is the width
of the intermediate tuple being sorted;

14. Subplan node:
C is 1. I is the multiplication of the cardinality of the left and
the right child;

15. Union node:
C is the cost of materializing rows from all non pass-through
children. W is the width of all non pass-through children;

16. Unnest node:
I is the cardinality of the containing subplan node and C is 1.

This patch also assumes that the number of instances of execution
overlaps each other when there is a discrepancy of instances among
nodes in a single fragment. For example when there are 6 scan thread
and 3 aggregation threads in a single fragment, 3 threads are used
for both scan and aggregate, and 3 other threads are used for the
scan.

Testing:
  1. Unit testing by examining the best and the worst processing cost
 computed for all plan nodes in all fragments for a small set of
 queries;
  2. Core tests.

Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
---
M common/thrift/Frontend.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2022-10-04 Thread Qifan Chen (Code Review)

Qifan Chen has uploaded a new patch set (#21). ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

IMPALA-11604 Planner changes for CPU usage

This patch augments IMPALA-10992 by establishing an infrastructure
to allow weighted total amount of data to process to be used as a
new factor in the definition and selection of an executor group. In
this patch the weight component is set to 1.

Two flavors of the weighted amount of data processed are enabled in
this patch: the minimal amount and the max amount. The former is the
sum of that of every fragment in the query reflecting the worst level
of parallel execution among fragments. The latter is the minimal of
that of every fragment reflecting the best level of parallel
execution. The geometric mean of the best and the worst is used as
the estimated processing cost of the query.

A fragment's total amount of data processed is the sum of that of
every node in the fragment. The weighted amount of data processed
is computed with a general formula as follows.

  Processing cost is a pair: PC(D, N), where D = I * C * W

  where D is the weighted amount of data processed
N is number of instances
I is input cardinality
C is expression evaluation cost per row, set to 1
W is average row size

A description of the computation for each kind of plan node is
given below.

1. Aggregation node:
C and W are the sum of the costs and partial row widths for each
AggregateInfo object.

2. AnalyticEval node:
C is sum of the evaluation costs for analytic functions, partition
by equal and order by equal predicate;

3. CardinalityCheck node:
Both C and I are 1;

4. DataSource scan node:
C is computed from a subset of the selection predicates excluding
data source accepted predicates;

5. EmptySet node:
I is 0;

6. Exchange node:
A modification of the general formula when in broadcast mode:
D = D * number of receivers;

7. Hash join node:
probe side = PC(I0 * C(equi-join predicate) * W,  N)  +
PC(output cardinality * C(other join predicate) * W, N)

build side = PC(I1 * C(equi-join predicate) * W, N)

8. Hbase scan node:
N is 1

9. Hdfs and Kudu scan node:
N is mt_dop when query option mt_dop >= 1, otherwise
N is number of nodes * max scan threads;

10. Nested loop join node:
When the right child is not a SingularRowSrc node:

  probe side = PC(I0 * C(equi-join predicate) * W, N)  +
  PC(output cardinality * C(other join predicate) * W, N)
  build side = PC(I1 * C(equi-join predicate) * W, N)

When the right child is a SingularRowSrc node:

  probe side = PC(I0 * W, N)
  build side = PC(I0 * I1 * W, N)

11. Select node:
Use the general formula;

12. SingularRowSrc node:
Since the node is involved once per input in nested loop join, the
contribution of this node is computed in nested loop join;

13. Sort node:
C is the evalation cost for the sort expression and W is the width
of the intermediate tuple being sorted;

14. Subplan node:
C is 1. I is the multiplication of the cardinality of the left and
the right child;

15. Union node:
C is the cost of materializing rows from all non pass-through
children. W is the width of all non pass-through children;

16. Unnest node:
I is the cardinality of the containing subplan node and C is 1.

This patch also assumes that the number of instances of execution
overlaps each other when there is a discrepancy of instances among
nodes in a single fragment. For example when there are 6 scan thread
and 3 aggregation threads in a single fragment, 3 threads are used
for both scan and aggregate, and 3 other threads are used for the
scan.

Testing:
  1. Unit testing by examining the best and the worst processing cost
 computed for all plan nodes in all fragments for a small set of
 queries;
  2. Core tests.

Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
---
M common/thrift/Frontend.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2022-10-04 Thread Qifan Chen (Code Review)

Qifan Chen has uploaded a new patch set (#20). ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

IMPALA-11604 Planner changes for CPU usage

This patch augments IMPALA-10992 by establishing an infrastructure
to allow weighted total amount of data to process to be used as a
new factor in the definition and selection of an executor group. In
this patch the weight component is set to 1.

Two flavors of the weighted amount of data processed are enabled in
this patch: the minimal amount and the max amount. The former is the
sum of that of every fragment in the query reflecting the worst level
of parallel execution among fragments. The latter is the minimal of
that of every fragment reflecting the best level of parallel
execution. The geometric mean of the best and the worst is used as
the estimated processing cost of the query.

A fragment's total amount of data processed is the sum of that of
every node in the fragment. The weighted amount of data processed
is computed with a general formula as follows.

  Processing cost is a pair: PC(D, N), where D = I * C * W

  where D is the weighted amount of data processed
N is number of instances
I is input cardinality
C is expression evaluation cost per row, set to 1
W is average row size

A description of the computation for each kind of plan node is
given below.

1. Aggregation node:
C and W are the sum of the costs and partial row widths for each
AggregateInfo object.

2. AnalyticEval node:
C is sum of the evaluation costs for analytic functions, partition
by equal and order by equal predicate;

3. CardinalityCheck node:
Both C and I are 1;

4. DataSource scan node:
C is computed from a subset of the selection predicates excluding
data source accepted predicates;

5. EmptySet node:
I is 0;

6. Exchange node:
A modification of the general formula when in broadcast mode:
D = D * number of receivers;

7. Hash join node:
probe side = PC((I0 * C(equi-join predicate) * W),  N)  +
PC(output cardinality * C(other join predicate) * W), N)

build side = PC((I1 * C(equi-join predicate) * W), N)

8. Hbase scan node:
N is 1

9. Hdfs and Kudu scan node:
N is mt_dop when query option mt_dop >= 1, otherwise
N is number of nodes * max scan threads;

10. Nested loop join node:
When the right child is not a SingularRowSrc node:

  probe side = PC(I0 * C(equi-join predicate) * W), N)  +
  PC((output cardinality * C(other join predicate) * W), N)
  build side = PC((I1 * C(equi-join predicate) * W), N)

When the right child is a SingularRowSrc node:

  probe side = PC(I0 * W, N)
  build side = PC(I0 * ((I1 * W), N)

11. Select node:
Use the general formula;

12. SingularRowSrc node:
Since the node is involved once per input in nested loop join, the
contribution of this node is computed in nested loop join;

13. Sort node:
C is the evalation cost for the sort expression and W is the width
of the intermediate tuple being sorted;

14. Subplan node:
C is 1. I is the multiplication of the cardinality of the left and
the right child;

15. Union node:
C is the cost of materializing rows from all non pass-through
children. W is the width of all non pass-through children;

16. Unnest node:
I is the cardinality of the containing subplan node and C is 1.

This patch also assumes that the number of instances of execution
overlaps each other when there is a discrepancy of instances among
nodes in a single fragment. For example when there are 6 scan thread
and 3 aggregation threads in a single fragment, 3 threads are used
for both scan and aggregate, and 3 other threads are used for the
scan.

Testing:
  1. Unit testing by examining the best and the worst processing cost
 computed for all plan nodes in all fragments for a small set of
 queries;
  2. Core tests.

Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
---
M common/thrift/Frontend.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2022-10-04 Thread Qifan Chen (Code Review)

Qifan Chen has uploaded a new patch set (#19). ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

IMPALA-11604 Planner changes for CPU usage

This patch augments IMPALA-10992 by establishing an infrastructure
to allow weighted total amount of data to process to be used as a
new factor in the definition and selection of an executor group. In
this patch the weight component is set to 1.

Two flavors of the weighted amount of data processed are enabled in
this patch: the minimal amount and the max amount. The former is the
sum of that of every fragment in the query reflecting the worst level
of parallel execution among fragments. The latter is the minimal of
that of every fragment reflecting the best level of parallel
execution. The geometric mean of the best and the worst is used as
the estimated processing cost of the query.

A fragment's total amount of data processed is the sum of that of
every node in the fragment. The weighted amount of data processed
is computed with a general formula as follows.

  Processing cost is a pair: PC(D, N), where D = I * C * W

  where D is the weighted amount of data processed
N is number of instances
I is input cardinality
C is expression evaluation cost per row, set to 1
W is average row size

A description of the computation for each kind of plan node is
given below.

1. Aggregation node:
C and W are the sum of the costs and partial row widths for each
AggregateInfo object.

2. AnalyticEval node:
C is sum of the evaluation costs for analytic functions, partition
by equal and order by equal predicate;

3. CardinalityCheck node:
Both C and I are 1;

4. DataSource scan node:
C is computed from a subset of the selection predicates excluding
data source accepted predicates;

5. EmptySet node:
I is 0;

6. Exchange node:
A modification of the general formula when in broadcast mode:
D = D * number of receivers;

7. Hash join node:
probe side = PC((I0 * C(equi-join predicate) * W),  N)  +
PC(output cardinality * C(other join predicate) * W), N)

build side = PC((I1 * C(equi-join predicate) * W), N)

8. Hbase scan node:
N is 1

9. Hdfs and Kudu scan node:
N is mt_dop when query option mt_dop >= 1, otherwise
N is number of nodes * max scan threads;

10. Nested loop join node:
When the right child is not a SingularRowSrc node:

  probe side = PC(I0 * C(equi-join predicate) * W), N)  +
  PC((output cardinality * C(other join predicate) * W), N)
  build side = PC((I1 * C(equi-join predicate) * W), N)

When the right child is a SingularRowSrc node:

  probe side = PC(I0 * W, N)
  build side = PC(I0 * ((I1 * W), N)

11. Select node:
Use the general formula;

12. SingularRowSrc node:
Since the node is involved once per input in nested loop join, the
contribution of this node is computed in nested loop join;

13. Sort node:
C is the evalation cost for the sort expression and W is the width
of the intermediate tuple being sorted;

14. Subplan node:
C is 1. I is the multiplication of the cardinality of the left and
the right child;

15. Union node:
C is the cost of materializing rows from all non pass-through
children. W is the width of all non pass-through children;

16. Unnest node:
I is the cardinality of the containing subplan node and C is 1.

This patch also assumes that the number of instances of execution
overlaps each other when there is a discrepancy of instances among
nodes in a single fragment. For example when there are 6 scan thread
and 3 aggregation threads in a single fragment, 3 threads are used
for both scan and aggregate, and 3 other threads are used for the
aggregation.

Testing:
  1. Unit testing by examining the best and the worst processing cost
 computed for all plan nodes in all fragments for a small set of
 queries;
  2. Core tests.

Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
---
M common/thrift/Frontend.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2022-10-04 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 18:

Remove reporting of best case for a fragment which does not make sense.


--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 04 Oct 2022 18:06:54 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2022-10-04 Thread Qifan Chen (Code Review)

Qifan Chen has uploaded a new patch set (#18). ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

IMPALA-11604 Planner changes for CPU usage

This patch augments IMPALA-10992 by establishing an infrastructure
to allow weighted total amount of data to process to be used as a
new factor in the definition and selection of an executor group. In
this patch the weight component is set to 1.

Two flavors of the weighted amount of data processed are enabled in
this patch: the minimal amount and the max amount. The former is the
sum of that of every fragment in the query reflecting the worst level
of parallel execution among fragments. The latter is the minimal of
that of every fragment reflecting the best level of parallel
execution. The geometric mean of the best and the worst is used as
the estimated processing cost of the query.

A fragment's total amount of data processed is the sum of that of
every node in the fragment. The weighted amount of data processed
is computed with a general formula as follows.

  D = (I * C * W) / N

  where D is the weighted amount of data processed
I is input cardinality
C is expression evaluation cost per row, set to 1
W is average row size
N is number of instances

A description of the computation for each kind of plan node is
given below.

1. Aggregation node:
C and W are the sum of the costs and partial row widths for each
AggregateInfo object.

2. AnalyticEval node:
C is sum of the evaluation costs for analytic functions, partition
by equal and order by equal predicate;

3. CardinalityCheck node:
Both C and I are 1;

4. DataSource scan node:
C is computed from a subset of the selection predicates excluding
data source accepted predicates;

5. EmptySet node:
I is 0;

6. Exchange node:
A modification of the general formula when in broadcast mode:
D = D * number of receivers;

7. Hash join node:
probe side = (I0 * C(equi-join predicate) * W) / N  +
(output cardinality * C(other join predicate) * W) / N

build side = (I1 * C(equi-join predicate) * W) / N

8. Hbase scan node:
N is 1

9. Hdfs and Kudu scan node:
N is mt_dop when query option mt_dop >= 1, otherwise
N is number of nodes * max scan threads;

10. Nested loop join node:
When the right child is not a SingularRowSrc node:

  probe side = (I0 * C(equi-join predicate) * W) / N  +
  (output cardinality * C(other join predicate) * W) / N
  build side = (I1 * C(equi-join predicate) * W) / N

When the right child is a SingularRowSrc node:

  probe side = (I0 * W) / N
  build side = I0 * ((I1 * W) / N)

11. Select node:
Use the general formula;

12. SingularRowSrc node:
Since the node is involved once per input in nested loop join, the
contribution of this node is computed in nested loop join;

13. Sort node:
C is the evalation cost for the sort expression and W is the width
of the intermediate tuple being sorted;

14. Subplan node:
C is 1. I is the multiplication of the cardinality of the left and
the right child;

15. Union node:
C is the cost of materializing rows from all non pass-through
children. W is the width of all non pass-through children;

16. Unnest node:
I is the cardinality of the containing subplan node and C is 1.

This patch also assumes that the number of instances of execution
overlaps each other when there is a discrepancy of instances among
nodes in a single fragment. For example when there are 6 scan thread
and 3 aggregation threads in a single fragment, 3 threads are used
for both scan and aggregate, and 3 other threads are used for the
aggregation.

Testing:
  1. Unit testing by examining the best and the worst processing cost
 computed for all plan nodes in all fragments for a small set of
 queries;
  2. Core tests.

Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
---
M common/thrift/Frontend.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 1254 matches

Mail list logo