[Impala-ASF-CR] IMPALA-11898: Add query options in the profile even if the query failed in planning

2023-08-22 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/19517 )

Change subject: IMPALA-11898: Add query options in the profile even if the 
query failed in planning
..

IMPALA-11898: Add query options in the profile even if the query failed in 
planning

Currently, query options are added to the profile in
ClientRequestState::Exec(), which is not executed if the query fails
in planning or is not admitted
(e.g. it times out while queued or is cancelled before execution).
This patch moves the logic to the points where the query options are
ready to be added. To be specific, "Query Options (set by configuration)"
is available as soon as the client submits the request, so it is added
in the constructor of ClientRequestState.
"Query Options (set by configuration and planner)" is ready
when planning finishes, so it is now added right after the call to
RunFrontendPlanner().
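The Python model below illustrates the lifecycle described above. It shows why the configured options now survive a planning failure. It is only a sketch: `RequestStateSketch`, `submit()`, `format_options()` and the `plan` callback are hypothetical stand-ins for ClientRequestState, the request flow and RunFrontendPlanner(). They are not Impala APIs.

```python
class AnalysisException(Exception):
    """Hypothetical stand-in for a planning failure."""


def format_options(options):
    # Render options the way the profile shows them: KEY=VALUE,KEY=VALUE
    return ",".join(f"{key}={value}" for key, value in options.items())


class RequestStateSketch:
    """Hypothetical model of ClientRequestState's profile bookkeeping."""

    def __init__(self, config_options):
        self.profile = {}
        # Known as soon as the client submits the request, so it is
        # recorded at construction time.
        self.profile["Query Options (set by configuration)"] = format_options(
            config_options)


def submit(config_options, plan):
    """`plan` stands in for RunFrontendPlanner(): returns final options or raises."""
    state = RequestStateSketch(config_options)
    try:
        final_options = plan(dict(config_options))
    except AnalysisException as e:
        state.profile["Query Status"] = str(e)
        # Exec() is never reached, but the configured options are already recorded.
        return state
    # Only available once planning has finished.
    state.profile["Query Options (set by configuration and planner)"] = (
        format_options(final_options))
    return state
```

In the model, a planner that raises leaves only the first profile entry. A successful planner produces both entries.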

Testing:
- Ran a query that fails with an AnalysisException.
- Added a test to make sure "Impala Query State" is populated.

Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64
---
M be/src/runtime/query-driver.cc
M be/src/service/client-request-state.cc
M be/src/service/impala-server.cc
M tests/query_test/test_observability.py
4 files changed, 23 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/19517/8
--
To view, visit http://gerrit.cloudera.org:8080/19517
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64
Gerrit-Change-Number: 19517
Gerrit-PatchSet: 8
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11898: Add query options in the profile even if the query failed in planning

2023-08-03 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19517 )

Change subject: IMPALA-11898: Add query options in the profile even if the 
query failed in planning
..


Patch Set 6:

(4 comments)

Hi Quanlong and Daniel,
I'm back. I have addressed some of the comments. Thanks very much for your review.

http://gerrit.cloudera.org:8080/#/c/19517/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19517/3//COMMIT_MSG@7
PS3, Line 7: even
> "even" ?
Done


http://gerrit.cloudera.org:8080/#/c/19517/3//COMMIT_MSG@13
PS3, Line 13: This patch moves the logics to where the query options are ready
> Please mention the cause of the bug and how the query options are added. E.
Done


http://gerrit.cloudera.org:8080/#/c/19517/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19517/4//COMMIT_MSG@7
PS4, Line 7: even
> Nit: even, not event.
Done


http://gerrit.cloudera.org:8080/#/c/19517/3/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/19517/3/be/src/service/impala-server.cc@1247
PS3, Line 1247:
> Please add a comment, e.g. "Add profile info items that are ready after RunF
Done



--
To view, visit http://gerrit.cloudera.org:8080/19517

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64
Gerrit-Change-Number: 19517
Gerrit-PatchSet: 6
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 04 Aug 2023 06:48:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11898: Add query options in the profile even if the query failed in planning

2023-08-03 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/19517 )

Change subject: IMPALA-11898: Add query options in the profile even if the 
query failed in planning
..

IMPALA-11898: Add query options in the profile even if the query failed in 
planning

Currently, query options are added to the profile in
ClientRequestState::Exec(), which is not executed if the query fails
in planning or is not admitted
(e.g. it times out while queued or is cancelled before execution).
This patch moves the logic to the points where the query options are
ready to be added. To be specific, "Query Options (set by configuration)"
is available as soon as the client submits the request, so it is added
in the constructor of ClientRequestState.
"Query Options (set by configuration and planner)" is ready
when planning finishes, so it is now added right after the call to
RunFrontendPlanner().

Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64
---
M be/src/runtime/query-driver.cc
M be/src/service/client-request-state.cc
M be/src/service/impala-server.cc
M tests/query_test/test_observability.py
4 files changed, 23 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/19517/6
--
To view, visit http://gerrit.cloudera.org:8080/19517

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64
Gerrit-Change-Number: 19517
Gerrit-PatchSet: 6
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE

2023-06-25 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#18). ( 
http://gerrit.cloudera.org:8080/18731 )

Change subject: IMPALA-11424: Support pushdown non-equi join predicate from 
OUTER/INNER JOIN to SCANNODE
..

IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to 
SCANNODE

In order to reduce the amount of data read and transmitted,
non-equi join conditions can be pushed down to the SCAN_NODE.

The current restrictions on pushing down non-equi join conjuncts are:
 1. Only LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN and INNER_JOIN are supported;
 2. Only non-equi predicates containing a LiteralExpr are supported,
  for example: slot >= literal, slot IN (literal list);
 3. A complex filter condition is pushed down only if it references
exactly one column.
For example, cast(A as int) > 10 can be pushed down to the scan.
 4. Only the following predicate operators are supported:
EQ, LE, LT, GE, GT;
 5. Only BinaryPredicate and InPredicate are supported;
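The restrictions listed above can be expressed as a small eligibility check. This Python sketch is illustrative only. The `Conjunct` record, its fields and `can_push_down()` are hypothetical and do not mirror the classes in HashJoinNode.java or Expr.java.

```python
from dataclasses import dataclass

# Restrictions 1, 4 and 5 encoded as data (names are illustrative).
SUPPORTED_JOIN_OPS = {"LEFT_OUTER_JOIN", "RIGHT_OUTER_JOIN", "INNER_JOIN"}
SUPPORTED_BINARY_OPS = {"EQ", "LE", "LT", "GE", "GT"}


@dataclass(frozen=True)
class Conjunct:
    kind: str                  # "BinaryPredicate", "InPredicate", or anything else
    op: str                    # binary operator, or "IN" for an InPredicate
    slots: frozenset           # columns referenced by the non-literal side
    other_side_literal: bool   # True if compared against a literal / literal list


def can_push_down(join_op: str, c: Conjunct) -> bool:
    if join_op not in SUPPORTED_JOIN_OPS:
        return False
    if c.kind == "BinaryPredicate":
        if c.op not in SUPPORTED_BINARY_OPS:
            return False
    elif c.kind != "InPredicate":
        return False
    # Restrictions 2 and 3: exactly one column against literals,
    # e.g. cast(A as int) > 10 references only A, so it qualifies.
    return len(c.slots) == 1 and c.other_side_literal
```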

Pushdown logic:
 1. Build a mapping from each slot to its non-equi conjuncts,
and a mapping from each slot to its equi conjuncts;
 2. For slots that have both equi and non-equi conjuncts,
calculate the maximum and minimum values of the equi conjuncts;
 3. Build new BinaryPredicates from these maximum and minimum values
according to the non-equi conjuncts;
 4. Push all the new BinaryPredicates down to the corresponding scan node;
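The commit message does not spell out exactly which values are combined. The following Python sketch therefore encodes one possible reading, and that reading is an assumption. A literal predicate on a join slot (e.g. `t1.a IN (3, 9, 5)` or `t1.a > 1`) is rewritten onto every slot that an equi-join conjunct makes equivalent to it. An IN list is reduced to its minimum and maximum. All names are hypothetical. The sketch also ignores which side of an outer join a predicate may legally move to, which a real planner must check.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def derive_pushdown_predicates(
        equi_pairs: Iterable[Tuple[str, str]],
        non_equi: Iterable[Tuple[str, str, object]]) -> List[Tuple[str, str, object]]:
    """Rewrite literal predicates onto slots made equivalent by equi-join conjuncts."""
    # Step 1: slot -> equivalent slots (from equi conjuncts such as t1.a = t2.b)
    # and slot -> non-equi literal conjuncts (such as t1.a > 1).
    equivalents = defaultdict(set)
    for left, right in equi_pairs:
        equivalents[left].add(right)
        equivalents[right].add(left)
    by_slot = defaultdict(list)
    for slot, op, value in non_equi:
        by_slot[slot].append((op, value))

    derived = []
    # Step 2: only slots that have both kinds of conjuncts are considered.
    for slot in sorted(by_slot.keys() & equivalents.keys()):
        for op, value in by_slot[slot]:
            if op == "IN":
                # An IN list is bounded by its minimum and maximum values.
                bounds = [("GE", min(value)), ("LE", max(value))]
            else:
                bounds = [(op, value)]
            # Steps 3-4: rebuild each bound as a binary predicate on every
            # equivalent slot; a planner would attach it to that slot's scan.
            for other in sorted(equivalents[slot]):
                derived.extend((other, b_op, b_val) for b_op, b_val in bounds)
    return derived
```

Replacing an IN list with a GE/LE range only widens the filter. This keeps the derived predicate safe as an extra scan filter, at the cost of some selectivity.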

A new query option is added to enable or disable this feature:
 ENABLE_NONE_EQUAL_PREDICATE_PUSH_DOWN

Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test
A 
testdata/workloads/functional-query/queries/QueryTest/none-equal-predicate-push-down.test
A tests/query_test/test_none_equi_predicate_pushdown.py
13 files changed, 1,448 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/18731/18
--
To view, visit http://gerrit.cloudera.org:8080/18731

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263
Gerrit-Change-Number: 18731
Gerrit-PatchSet: 18
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11898: Add query options in the profile event if the query failed in planning

2023-06-25 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/19517 )

Change subject: IMPALA-11898: Add query options in the profile event if the 
query failed in planning
..

IMPALA-11898: Add query options in the profile event if the query failed in 
planning

Query options are normally included in the profile,
but when the query fails during planning, query options are missing.
After this change, query options are also added to the profile
upon planning failure.

Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64
---
M be/src/service/client-request-state.cc
M be/src/service/impala-server.cc
2 files changed, 16 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/19517/4
--
To view, visit http://gerrit.cloudera.org:8080/19517

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64
Gerrit-Change-Number: 19517
Gerrit-PatchSet: 4
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10861: Optimize the plan for identical predicates

2023-03-20 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19511 )

Change subject: IMPALA-10861: Optimize the plan for identical predicates
..


Patch Set 3:

(7 comments)

> Patch Set 2:
>
> (7 comments)
>
> Thanks Baike for the fix.

Hi Yida,
Thanks for your advice and reply.
I made some fixes in response to your suggestions.
Looking forward to your reply and CR.

http://gerrit.cloudera.org:8080/#/c/19511/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19511/2//COMMIT_MSG@9
PS2, Line 9: For the query with two same predicates, duplicated data is deleted
> Would be good to elaborate the comment on what the current issue is and how
OK, I'll improve this description.


http://gerrit.cloudera.org:8080/#/c/19511/2//COMMIT_MSG@9
PS2, Line 9: the
> nit. the?
Done


http://gerrit.cloudera.org:8080/#/c/19511/2//COMMIT_MSG@9
PS2, Line 9: duplicate
> nit. duplicated
Done


http://gerrit.cloudera.org:8080/#/c/19511/2//COMMIT_MSG@12
PS2, Line 12: ing.
> Would it also work for below cases?
Done


http://gerrit.cloudera.org:8080/#/c/19511/2//COMMIT_MSG@12
PS2, Line 12:
> should be a.id = b.id?
Done


http://gerrit.cloudera.org:8080/#/c/19511/2/testdata/workloads/functional-planner/queries/PlannerTest/joins.test
File testdata/workloads/functional-planner/queries/PlannerTest/joins.test:

http://gerrit.cloudera.org:8080/#/c/19511/2/testdata/workloads/functional-planner/queries/PlannerTest/joins.test@3118
PS2, Line 3118: the
> nit. the?
Done


http://gerrit.cloudera.org:8080/#/c/19511/2/testdata/workloads/functional-planner/queries/PlannerTest/joins.test@3118
PS2, Line 3118: duplicate
> nit. duplicated
Done



--
To view, visit http://gerrit.cloudera.org:8080/19511

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia249c8146215fad602e9310bf922c6bfa050b96b
Gerrit-Change-Number: 19511
Gerrit-PatchSet: 3
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Mon, 20 Mar 2023 17:30:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table

2023-03-01 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#16). ( 
http://gerrit.cloudera.org:8080/18953 )

Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns 
for kudu/iceberg table
..

IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg 
table

Impala already supports IF NOT EXISTS in ALTER TABLE ADD COLUMNS for
general Hive tables (since IMPALA-7832), but not for Kudu/Iceberg tables.
This patch adds the same semantics for Kudu/Iceberg tables.
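Below is a minimal Python sketch of the IF NOT EXISTS semantics. It assumes the behaviour of IMPALA-7832 for Hive tables: with IF NOT EXISTS, columns that already exist are skipped silently; without it, they are rejected. `columns_to_add()` and `AnalysisError` are hypothetical. The actual change lives in AlterTableAddColsStmt.java and CatalogOpExecutor.java. The case-insensitive comparison is an assumption based on Impala treating column names as case-insensitive.

```python
from typing import Iterable, List


class AnalysisError(Exception):
    """Hypothetical stand-in for the error raised during analysis."""


def columns_to_add(existing: Iterable[str], new_cols: Iterable[str],
                   if_not_exists: bool) -> List[str]:
    """Return the columns that ALTER TABLE ... ADD [IF NOT EXISTS] COLUMNS should add."""
    # Column names are compared case-insensitively (assumption, see above).
    existing_lower = {c.lower() for c in existing}
    result = []
    for col in new_cols:
        if col.lower() in existing_lower:
            if if_not_exists:
                continue  # skip silently; if nothing is left, the statement is a no-op
            raise AnalysisError(f"Column already exists: {col}")
        result.append(col)
    return result
```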

Testing:
- Updated E2E DDL tests
- Added fe tests

Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
---
M common/thrift/JniCatalog.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
6 files changed, 194 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/16
--
To view, visit http://gerrit.cloudera.org:8080/18953

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
Gerrit-Change-Number: 18953
Gerrit-PatchSet: 16
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table

2023-02-28 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18953 )

Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns 
for kudu/iceberg table
..


Patch Set 15:

Hi Quanlong and Penglin,
I have addressed the suggestions and resolved the conflict.
Could you please review again?


--
To view, visit http://gerrit.cloudera.org:8080/18953

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
Gerrit-Change-Number: 18953
Gerrit-PatchSet: 15
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 28 Feb 2023 12:00:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table

2023-02-28 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#15). ( 
http://gerrit.cloudera.org:8080/18953 )

Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns 
for kudu/iceberg table
..

IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg 
table

Impala already supports IF NOT EXISTS in ALTER TABLE ADD COLUMNS for
general Hive tables (since IMPALA-7832), but not for Kudu/Iceberg tables. This
patch adds the same semantics for Kudu/Iceberg tables.

Testing:
- Updated E2E DDL tests

Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
---
M common/thrift/JniCatalog.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
6 files changed, 162 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/15
--
To view, visit http://gerrit.cloudera.org:8080/18953

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
Gerrit-Change-Number: 18953
Gerrit-PatchSet: 15
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11898: Add query options in the profile event if the query failed in planning

2023-02-28 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/19517 )

Change subject: IMPALA-11898: Add query options in the profile event if the 
query failed in planning
..

IMPALA-11898: Add query options in the profile event if the query failed in 
planning

Query options are normally included in the profile,
but when the query fails during planning, query options are missing.
After this change, query options are also added to the profile
upon planning failure.

Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64
---
M be/src/service/client-request-state.cc
M be/src/service/impala-server.cc
2 files changed, 5 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/19517/3
--
To view, visit http://gerrit.cloudera.org:8080/19517

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64
Gerrit-Change-Number: 19517
Gerrit-PatchSet: 3
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

2023-02-28 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
..


Patch Set 14:

Hi Csaba, I added some tests as you suggested earlier.
I added be/src/util/hash-util-test.cc to test whether the Hive and Impala hash
methods produce the same results. A test case for the non-partitioned table
order has also been added.
Thanks again for your reply and suggestions.


--
To view, visit http://gerrit.cloudera.org:8080/19430

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 14
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 28 Feb 2023 10:03:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

2023-02-28 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
..


Patch Set 14:

> Patch Set 13:
>
> > Patch Set 13:
> >
> > (1 comment)
> >
> > > Patch Set 13:
> > >
> > > (1 comment)
>
> The problem is that it would not be practical to check the block locations 
> for potential relocations when doing the query planning. Given N blocks 
> in a bucket for one table and M blocks for the second table, it would be 
> O(N+M) time to decide which distribution method to use. This would add up 
> depending on the number of joins in the query. We really want to 'pin' the 
> location but AFAIK HDFS does not allow us to do that. Other systems such as 
> MemSQL that do bucket join don't have to worry about this since the data is 
> memory resident.

Given N blocks in a bucket for one table and M blocks for the second table, we
can temporarily use the greatest common divisor of N and M as the number of
buckets for the two tables.
I'm not sure I understand what you mean.
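This reads the proposal as using the greatest common divisor, since the least common divisor of any two integers is 1. The idea can then be checked with a short Python sketch. Suppose bucket ids are `hash mod N` and `hash mod M`, and g = gcd(N, M). Because g divides both N and M, the bucket ids reduced modulo g agree for every row, so the buckets can be merged into g aligned groups. The helper names are hypothetical. Hash values are assumed to be non-negative, as Hive's `(hash & Integer.MAX_VALUE) % numBuckets` makes them.

```python
from math import gcd
from typing import Dict, List, Tuple


def merged_bucket(bucket_id: int, num_buckets: int, common: int) -> int:
    """Map a bucket of a table with `num_buckets` buckets onto one of `common` groups."""
    # Valid only when `common` divides `num_buckets`: (h % N) % g == h % g.
    assert num_buckets % common == 0
    return bucket_id % common


def align_buckets(n: int, m: int) -> Dict[int, Tuple[List[int], List[int]]]:
    """Group the buckets of two tables (n and m buckets) into gcd(n, m) joinable groups."""
    g = gcd(n, m)
    groups: Dict[int, Tuple[List[int], List[int]]] = {k: ([], []) for k in range(g)}
    for b in range(n):
        groups[merged_bucket(b, n, g)][0].append(b)
    for b in range(m):
        groups[merged_bucket(b, m, g)][1].append(b)
    return groups
```

For example, `align_buckets(4, 6)` returns `{0: ([0, 2], [0, 2, 4]), 1: ([1, 3], [1, 3, 5])}`. When the gcd is 1, all buckets collapse into a single group, so no bucket-level parallelism is left.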


--
To view, visit http://gerrit.cloudera.org:8080/19430

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 14
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 28 Feb 2023 09:59:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11898: Add query options in the profile if the query failed in planning

2023-02-24 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19517 )

Change subject: IMPALA-11898: Add query options in the profile if the query 
failed in planning
..


Patch Set 2:

(3 comments)

Hi Daniel,
Thank you for your reply and suggestions.

http://gerrit.cloudera.org:8080/#/c/19517/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19517/1//COMMIT_MSG@7
PS1, Line 7: i
> Nit: unnecessary double space.
Done


http://gerrit.cloudera.org:8080/#/c/19517/1//COMMIT_MSG@9
PS1, Line 9: Query options should usually be in the profile,
> Could you clarify the commit message?
Yes, as the JIRA title says, query options should normally be in the profile,
but when the query fails during planning, the query options are missing.

RunFrontendPlanner() performs the planning. If it fails, the query options
should still be added, e.g.:
Query Options (set by configuration): TIMEZONE=PRC,CLIENT_IDENTIFIER=impala shell build version not available

If the query fails after planning, we should list the query options set by
both configuration and the planner, e.g.:
Query Options (set by configuration): TIMEZONE=PRC,CLIENT_IDENTIFIER=impala shell build version not available
Query Options (set by configuration and planner): MT_DOP=0,TIMEZONE=PRC,CLIENT_IDENTIFIER=impala shell build version not available,MINMAX_FILTER_THRESHOLD=0.5,MINMAX_FILTERING_LEVEL=PAGE


http://gerrit.cloudera.org:8080/#/c/19517/1//COMMIT_MSG@13
PS1, Line 13: Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64
> Can we add tests that verify that the query options are included in the pro
Sorry, I couldn't. I tried to add the relevant tests, but couldn't find where to
add them. Could you point me to the right place?



--
To view, visit http://gerrit.cloudera.org:8080/19517

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64
Gerrit-Change-Number: 19517
Gerrit-PatchSet: 2
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 24 Feb 2023 08:44:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11898: Add query options in the profile if the query failed in planning

2023-02-24 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/19517 )

Change subject: IMPALA-11898: Add query options in the profile if the query 
failed in planning
..

IMPALA-11898: Add query options in the profile if the query failed in planning

Query options should normally be in the profile,
but when the query fails during planning, the query options are missing.
On such a failure, the query options should still be added to the profile.

Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64
---
M be/src/service/client-request-state.cc
M be/src/service/impala-server.cc
2 files changed, 5 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/19517/2
--
To view, visit http://gerrit.cloudera.org:8080/19517

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64
Gerrit-Change-Number: 19517
Gerrit-PatchSet: 2
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

2023-02-23 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
..

IMPALA-3120: Support Bucket Shuffle Join for bucketed table

Bucket Shuffle Join reduces network overhead and provides better
performance for some Join queries. It places no mandatory requirement
on the data distribution of the tables, so it is unlikely to cause
data skew.

Bucket Shuffle Join only applies to equi-join conditions, because it
relies on hashing to compute the data distribution.

The equi-join conditions may contain the bucket columns of both tables.
If the bucket columns of the left table appear in the equi-join
conditions, the join is likely to be planned as a Bucket Shuffle Join.

In a join or grouping operation, there can be one or more bucket columns.
In a multi-table join, the left table must be a bucketed table.
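The eligibility rule above requires the left table's bucket columns to appear in the equi-join conditions. It can be sketched in Python as follows. `bucket_shuffle_keys()` is hypothetical. It returns the right-side join columns in the order of the left table's bucket columns, or `None` if Bucket Shuffle Join does not apply. In this sketch, those are the columns the right side is hashed on so that each of its rows lands in the matching left bucket.

```python
from typing import List, Optional, Sequence, Tuple


def bucket_shuffle_keys(left_bucket_cols: Sequence[str],
                        equi_join_pairs: Sequence[Tuple[str, str]]) -> Optional[List[str]]:
    """Return right-side keys aligned with the left bucket columns, or None."""
    if not left_bucket_cols:
        return None  # the left table is not bucketed
    left_to_right = {}
    for left_col, right_col in equi_join_pairs:
        left_to_right.setdefault(left_col, right_col)
    if not all(col in left_to_right for col in left_bucket_cols):
        return None  # some bucket column is not an equi-join key
    # Keep the bucket column order, since the bucket hash depends on it.
    return [left_to_right[col] for col in left_bucket_cols]
```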

Currently, only tables stored on HDFS are supported.
Only the following node types are supported:
ScanNode/UnionNode/HashJoinNode/AggregationNode/AnalyticEvalNode/SortNode.

To ensure consistency with Hive, the bucket hash is computed with the
same method that Hive uses to hash a column value.
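For reference, here is a Python sketch of the hash that Hive's legacy bucketing scheme (`bucketing_version=1`) is understood to use for a few column types:

- INT hashes to its own value.
- BIGINT folds its two 32-bit halves with XOR.
- STRING applies a Java-style `31 * h + byte` over its UTF-8 bytes, treated as signed.
- Multiple bucket columns are combined with `31 * h + field_hash`.
- The bucket is `(h & Integer.MAX_VALUE) % numBuckets`.

This reflects our understanding of Hive's ObjectInspectorUtils and should be verified against it; hash-util-test.cc is meant to do such a comparison. It is not a copy of Impala's hash-util.h. Other types, NULL handling and `bucketing_version=2` (Murmur3) are out of scope. All function names are hypothetical.

```python
def to_int32(x: int) -> int:
    """Wrap an arbitrary Python int to a signed 32-bit value, as Java int arithmetic does."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def hive_hash_int(v: int) -> int:
    return to_int32(v)


def hive_hash_bigint(v: int) -> int:
    u = v & 0xFFFFFFFFFFFFFFFF        # two's-complement view of a Java long
    return to_int32((u >> 32) ^ u)    # (int) ((v >>> 32) ^ v)


def hive_hash_string(s: str) -> int:
    h = 0
    for b in s.encode("utf-8"):
        signed_byte = b - 256 if b >= 128 else b  # Java bytes are signed
        h = to_int32(31 * h + signed_byte)
    return h


def hive_bucket_hash(field_hashes) -> int:
    """Combine the per-column hashes of the bucket columns, in order."""
    h = 0
    for fh in field_hashes:
        h = to_int32(31 * h + fh)
    return h


def hive_bucket_number(h: int, num_buckets: int) -> int:
    return (h & 0x7FFFFFFF) % num_buckets
```

For example, `hive_bucket_number(hive_bucket_hash([hive_hash_string("abc")]), 4)` picks the bucket of a row whose single STRING bucket column is `"abc"`.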

New query options are added to control this feature:
  ENABLE_BUCKET_SHUFFLE
  FRAGMENT_INSTANCE_BUCKET_NUM
  BUCKET_EXEC_BACKEND_RATIO

Testing:
  - Add e2e tests
  - Add fe tests

Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/initial-reservations.cc
M be/src/runtime/initial-reservations.h
M be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/scheduling/schedule-state.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/CMakeLists.txt
A be/src/util/hash-util-test.cc
M be/src/util/hash-util.h
M common/protobuf/admission_control_service.proto
M common/protobuf/control_service.proto
M common/protobuf/planner.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Partitions.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/java/org/apache/impala/util/MathUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test
A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test
A tests/query_test/test_bucket_shuffle.py
53 files changed, 2,466 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/14
--
To view, visit http://gerrit.cloudera.org:8080/19430

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 14
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

2023-02-21 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
..


Patch Set 13:

(1 comment)

> Patch Set 13:
>
> (1 comment)

http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG@13
PS9, Line 13:
> I think my concern is similar to Aman's.
Yes, your concern is valid. As you said, the remote read cost caused by a
bucket shuffle join is higher than the cost of a shuffle. Choosing between them
is an optimization, i.e. a CBO rule. The current version does not make this
cost-based choice or fall back to a regular shuffle, but I think this will be
added in the future.



--
To view, visit http://gerrit.cloudera.org:8080/19430

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 13
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 22 Feb 2023 03:51:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

2023-02-21 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
..


Patch Set 13:

(6 comments)

Hi Csaba And Aman,
I was busy with work for a while, so I didn't have time to reply; sorry about
that.
Now I'm back, and I look forward to your replies and suggestions.

http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG@10
PS9, Line 10: performance for some Join queries. Th
> I still don't get the non-partitoned sort case. Can you give an example que
Yes, I'll add it later.


http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG@13
PS9, Line 13:
> Is there a node where the whole bucket is located? I mean that if there are
I don't think I understand what you mean. Can you explain that again?


http://gerrit.cloudera.org:8080/#/c/19430/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19430/13//COMMIT_MSG@25
PS13, Line 25: based on hdfs storage are supported.
> Thanks for the detailed patch.  I have a high level question about the phys
Hi Aman, thanks for your reply.
HDFS rebalancing moves blocks of data, not files. Moving the underlying blocks
does not change a file's content or size, so the buckets stay intact.


http://gerrit.cloudera.org:8080/#/c/19430/13/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/19430/13/be/src/runtime/query-state.h@149
PS13, Line 149:   /// Define locks to ensure thread safety when replenishing reserved memory.
  :   std::mutex increase_memory_reservation_mtx_;
  :
  :   /// Configure a semaphore to control FragmentInstanceState::Exec
  :   /// for each fragment instance that is executed in a bucket.
  :   /// To save memory, only one concurrency is supported in the open phase and beyond,
  :   /// after the completion of prepare.
  :   std::unordered_map bucket_fragment_sem_;
  :
  :   /// Configure a counter for each fragment instance to count the number of fragment
  :   /// instances that have not yet completed execution, to prevent invalid
  :   /// increase_memory_reservation, and to destroy the semaphore after the execution of
  :   /// all instances of the fragment in the bucket has completed.
  :   std::unordered_map bucket_fragment_un_finished_instances_;
> I couldn't grasp the changes in query life-cycle yet. Can you give some exp
Yes, you are right. In particular, KrpcDataStreamSender hashes each row to
decide which fragment instance it is sent to. In this case, the Hive hash is
used.
The number of fragment instances that run at the same time is limited so that
they do not run out of resources.
However, this logic comes from our company's internal modifications based on
Impala 3.2, and I'm still not sure whether it is necessary here.
Do you have any advice?
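
The concurrency control described above could look roughly like the sketch
below: one semaphore per bucket plus a counter of unfinished instances,
matching the quoted query-state.h fields. This is a hypothetical illustration
only. The class and method names ('BucketExecGate', 'RegisterInstance',
'Acquire', 'Finish') are not from the patch, and the sketch uses a mutex and a
condition variable in place of whatever semaphore type the patch actually uses.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical sketch: limits how many fragment instances of the same bucket
// may run their Open()/Exec() phase at once, and frees the per-bucket state
// once every registered instance of that bucket has finished.
class BucketExecGate {
 public:
  explicit BucketExecGate(int max_concurrent) : max_concurrent_(max_concurrent) {}

  // Registers one more (not yet finished) instance for 'bucket'.
  // Must be called before Acquire(); otherwise at() throws std::out_of_range.
  void RegisterInstance(int32_t bucket) {
    std::lock_guard<std::mutex> l(mu_);
    ++state_[bucket].unfinished;
  }

  // Blocks until a slot for 'bucket' is free, then takes it.
  void Acquire(int32_t bucket) {
    std::unique_lock<std::mutex> l(mu_);
    BucketState& s = state_.at(bucket);
    cv_.wait(l, [&s, this] { return s.running < max_concurrent_; });
    ++s.running;
  }

  // Releases the slot and marks the instance finished. The bucket's state is
  // erased after its last registered instance completes.
  void Finish(int32_t bucket) {
    std::lock_guard<std::mutex> l(mu_);
    BucketState& s = state_.at(bucket);
    --s.running;
    if (--s.unfinished == 0) state_.erase(bucket);
    cv_.notify_all();
  }

  // Number of buckets that still have unfinished instances.
  std::size_t TrackedBuckets() {
    std::lock_guard<std::mutex> l(mu_);
    return state_.size();
  }

 private:
  struct BucketState {
    int running = 0;     // instances currently past Acquire()
    int unfinished = 0;  // registered instances that have not called Finish()
  };
  const int max_concurrent_;
  std::mutex mu_;
  std::condition_variable cv_;
  std::unordered_map<int32_t, BucketState> state_;
};
```

A fragment instance would call RegisterInstance() before Prepare(), Acquire()
before Open(), and Finish() once it completes. With max_concurrent set to 1,
this gives the "only one concurrency after prepare" behavior from the comment.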


http://gerrit.cloudera.org:8080/#/c/19430/13/be/src/util/hash-util.h
File be/src/util/hash-util.h:

http://gerrit.cloudera.org:8080/#/c/19430/13/be/src/util/hash-util.h@287
PS13, Line 287: {
> Can you add some tests for this in https://github.com/apache/impala/blob/ma
Yes, I'll add it later.


http://gerrit.cloudera.org:8080/#/c/19430/13/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/19430/13/fe/src/main/java/org/apache/impala/catalog/Table.java@1045
PS13, Line 1045: TBucketType.NONE
> This is not from this patch, but I saw that the other value of TBucketType
Yes, I see.
TBucketType is designed to accommodate multiple bucketing algorithms. NONE
means the table is not bucketed. HASH means the Hive hash algorithm is used.
Other hash algorithms can be added later, for example those of Iceberg or Kudu.
The names HIVE_HASH and HIVE_BUCKET_V2_HASH are not used here for compatibility
with Hive SQL, which makes it easier to run Hive SQL in Impala.



--
To view, visit http://gerrit.cloudera.org:8080/19430

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 13
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 21 Feb 2023 08:21:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11898: Add query options in the profile if the query failed in planning

2023-02-20 Thread Baike Xia (Code Review)
Baike Xia has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/19517


Change subject: IMPALA-11898: Add query options in the profile if the query 
failed in planning
..

IMPALA-11898: Add query options in the profile if the query failed in planning

If the query fails in RunFrontendPlanner, add
"Query Options (set by configuration)" to the profile.
If it fails after RunFrontendPlanner, add
"Query Options (set by configuration and planner)".
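
The idea can be illustrated with a simplified, hypothetical model: record each
set of options as soon as it is known, so a failure at any later stage still
leaves it in the profile. 'ExecRequest', 'Profile' and the 'run_planner'
callback are illustrative names, not Impala's API. 'run_planner' stands in for
RunFrontendPlanner().

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical, simplified model of a query profile: section name -> text.
using Profile = std::map<std::string, std::string>;

// 'run_planner' returns false on failure; on success it fills in the
// planner-adjusted options.
bool ExecRequest(const std::string& config_options,
                 const std::function<bool(std::string*)>& run_planner,
                 Profile* profile) {
  // Known when the request is submitted, before planning starts.
  (*profile)["Query Options (set by configuration)"] = config_options;
  std::string planner_options;
  if (!run_planner(&planner_options)) return false;  // planning failed
  // Known only after planning succeeds; kept even if a later stage fails.
  (*profile)["Query Options (set by configuration and planner)"] =
      planner_options;
  return true;
}
```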

Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64
---
M be/src/service/client-request-state.cc
M be/src/service/impala-server.cc
2 files changed, 5 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/19517/1
--
To view, visit http://gerrit.cloudera.org:8080/19517

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64
Gerrit-Change-Number: 19517
Gerrit-PatchSet: 1
Gerrit-Owner: Baike Xia 


[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

2023-02-02 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
..

IMPALA-3120: Support Bucket Shuffle Join for bucketed table

Bucket Shuffle Join reduces network overhead and improves performance
for some join queries. It places no mandatory requirement on the data
distribution of the table, so it is unlikely to cause data skew.

Bucket Shuffle Join only applies to equi-joins, because it relies on
hashing to compute the target data distribution.

The equi-join condition can contain the bucket columns of both tables.
If the bucket columns of the left table appear in the equi-join
condition, the join is likely to be planned as a Bucket Shuffle Join.
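
As a rough illustration of this eligibility rule, here is a hypothetical check.
It assumes each equi-join conjunct is reduced to a (left column, right column)
pair. It also assumes the join qualifies only when every bucket column of the
left table appears on the left side of some conjunct. The real planner logic
in DistributedPlanner.java may differ.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical check: true if the join has equi-join conjuncts and every
// bucket column of the left table is the left operand of one of them.
bool IsBucketShuffleCandidate(
    const std::vector<std::string>& left_bucket_cols,
    const std::vector<std::pair<std::string, std::string>>& eq_conjuncts) {
  if (left_bucket_cols.empty() || eq_conjuncts.empty()) return false;
  std::set<std::string> join_cols;
  for (const auto& c : eq_conjuncts) join_cols.insert(c.first);
  for (const auto& b : left_bucket_cols) {
    if (join_cols.count(b) == 0) return false;  // bucket column not joined on
  }
  return true;
}
```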

A join or grouping operation can use one or more bucket columns.
In a multi-table join, the left table must be a bucketed table.

Currently, only HDFS-backed tables are supported.
Only the following node types are supported:
ScanNode/UnionNode/HashJoinNode/AggregationNode/AnalyticEvalNode/SortNode.

To ensure consistency with Hive, the bucket hash is computed the same
way Hive computes the hash value of a column.
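
The Hive-compatible hash can be sketched as follows. This sketch assumes
Hive's original bucketing scheme (bucketing_version=1) as implemented in
Hive's ObjectInspectorUtils, to the best of our understanding. Bucketing
version 2 uses Murmur3 instead and is not covered. Only INT and STRING
columns are shown, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Arithmetic is done in uint32_t so that overflow wraps like Java's int.

// INT columns hash to their own value.
int32_t HiveHashInt(int32_t v) { return v; }

// STRING columns: r = r * 31 + b over the UTF-8 bytes, bytes signed as in Java.
int32_t HiveHashString(const std::string& s) {
  uint32_t r = 0;
  for (char c : s) {
    r = r * 31u + static_cast<uint32_t>(
                      static_cast<int32_t>(static_cast<int8_t>(c)));
  }
  return static_cast<int32_t>(r);
}

// Multi-column bucket key: combine per-column hashes with the same multiplier.
int32_t CombineBucketHash(const std::vector<int32_t>& column_hashes) {
  uint32_t h = 0;
  for (int32_t col : column_hashes) h = h * 31u + static_cast<uint32_t>(col);
  return static_cast<int32_t>(h);
}

// Equivalent of Java's (hash & Integer.MAX_VALUE) % numBuckets.
int BucketId(int32_t hash, int num_buckets) {
  return static_cast<int>((static_cast<uint32_t>(hash) & 0x7fffffffu) %
                          static_cast<uint32_t>(num_buckets));
}
```

For example, an INT key of 7 in a 4-bucket table maps to bucket 7 % 4 = 3.
Masking with 0x7fffffff before the modulo keeps negative hashes in range.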

Add new query options to control the feature:
  ENABLE_BUCKET_SHUFFLE
  FRAGMENT_INSTANCE_BUCKET_NUM
  BUCKET_EXEC_BACKEND_RATIO

Testing:
  - Add e2e tests
  - Add fe tests

Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/initial-reservations.cc
M be/src/runtime/initial-reservations.h
M be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/scheduling/schedule-state.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/hash-util.h
M common/protobuf/admission_control_service.proto
M common/protobuf/control_service.proto
M common/protobuf/planner.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Partitions.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/java/org/apache/impala/util/MathUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test
A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test
A tests/query_test/test_bucket_shuffle.py
51 files changed, 2,333 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/13
--
To view, visit http://gerrit.cloudera.org:8080/19430

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 13
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

2023-02-01 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
..

IMPALA-3120: Support Bucket Shuffle Join for bucketed table

Bucket Shuffle Join reduces network overhead and improves performance
for some join queries. It places no mandatory requirement on the data
distribution of the table, so it is unlikely to cause data skew.

Bucket Shuffle Join only applies to equi-joins, because it relies on
hashing to compute the target data distribution.

The equi-join condition can contain the bucket columns of both tables.
If the bucket columns of the left table appear in the equi-join
condition, the join is likely to be planned as a Bucket Shuffle Join.

A join or grouping operation can use one or more bucket columns.
In a multi-table join, the left table must be a bucketed table.

Currently, only HDFS-backed tables are supported.
Only the following node types are supported:
ScanNode/UnionNode/HashJoinNode/AggregationNode/AnalyticEvalNode/SortNode.

To ensure consistency with Hive, the bucket hash is computed the same
way Hive computes the hash value of a column.

Add new query options to control the feature:
  ENABLE_BUCKET_SHUFFLE
  FRAGMENT_INSTANCE_BUCKET_NUM
  BUCKET_EXEC_BACKEND_RATIO

Testing:
  - Add e2e tests
  - Add fe tests

Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/initial-reservations.cc
M be/src/runtime/initial-reservations.h
M be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/scheduling/schedule-state.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/hash-util.h
M common/protobuf/admission_control_service.proto
M common/protobuf/control_service.proto
M common/protobuf/planner.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Partitions.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/java/org/apache/impala/util/MathUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test
A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test
A tests/query_test/test_bucket_shuffle.py
51 files changed, 2,333 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/12
--
To view, visit http://gerrit.cloudera.org:8080/19430

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 12
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

2023-02-01 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
..

IMPALA-3120: Support Bucket Shuffle Join for bucketed table

Bucket Shuffle Join reduces network overhead and improves performance
for some join queries. It places no mandatory requirement on the data
distribution of the table, so it is unlikely to cause data skew.

Bucket Shuffle Join only applies to equi-joins, because it relies on
hashing to compute the target data distribution.

The equi-join condition can contain the bucket columns of both tables.
If the bucket columns of the left table appear in the equi-join
condition, the join is likely to be planned as a Bucket Shuffle Join.

A join or grouping operation can use one or more bucket columns.
In a multi-table join, the left table must be a bucketed table.

Currently, only HDFS-backed tables are supported.
Only the following node types are supported:
ScanNode/UnionNode/HashJoinNode/AggregationNode/AnalyticEvalNode/SortNode.

To ensure consistency with Hive, the bucket hash is computed the same
way Hive computes the hash value of a column.

Add new query options to control the feature:
  ENABLE_BUCKET_SHUFFLE
  FRAGMENT_INSTANCE_BUCKET_NUM
  BUCKET_EXEC_BACKEND_RATIO

Testing:
  - Add e2e tests
  - Add fe tests

Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/initial-reservations.cc
M be/src/runtime/initial-reservations.h
M be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/scheduling/schedule-state.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/hash-util.h
M common/protobuf/admission_control_service.proto
M common/protobuf/control_service.proto
M common/protobuf/planner.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Partitions.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/java/org/apache/impala/util/MathUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test
A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test
A tests/query_test/test_bucket_shuffle.py
51 files changed, 2,279 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/11
--
To view, visit http://gerrit.cloudera.org:8080/19430

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 11
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

2023-02-01 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
..


Patch Set 10:

(12 comments)

Hi Csaba, thanks for your reply and suggestions.
I look forward to your further comments.

http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG@9
PS9, Line 9: rovides better
> Besides the bucket operations do we also apply predicates to buckets? For e
That's right, and I think we can add related optimizations in the future.


http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG@10
PS9, Line 10: performance for some Join queries. Th
> Can you add more info about these optimizations?
Yes, I can.
1. For sorts, bucketing is not limited to partitioned analytic functions;
aggregate functions benefit as well.
2. A table can have multiple bucket columns, and a query can involve multiple
bucketed tables, each with multiple bucket columns.
3. Yes, bucket shuffle is supported.


http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG@11
PS9, Line 11:  the dat
> Can you add some info about the tradeoffs? My understanding is that while b
1. For the first question: it may reduce parallelism, but if the data is
properly divided into buckets, no single bucket is too large, so query
performance is not affected.
2. The scheduler still prefers local execution first; bucket shuffle does not
affect local execution.


http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG@13
PS9, Line 13:
> Can you add some info about the effect on scheduling?
The executor is assigned to the node where the bucket is located.
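
One way to assign each bucket to a node that stores it is sketched below. It
is a hypothetical illustration, not the scheduler code from the patch. Each
bucket goes to the least-loaded host that holds a replica of its data, with
ties broken by host name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: replica_hosts_per_bucket[i] lists the hosts storing
// bucket i. Returns the chosen host per bucket (empty if none is known).
std::vector<std::string> AssignBucketsToHosts(
    const std::vector<std::vector<std::string>>& replica_hosts_per_bucket) {
  std::map<std::string, int> load;  // buckets assigned so far, per host
  std::vector<std::string> assignment;
  for (const auto& hosts : replica_hosts_per_bucket) {
    std::string best;
    for (const auto& h : hosts) {
      // Prefer the local host with the fewest buckets; break ties by name.
      if (best.empty() || load[h] < load[best] ||
          (load[h] == load[best] && h < best)) {
        best = h;
      }
    }
    if (!best.empty()) ++load[best];
    assignment.push_back(best);
  }
  return assignment;
}
```

Keeping every bucket on one of its replica hosts preserves local reads, while
the load counter spreads buckets across executors.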


http://gerrit.cloudera.org:8080/#/c/19430/9/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19430/9/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@324
PS9, Line 324:* TODO: hbase scans are range-partitioned on the row key
 :*/
> Todo can be removed
Done


http://gerrit.cloudera.org:8080/#/c/19430/9/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@510
PS9, Line 510: butionMode() for more details.
> Can you mention buckating join?
Done


http://gerrit.cloudera.org:8080/#/c/19430/9/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@521
PS9, Line 521: (ctx_.getQueryOptions().isEnable_bucket_shuffle()
> Is this always the optimal solution?
That's right. I plan to optimize the predicate filtering case in the future.


http://gerrit.cloudera.org:8080/#/c/19430/9/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19430/9/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2539
PS9, Line 2539:
> Isn't it Hive hash?
Done


http://gerrit.cloudera.org:8080/#/c/19430/9/fe/src/main/java/org/apache/impala/planner/SortNode.java
File fe/src/main/java/org/apache/impala/planner/SortNode.java:

http://gerrit.cloudera.org:8080/#/c/19430/9/fe/src/main/java/org/apache/impala/planner/SortNode.java@387
PS9, Line 387: if (isBucketedNode()) {
> Can you translate this to English?
Done


http://gerrit.cloudera.org:8080/#/c/19430/9/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/19430/9/testdata/datasets/functional/functional_schema_template.sql@3913
PS9, Line 3913: CLUSTERED BY(id)
> Can you also add a test table that is bucketed by more than 1 column?
OK, that's right.


http://gerrit.cloudera.org:8080/#/c/19430/9/testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test
File testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test:

http://gerrit.cloudera.org:8080/#/c/19430/9/testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test@9
PS9, Line 9: 06:AGGREGATE [FINALIZE]
   : |  output: count:merge(b.id), count:merge(b.string_col)
   : |  row-size=16B cardinality=1
   : |
   : 05:EXCHANGE [UNPARTITIONED]
> I don't understand this part of the plan - shouldn't be there a pre-aggrega
Yes, this was wrong; I fixed it.


http://gerrit.cloudera.org:8080/#/c/19430/9/testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test@28
PS9, Line 28: | HDFS partitions=12/24 files=12 size=239.77KB
> For bucketed tables it could be useful to add something like buckets=4/4
That's great.



--
To view, visit http://gerrit.cloudera.org:8080/19430

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430

[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

2023-02-01 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
..

IMPALA-3120: Support Bucket Shuffle Join for bucketed table

Bucket Shuffle Join reduces network overhead and improves performance
for some join queries. It places no mandatory requirement on the data
distribution of the table, so it is unlikely to cause data skew.

Bucket Shuffle Join only applies to equi-joins, because it relies on
hashing to compute the target data distribution.

The equi-join condition can contain the bucket columns of both tables.
If the bucket columns of the left table appear in the equi-join
condition, the join is likely to be planned as a Bucket Shuffle Join.

A join or grouping operation can use one or more bucket columns.
In a multi-table join, the left table must be a bucketed table.

Currently, only HDFS-backed tables are supported.
Only the following node types are supported:
ScanNode/UnionNode/HashJoinNode/AggregationNode/AnalyticEvalNode/SortNode.

To ensure consistency with Hive, the bucket hash is computed the same
way Hive computes the hash value of a column.

Add new query options to control the feature:
  ENABLE_BUCKET_SHUFFLE
  FRAGMENT_INSTANCE_BUCKET_NUM
  BUCKET_EXEC_BACKEND_RATIO

Testing:
  - Add e2e tests
  - Add fe tests

Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/initial-reservations.cc
M be/src/runtime/initial-reservations.h
M be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/scheduling/schedule-state.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/hash-util.h
M common/protobuf/admission_control_service.proto
M common/protobuf/control_service.proto
M common/protobuf/planner.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Partitions.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/java/org/apache/impala/util/MathUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test
A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test
A tests/query_test/test_bucket_shuffle.py
51 files changed, 2,278 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/10
--
To view, visit http://gerrit.cloudera.org:8080/19430

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 10
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone

2023-01-30 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18958 )

Change subject: IMPALA-11563: Optimized /etc/sysconfig/clock to find the time 
zone
..


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/18958/4/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/18958/4/be/src/exprs/timezone_db.cc@184
PS4, Line 184: \"");
> You could extract this string to a constant (or constexpr) so we wouldn't h
Done


http://gerrit.cloudera.org:8080/#/c/18958/4/be/src/exprs/timezone_db.cc@187
PS4, Line 187: erase(
> Something like 'header_len' would be clearer.
Done


http://gerrit.cloudera.org:8080/#/c/18958/4/be/src/exprs/timezone_db.cc@188
PS4, Line 188:
> Why do you prefer substr() instead of erase()? Because of the assignment we
This change was debatable and didn't add much value, so I reverted it to the
original approach.


http://gerrit.cloudera.org:8080/#/c/18958/4/be/src/exprs/timezone_db.cc@188
PS4, Line 188:
> We could extract this to a variable, for example 'result_len'.
Done



--
To view, visit http://gerrit.cloudera.org:8080/18958

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f80fd1817d072f8dadf288025cb9534191ca458
Gerrit-Change-Number: 18958
Gerrit-PatchSet: 7
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Xiang Yang 
Gerrit-Comment-Date: Mon, 30 Jan 2023 09:55:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone

2023-01-30 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/18958 )

Change subject: IMPALA-11563: Optimized /etc/sysconfig/clock to find the time 
zone
..

IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone

When parsing /etc/sysconfig/clock, skip any line that contains a '#',
and simplify the line-parsing logic.
This fixes the parsing problem caused by lines such as '# Zone="utc"'.
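
Here is a minimal sketch of this parsing rule. It assumes the setting is
written as ZONE="Area/City" (quoted or bare) at the start of a line. The
function name is hypothetical. The patch itself modifies
be/src/exprs/timezone_db.cc, whose code may differ.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Returns the ZONE value from /etc/sysconfig/clock-style content, or "" if
// none is found. As in the commit message, any line containing '#' is
// skipped, so a commented-out entry such as # ZONE="utc" is never used.
std::string ParseSysconfigClockZone(std::istream& in) {
  const std::string kKey = "ZONE=";
  std::string line;
  while (std::getline(in, line)) {
    if (line.find('#') != std::string::npos) continue;       // comment line
    if (line.compare(0, kKey.size(), kKey) != 0) continue;   // other setting
    std::string value = line.substr(kKey.size());
    // Strip surrounding double quotes, if present.
    if (value.size() >= 2 && value.front() == '"' && value.back() == '"') {
      value = value.substr(1, value.size() - 2);
    }
    return value;
  }
  return "";
}
```

To read the real file, open it with std::ifstream and pass the stream in.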

Change-Id: I7f80fd1817d072f8dadf288025cb9534191ca458
---
M be/src/exprs/timezone_db.cc
1 file changed, 1 insertion(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/18958/7
--
To view, visit http://gerrit.cloudera.org:8080/18958

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f80fd1817d072f8dadf288025cb9534191ca458
Gerrit-Change-Number: 18958
Gerrit-PatchSet: 7
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Xiang Yang 


[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table

2023-01-30 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/18953 )

Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns 
for kudu/iceberg table
..

IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg 
table

Impala already supports IF NOT EXISTS in ALTER TABLE ADD COLUMNS for
general Hive tables (IMPALA-7832), but not for Kudu/Iceberg tables. This
patch adds the same semantics for Kudu/Iceberg tables.
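A minimal sketch of the IF NOT EXISTS rule, assuming column names are already normalized to lower case. ColumnsToAdd is a hypothetical helper, not CatalogOpExecutor's API, and the Kudu/Iceberg-specific code paths in the patch are not modeled here.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical sketch: returns the requested columns that actually need to be
// added. With IF NOT EXISTS, columns that already exist are silently skipped;
// without it, they are an error. Duplicates inside the request are always an error.
std::vector<std::string> ColumnsToAdd(const std::vector<std::string>& existing,
                                      const std::vector<std::string>& requested,
                                      bool if_not_exists) {
  std::unordered_set<std::string> existing_set(existing.begin(), existing.end());
  std::unordered_set<std::string> requested_set;
  std::vector<std::string> result;
  for (const std::string& col : requested) {
    if (!requested_set.insert(col).second) {
      throw std::invalid_argument("Duplicate column name: " + col);
    }
    if (existing_set.count(col) > 0) {
      if (if_not_exists) continue;  // skip instead of failing the statement
      throw std::invalid_argument("Column already exists: " + col);
    }
    result.push_back(col);
  }
  return result;
}
```

For a table with columns (id, name), ADD COLUMNS IF NOT EXISTS (name, age) would then add only age.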

Testing:
- Updated E2E DDL tests

Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
---
M common/thrift/JniCatalog.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
6 files changed, 162 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/14
--
To view, visit http://gerrit.cloudera.org:8080/18953

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
Gerrit-Change-Number: 18953
Gerrit-PatchSet: 14
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

2023-01-29 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
..

IMPALA-3120: Support Bucket Shuffle Join for bucketed table

Queries on bucketed tables that contain operations such as join,
group by, or sort by can use bucket shuffle join to optimize the
execution plan, reduce the amount of data transferred, and reduce
query latency.
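A minimal sketch of why less data moves: the bucketed side stays where it is scanned, and only rows of the other input are routed to the instance that owns the matching bucket. It assumes buckets are spread round-robin over fragment instances and that each row's key has already been hashed with the Hive-compatible hash; AssignBucketsRoundRobin and DestinationInstance are hypothetical names, not functions from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical assignment: bucket b of the bucketed table is scanned by
// fragment instance (b % num_instances).
std::vector<int> AssignBucketsRoundRobin(int num_buckets, int num_instances) {
  assert(num_buckets > 0 && num_instances > 0);
  std::vector<int> owner(num_buckets);
  for (int b = 0; b < num_buckets; ++b) owner[b] = b % num_instances;
  return owner;
}

// Rows from the non-bucketed input go only to the instance owning the
// matching bucket, so the bucketed input never has to be exchanged.
int DestinationInstance(int32_t key_hash, const std::vector<int>& bucket_owner) {
  assert(!bucket_owner.empty());
  int32_t num_buckets = static_cast<int32_t>(bucket_owner.size());
  int32_t bucket = (key_hash & 0x7fffffff) % num_buckets;  // clear sign bit first
  return bucket_owner[bucket];
}
```

Compared with a regular partitioned (shuffle) join, only one input crosses the network here, which is where the reduction in transferred data comes from.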

To ensure consistency with Hive, the bucket hash is calculated using
the same method that Hive uses to calculate the hash value of a column.
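A minimal sketch of a Hive-compatible bucket hash, assuming Hive's bucketing version 1 scheme (value hash per column type, then (hash & Integer.MAX_VALUE) % numBuckets). Tables written with bucketing_version=2 use Murmur3 instead, and the exact method in the patch's hash-util.h / BucketUtils.java may differ; all function names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// INT: the value itself is the hash.
int32_t HiveHashInt(int32_t v) { return v; }

// BIGINT: fold the high and low 32 bits, like Java's (int)((a >>> 32) ^ a).
int32_t HiveHashBigint(int64_t v) {
  uint64_t u = static_cast<uint64_t>(v);
  return static_cast<int32_t>(static_cast<uint32_t>((u >> 32) ^ u));
}

// STRING: r = r * 31 + b over the raw bytes, with each byte treated as a
// signed Java byte and 32-bit wrap-around. Unsigned arithmetic is used to
// avoid signed-overflow undefined behavior in C++.
int32_t HiveHashString(const std::string& s) {
  uint32_t r = 0;
  for (char c : s) {
    int32_t b = static_cast<int8_t>(c);
    r = r * 31u + static_cast<uint32_t>(b);
  }
  return static_cast<int32_t>(r);
}

// Bucket number: clear the sign bit, then take the remainder.
int32_t HiveBucket(int32_t hash, int32_t num_buckets) {
  assert(num_buckets > 0);
  return (hash & 0x7fffffff) % num_buckets;
}
```

Matching Hive bit for bit matters because Impala must agree with the bucket layout that Hive wrote into the table's files.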

Add new query options to control this feature:
  ENABLE_BUCKET_SHUFFLE
  FRAGMENT_INSTANCE_BUCKET_NUM
  BUCKET_EXEC_BACKEND_RATIO

Testing:
  - Add e2e tests
  - Add fe tests

Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/initial-reservations.cc
M be/src/runtime/initial-reservations.h
M be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/scheduling/schedule-state.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/hash-util.h
M common/protobuf/admission_control_service.proto
M common/protobuf/control_service.proto
M common/protobuf/planner.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Partitions.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/java/org/apache/impala/util/MathUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test
A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test
A tests/query_test/test_bucket_shuffle.py
51 files changed, 2,208 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/9
--
To view, visit http://gerrit.cloudera.org:8080/19430

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 9
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

2023-01-29 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
..

IMPALA-3120: Support Bucket Shuffle Join for bucketed table

Queries on bucketed tables that contain operations such as join,
group by, or sort by can use bucket shuffle join to optimize the
execution plan, reduce the amount of data transferred, and reduce
query latency.

To ensure consistency with Hive, the bucket hash is calculated using
the same method that Hive uses to calculate the hash value of a column.

Add new query options to control this feature:
  ENABLE_BUCKET_SHUFFLE
  FRAGMENT_INSTANCE_BUCKET_NUM
  BUCKET_EXEC_BACKEND_RATIO

Testing:
  - Add e2e tests
  - Add fe tests

Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/initial-reservations.cc
M be/src/runtime/initial-reservations.h
M be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/scheduling/schedule-state.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/hash-util.h
M common/protobuf/admission_control_service.proto
M common/protobuf/control_service.proto
M common/protobuf/planner.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Partitions.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/java/org/apache/impala/util/MathUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test
A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test
A tests/query_test/test_bucket_shuffle.py
51 files changed, 2,211 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/8
--
To view, visit http://gerrit.cloudera.org:8080/19430

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 8
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

2023-01-29 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
..

IMPALA-3120: Support Bucket Shuffle Join for bucketed table

Queries on bucketed tables that contain operations such as join,
group by, or sort by can use bucket shuffle join to optimize the
execution plan, reduce the amount of data transferred, and reduce
query latency.

To ensure consistency with Hive, the bucket hash is calculated using
the same method that Hive uses to calculate the hash value of a column.

Add new query options to control this feature:
  ENABLE_BUCKET_SHUFFLE
  FRAGMENT_INSTANCE_BUCKET_NUM
  BUCKET_EXEC_BACKEND_RATIO

Testing:
  - Add e2e tests
  - Add fe tests

Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/initial-reservations.cc
M be/src/runtime/initial-reservations.h
M be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/scheduling/schedule-state.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/hash-util.h
M common/protobuf/admission_control_service.proto
M common/protobuf/control_service.proto
M common/protobuf/planner.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Partitions.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/java/org/apache/impala/util/MathUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test
A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test
A tests/query_test/test_bucket_shuffle.py
51 files changed, 2,207 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/7
--
To view, visit http://gerrit.cloudera.org:8080/19430

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 7
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

2023-01-18 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
..

IMPALA-3120: Support Bucket Shuffle Join for bucketed table

Queries on bucketed tables that contain operations such as join,
group by, or sort by can use bucket shuffle join to optimize the
execution plan, reduce the amount of data transferred, and reduce
query latency.

To ensure consistency with Hive, the bucket hash is calculated using
the same method that Hive uses to calculate the hash value of a column.

Add new query options to control this feature:
  ENABLE_BUCKET_SHUFFLE
  FRAGMENT_INSTANCE_BUCKET_NUM
  BUCKET_EXEC_BACKEND_RATIO

Testing:
  - Add e2e tests
  - Add fe tests

Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/initial-reservations.cc
M be/src/runtime/initial-reservations.h
M be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/scheduling/schedule-state.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/hash-util.h
M common/protobuf/admission_control_service.proto
M common/protobuf/control_service.proto
M common/protobuf/planner.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Partitions.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/java/org/apache/impala/util/MathUtil.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test
A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test
A tests/query_test/test_bucket_shuffle.py
52 files changed, 2,212 insertions(+), 70 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/6
--
To view, visit http://gerrit.cloudera.org:8080/19430

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 6
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

2023-01-18 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
..

IMPALA-3120: Support Bucket Shuffle Join for bucketed table

Queries on bucketed tables that contain operations such as join,
group by, or sort by can use bucket shuffle join to optimize the
execution plan, reduce the amount of data transferred, and reduce
query latency.

To ensure consistency with Hive, the bucket hash is calculated using
the same method that Hive uses to calculate the hash value of a column.

Add new query options to control this feature:
  ENABLE_BUCKET_SHUFFLE
  FRAGMENT_INSTANCE_BUCKET_NUM
  BUCKET_EXEC_BACKEND_RATIO

Testing:
  - Add e2e tests
  - Add fe tests

Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/initial-reservations.cc
M be/src/runtime/initial-reservations.h
M be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/scheduling/schedule-state.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/hash-util.h
M common/protobuf/admission_control_service.proto
M common/protobuf/control_service.proto
M common/protobuf/planner.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Partitions.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/java/org/apache/impala/util/MathUtil.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/datasets/functional/functional_schema_template.sql
A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test
A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test
A tests/query_test/test_bucket_shuffle.py
51 files changed, 2,202 insertions(+), 70 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/5
--
To view, visit http://gerrit.cloudera.org:8080/19430

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 5
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

2023-01-18 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
..

IMPALA-3120: Support Bucket Shuffle Join for bucketed table

Queries on bucketed tables that contain operations such as join,
group by, or sort by can use bucket shuffle join to optimize the
execution plan, reduce the amount of data transferred, and reduce
query latency.

To ensure consistency with Hive, the bucket hash is calculated using
the same method that Hive uses to calculate the hash value of a column.

Add new query options to control this feature:
  ENABLE_BUCKET_SHUFFLE
  FRAGMENT_INSTANCE_BUCKET_NUM
  BUCKET_EXEC_BACKEND_RATIO

Testing:
  - Add e2e tests
  - Add fe tests

Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/initial-reservations.cc
M be/src/runtime/initial-reservations.h
M be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/scheduling/schedule-state.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/hash-util.h
M common/protobuf/admission_control_service.proto
M common/protobuf/control_service.proto
M common/protobuf/planner.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Partitions.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/java/org/apache/impala/util/MathUtil.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/datasets/functional/functional_schema_template.sql
A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test
A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test
A tests/query_test/test_bucket_shuffle.py
51 files changed, 2,186 insertions(+), 70 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/4
--
To view, visit http://gerrit.cloudera.org:8080/19430

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 4
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

2023-01-18 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
..

IMPALA-3120: Support Bucket Shuffle Join for bucketed table

Queries on bucketed tables that contain operations such as join,
group by, or sort by can use bucket shuffle join to optimize the
execution plan, reduce the amount of data transferred, and reduce
query latency.

To ensure consistency with Hive, the bucket hash is calculated using
the same method that Hive uses to calculate the hash value of a column.

Add new query options to control this feature:
  ENABLE_BUCKET_SHUFFLE
  FRAGMENT_INSTANCE_BUCKET_NUM
  BUCKET_EXEC_BACKEND_RATIO

Testing:
  - Add e2e tests
  - Add fe tests

Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/initial-reservations.cc
M be/src/runtime/initial-reservations.h
M be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/scheduling/schedule-state.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/hash-util.h
M common/protobuf/admission_control_service.proto
M common/protobuf/control_service.proto
M common/protobuf/planner.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Partitions.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/java/org/apache/impala/util/MathUtil.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/datasets/functional/functional_schema_template.sql
A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test
A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test
A tests/query_test/test_bucket_shuffle.py
51 files changed, 2,186 insertions(+), 70 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/3
--
To view, visit http://gerrit.cloudera.org:8080/19430

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 3
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

2023-01-18 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
..

IMPALA-3120: Support Bucket Shuffle Join for bucketed table

Queries on bucketed tables that contain operations such as join,
group by, or sort by can use bucket shuffle join to optimize the
execution plan, reduce the amount of data transferred, and reduce
query latency.

To ensure consistency with Hive, the bucket hash is calculated using
the same method that Hive uses to calculate the hash value of a column.

Add new query options to control this feature:
  ENABLE_BUCKET_SHUFFLE
  FRAGMENT_INSTANCE_BUCKET_NUM
  BUCKET_EXEC_BACKEND_RATIO

Testing:
  - Add e2e tests
  - Add fe tests

Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/initial-reservations.cc
M be/src/runtime/initial-reservations.h
M be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/scheduling/schedule-state.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/hash-util.h
M common/protobuf/admission_control_service.proto
M common/protobuf/control_service.proto
M common/protobuf/planner.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Partitions.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/java/org/apache/impala/util/MathUtil.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/datasets/functional/functional_schema_template.sql
A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test
A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test
A tests/query_test/test_bucket_shuffle.py
51 files changed, 2,187 insertions(+), 70 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/2
--
To view, visit http://gerrit.cloudera.org:8080/19430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 2
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

2023-01-18 Thread Baike Xia (Code Review)
Baike Xia has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/19430 )


Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
..

IMPALA-3120: Support Bucket Shuffle Join for bucketed table

Query statements that involve bucketed tables and perform operations
such as join, group by, or sort by can use bucket shuffle join to
optimize the execution plan, reduce the amount of data transferred,
and reduce query latency.
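As a rough illustration of the idea, the Java sketch below models an inner equi-join whose build side is a table already bucketed on the join key: the bucketed side stays where it is, only the probe rows are redistributed with the same bucket function, and each bucket pair is joined independently, as one fragment instance would do. The bucket function (hash modulo bucket count) is a stand-in; a real implementation must use the same hash the table was bucketed with. All names are hypothetical and not taken from the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of a bucket shuffle join; not Impala code.
final class BucketShuffleJoinSketch {
  private BucketShuffleJoinSketch() {}

  /** Stand-in bucket function; must match the table's bucketing hash in practice. */
  static int bucketOf(int key, int numBuckets) {
    return (Integer.hashCode(key) & Integer.MAX_VALUE) % numBuckets;
  }

  /** Splits keys into numBuckets lists, the way a bucketed table stores them. */
  static List<List<Integer>> bucketize(List<Integer> keys, int numBuckets) {
    List<List<Integer>> buckets = new ArrayList<>();
    for (int i = 0; i < numBuckets; i++) {
      buckets.add(new ArrayList<>());
    }
    for (int k : keys) {
      buckets.get(bucketOf(k, numBuckets)).add(k);
    }
    return buckets;
  }

  /** Inner equi-join of probe keys with a pre-bucketed build side; returns matched keys. */
  static List<Integer> join(List<List<Integer>> buildBuckets, List<Integer> probeKeys) {
    int numBuckets = buildBuckets.size();
    // Shuffle only the probe side, using the build side's bucket function.
    List<List<Integer>> routed = bucketize(probeKeys, numBuckets);
    List<Integer> matches = new ArrayList<>();
    // Rows with equal keys always land in the same bucket, so each bucket
    // pair can be joined on its own without looking at other buckets.
    for (int i = 0; i < numBuckets; i++) {
      Map<Integer, Integer> buildCounts = new HashMap<>();
      for (int b : buildBuckets.get(i)) {
        buildCounts.merge(b, 1, Integer::sum);
      }
      for (int p : routed.get(i)) {
        int n = buildCounts.getOrDefault(p, 0);
        for (int j = 0; j < n; j++) {
          matches.add(p);
        }
      }
    }
    return matches;
  }
}
```

Compared with a regular partitioned join, the bucketed side is never sent over the network, which is where the reduction in transferred data comes from.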

To ensure consistency with Hive, the bucket hash is calculated using
the same method that Hive uses to calculate the hash value of a column.

Add new query options to control this feature:
  ENABLE_BUCKET_SHUFFLE
  FRAGMENT_INSTANCE_BUCKET_NUM
  BUCKET_EXEC_BACKEND_RATIO

Testing:
  - Add e2e tests
  - Add fe tests

Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/initial-reservations.cc
M be/src/runtime/initial-reservations.h
M be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/scheduling/schedule-state.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/hash-util.h
M common/protobuf/admission_control_service.proto
M common/protobuf/control_service.proto
M common/protobuf/planner.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Partitions.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Planner.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/java/org/apache/impala/util/MathUtil.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/datasets/functional/functional_schema_template.sql
A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test
A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test
A tests/query_test/test_bucket_shuffle.py
51 files changed, 2,176 insertions(+), 61 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/1
--
To view, visit http://gerrit.cloudera.org:8080/19430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 1
Gerrit-Owner: Baike Xia 


[Impala-ASF-CR] IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE

2022-12-30 Thread Baike Xia (Code Review)
Hello Quanlong Huang, Aman Sinha, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18731

to look at the new patch set (#17).

Change subject: IMPALA-11424: Support pushdown non-equi join predicate from 
OUTER/INNER JOIN to SCANNODE
..

IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to 
SCANNODE

To reduce the amount of data read and transmitted, non-equi join
conjuncts can be pushed down to the SCAN_NODE.

Current restrictions on pushing down non-equi join conjuncts:
 1. Only LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN and INNER_JOIN are supported;
 2. Only non-equi predicates that compare a slot with a LiteralExpr are
  supported, for example: slot >= literal, slot IN (literal list);
 3. A complex filter condition is pushed down only if it references a
  single column, for example, cast(A as int) > 10 is pushed down to
  the scan;
 4. Only the following predicate operators are supported:
  EQ, LE, LT, GE, GT;
 5. Only BinaryPredicate and InPredicate are supported.

Pushdown logic:
 1. Build a mapping from each slot to its list of non-equi conjuncts,
  and a mapping from each slot to its list of equi conjuncts;
 2. For a slot that has both equi and non-equi conjuncts, compute the
  maximum and minimum values of its equi conjuncts;
 3. Build new BinaryPredicates from these maximum and minimum values
  according to the non-equi conjuncts;
 4. Push all the new BinaryPredicates down to the corresponding scan node.
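Steps 2 and 3 leave some room for interpretation. One plausible reading is transitive bound propagation: if t1.a = t2.a is an equi-join conjunct and t1.a has literal bounds such as t1.a > 5 and t1.a >= 10, only the tightest lower and upper bounds matter, and they can be rewritten against t2.a and evaluated in t2's scan. The Java sketch below shows that idea under this assumption; the names (BoundTransferSketch, Bound, transfer) are hypothetical, IN lists are not handled, and it is not the patch's HashJoinNode logic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of one reading of the pushdown logic; not
// Impala's HashJoinNode/Expr code.
final class BoundTransferSketch {
  private BoundTransferSketch() {}

  enum Op { LT, LE, GT, GE }

  /** A literal predicate "slot op value" on one slot of an equi-join conjunct. */
  static final class Bound {
    final Op op;
    final long value;

    Bound(Op op, long value) {
      this.op = op;
      this.value = value;
    }
  }

  /**
   * Given literal bounds on one side of an equi-join conjunct (a = b), returns
   * the tightest lower and upper bound rewritten against the other slot (b).
   * Caution: pushing a derived bound into the preserved side of an outer join
   * can drop rows that must be kept; callers must rule that case out.
   */
  static List<String> transfer(String targetSlot, List<Bound> bounds) {
    Bound lower = null;  // largest lower bound seen so far
    Bound upper = null;  // smallest upper bound seen so far
    for (Bound b : bounds) {
      if (b.op == Op.GT || b.op == Op.GE) {
        // On a tie, the strict ">" is tighter than ">=".
        if (lower == null || b.value > lower.value
            || (b.value == lower.value && b.op == Op.GT)) {
          lower = b;
        }
      } else {
        // On a tie, the strict "<" is tighter than "<=".
        if (upper == null || b.value < upper.value
            || (b.value == upper.value && b.op == Op.LT)) {
          upper = b;
        }
      }
    }
    List<String> derived = new ArrayList<>();
    if (lower != null) {
      derived.add(targetSlot + (lower.op == Op.GT ? " > " : " >= ") + lower.value);
    }
    if (upper != null) {
      derived.add(targetSlot + (upper.op == Op.LT ? " < " : " <= ") + upper.value);
    }
    return derived;
  }
}
```

For example, the bounds t1.a > 5, t1.a >= 10, t1.a < 100 and t1.a <= 100 collapse to "t2.a >= 10" and "t2.a < 100".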

Also add a new query option to enable or disable this feature:
 ENABLE_NONE_EQUAL_PREDICATE_PUSH_DOWN

Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test
A testdata/workloads/functional-query/queries/QueryTest/none-equal-predicate-push-down.test
A tests/query_test/test_none_equi_predicate_pushdown.py
10 files changed, 1,429 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/18731/17
--
To view, visit http://gerrit.cloudera.org:8080/18731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263
Gerrit-Change-Number: 18731
Gerrit-PatchSet: 17
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE

2022-12-30 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#16). ( 
http://gerrit.cloudera.org:8080/18731 )

Change subject: IMPALA-11424: Support pushdown non-equi join predicate from 
OUTER/INNER JOIN to SCANNODE
..

IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to 
SCANNODE

To reduce the amount of data read and transmitted, non-equi join
conjuncts can be pushed down to the SCAN_NODE.

Current restrictions on pushing down non-equi join conjuncts:
 1. Only LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN and INNER_JOIN are supported;
 2. Only non-equi predicates that compare a slot with a LiteralExpr are
  supported, for example: slot >= literal, slot IN (literal list);
 3. A complex filter condition is pushed down only if it references a
  single column, for example, cast(A as int) > 10 is pushed down to
  the scan;
 4. Only the following predicate operators are supported:
  EQ, LE, LT, GE, GT;
 5. Only BinaryPredicate and InPredicate are supported.

Pushdown logic:
 1. Build a mapping from each slot to its list of non-equi conjuncts,
  and a mapping from each slot to its list of equi conjuncts;
 2. For a slot that has both equi and non-equi conjuncts, compute the
  maximum and minimum values of its equi conjuncts;
 3. Build new BinaryPredicates from these maximum and minimum values
  according to the non-equi conjuncts;
 4. Push all the new BinaryPredicates down to the corresponding scan node.

Also add a new query option to enable or disable this feature:
 ENABLE_NONE_EQUAL_PREDICATE_PUSH_DOWN

Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test
A testdata/workloads/functional-query/queries/QueryTest/none-equal-predicate-push-down.test
A tests/query_test/test_none_equi_predicate_pushdown.py
10 files changed, 1,429 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/18731/16
--
To view, visit http://gerrit.cloudera.org:8080/18731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263
Gerrit-Change-Number: 18731
Gerrit-PatchSet: 16
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table

2022-12-29 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/18953 )

Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns 
for kudu/iceberg table
..

IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg 
table

Impala already supports IF NOT EXISTS in ALTER TABLE ADD COLUMNS for
general Hive tables (IMPALA-7832), but not for Kudu/Iceberg tables. This
patch adds the same semantics for Kudu/Iceberg tables.
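The intended IF NOT EXISTS semantics can be sketched independently of the storage engine: columns that already exist are silently skipped instead of raising an error, and the statement becomes a no-op if nothing is left to add. The Java sketch below is a hypothetical illustration, not the patch's AlterTableAddColsStmt or CatalogOpExecutor code; it assumes case-insensitive column names and does not model the Kudu or Iceberg schema-update calls.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of ADD COLUMNS [IF NOT EXISTS] semantics; not
// Impala's AlterTableAddColsStmt or CatalogOpExecutor code.
final class AddColumnsIfNotExistsSketch {
  private AddColumnsIfNotExistsSketch() {}

  /**
   * Returns the requested columns that should actually be added.
   * Column names are compared case-insensitively (an assumption).
   */
  static List<String> columnsToAdd(List<String> existing, List<String> requested,
      boolean ifNotExists) {
    Set<String> present = new HashSet<>();
    for (String c : existing) {
      present.add(c.toLowerCase(Locale.ROOT));
    }
    Set<String> inStatement = new HashSet<>();
    List<String> toAdd = new ArrayList<>();
    for (String c : requested) {
      String key = c.toLowerCase(Locale.ROOT);
      // Repeating a column inside one statement is an error either way.
      if (!inStatement.add(key)) {
        throw new IllegalArgumentException("Duplicate column name: " + c);
      }
      if (present.contains(key)) {
        if (ifNotExists) {
          continue;  // IF NOT EXISTS: silently skip columns that already exist
        }
        throw new IllegalArgumentException("Column already exists: " + c);
      }
      toAdd.add(c);
    }
    return toAdd;  // may be empty, in which case the ALTER is a no-op
  }
}
```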

Testing:
- Updated E2E DDL tests

Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
---
M common/thrift/JniCatalog.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
6 files changed, 162 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/13
--
To view, visit http://gerrit.cloudera.org:8080/18953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
Gerrit-Change-Number: 18953
Gerrit-PatchSet: 13
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE

2022-12-06 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#15). ( 
http://gerrit.cloudera.org:8080/18731 )

Change subject: IMPALA-11424: Support pushdown non-equi join predicate from 
OUTER/INNER JOIN to SCANNODE
..

IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to 
SCANNODE

To reduce the amount of data read and transmitted, non-equi join
conjuncts can be pushed down to the SCAN_NODE.

Current restrictions on pushing down non-equi join conjuncts:
 1. Only LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN and INNER_JOIN are supported;
 2. Only non-equi predicates that compare a slot with a LiteralExpr are
  supported, for example: slot >= literal, slot IN (literal list);
 3. A complex filter condition is pushed down only if it references a
  single column, for example, cast(A as int) > 10 is pushed down to
  the scan;
 4. Only the following predicate operators are supported:
  EQ, LE, LT, GE, GT;
 5. Only BinaryPredicate and InPredicate are supported.

Pushdown logic:
 1. Build a mapping from each slot to its list of non-equi conjuncts,
  and a mapping from each slot to its list of equi conjuncts;
 2. For a slot that has both equi and non-equi conjuncts, compute the
  maximum and minimum values of its equi conjuncts;
 3. Build new BinaryPredicates from these maximum and minimum values
  according to the non-equi conjuncts;
 4. Push all the new BinaryPredicates down to the corresponding scan node.

Also add a new query option to enable or disable this feature:
 ENABLE_NONE_EQUAL_PREDICATE_PUSH_DOWN

Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test
A testdata/workloads/functional-query/queries/QueryTest/none-equal-predicate-push-down.test
A tests/query_test/test_none_equi_predicate_pushdown.py
10 files changed, 1,241 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/18731/15
--
To view, visit http://gerrit.cloudera.org:8080/18731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263
Gerrit-Change-Number: 18731
Gerrit-PatchSet: 15
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11485: Pushdown LIMIT through UNION ALL and LEFT/RIGHT OUTER JOIN

2022-12-06 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/18862 )

Change subject: IMPALA-11485: Pushdown LIMIT through UNION ALL and LEFT/RIGHT 
OUTER JOIN
..

IMPALA-11485: Pushdown LIMIT through UNION ALL and LEFT/RIGHT OUTER JOIN

Pushdown LIMIT through UNION ALL:
Transforms:
  - Limit
    - Union
      - relation1
      - relation2
      ..
Into:
  - Limit
    - Union
      - Limit
        - relation1
      - Limit
        - relation2
      ..

Pushdown LIMIT through LEFT/RIGHT OUTER JOIN:
Transforms:
  - Limit
    - Join
      - left source
      - right source
Into:
  - Limit
    - Join
      - Limit (present if Join is left outer)
        - left source
      - Limit (present if Join is right outer)
        - right source
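The two rewrites above can be sketched on a toy plan tree in Java. The node classes (Scan, UnionAll, Join, Limit) and pushDown are hypothetical, not Impala's SingleNodePlanner API. The sketch assumes there is no ORDER BY/OFFSET and that no predicates are evaluated above the union or join, since those could discard rows after the pushed-down limit; the real planner has to check such conditions itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plan-node model; not Impala's PlanNode/SingleNodePlanner.
final class LimitPushdownSketch {
  private LimitPushdownSketch() {}

  abstract static class Node {}

  static final class Scan extends Node {
    final String table;
    Scan(String table) { this.table = table; }
    @Override public String toString() { return table; }
  }

  static final class UnionAll extends Node {
    final List<Node> children;
    UnionAll(List<Node> children) { this.children = children; }
    @Override public String toString() { return "Union" + children; }
  }

  enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER }

  static final class Join extends Node {
    final JoinType type;
    final Node left;
    final Node right;
    Join(JoinType type, Node left, Node right) {
      this.type = type;
      this.left = left;
      this.right = right;
    }
    @Override public String toString() { return type + "(" + left + ", " + right + ")"; }
  }

  static final class Limit extends Node {
    final long n;
    final Node child;
    Limit(long n, Node child) { this.n = n; this.child = child; }
    @Override public String toString() { return "Limit" + n + "[" + child + "]"; }
  }

  /** Applies the rewrite once at the root, keeping the original top Limit. */
  static Node pushDown(Node node) {
    if (!(node instanceof Limit)) return node;
    Limit limit = (Limit) node;
    if (limit.child instanceof UnionAll) {
      List<Node> limited = new ArrayList<>();
      for (Node c : ((UnionAll) limit.child).children) {
        limited.add(new Limit(limit.n, c));  // each branch needs at most n rows
      }
      return new Limit(limit.n, new UnionAll(limited));
    }
    if (limit.child instanceof Join) {
      Join j = (Join) limit.child;
      // Each row of the preserved side yields at least one output row,
      // so n rows from that side are enough to produce n output rows.
      if (j.type == JoinType.LEFT_OUTER) {
        return new Limit(limit.n, new Join(j.type, new Limit(limit.n, j.left), j.right));
      }
      if (j.type == JoinType.RIGHT_OUTER) {
        return new Limit(limit.n, new Join(j.type, j.left, new Limit(limit.n, j.right)));
      }
    }
    return node;  // inner joins and other nodes are left unchanged
  }
}
```

The top Limit is kept because each pushed-down Limit only bounds its own input; a union of k limited branches, or an outer join that fans out matches, can still produce more than n rows.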

Change-Id: Ia5d040c0a98e60639d7ce4b25ecf07a859c8a32c
---
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-outer-join.test
A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-union.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/joins.test
A testdata/workloads/functional-query/queries/limit-pushdown-outer-join.test
A testdata/workloads/functional-query/queries/limit-pushdown-union.test
A tests/query_test/test_limit_pushdown.py
M tests/query_test/test_observability.py
14 files changed, 469 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/18862/8
--
To view, visit http://gerrit.cloudera.org:8080/18862
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia5d040c0a98e60639d7ce4b25ecf07a859c8a32c
Gerrit-Change-Number: 18862
Gerrit-PatchSet: 8
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table

2022-12-06 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/18953 )

Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns 
for kudu/iceberg table
..

IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg 
table

Impala already supports IF NOT EXISTS in ALTER TABLE ADD COLUMNS for
general Hive tables (IMPALA-7832), but not for Kudu/Iceberg tables. This
patch adds the same semantics for Kudu/Iceberg tables.

Testing:
- Updated E2E DDL tests

Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
---
M common/thrift/JniCatalog.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
6 files changed, 154 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/12
--
To view, visit http://gerrit.cloudera.org:8080/18953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
Gerrit-Change-Number: 18953
Gerrit-PatchSet: 12
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table

2022-12-06 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/18953 )

Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns 
for kudu/iceberg table
..

IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg 
table

Impala already supports IF NOT EXISTS in ALTER TABLE ADD COLUMNS for
general Hive tables (IMPALA-7832), but not for Kudu/Iceberg tables. This
patch adds the same semantics for Kudu/Iceberg tables.

Testing:
- Updated E2E DDL tests

Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
---
M common/thrift/JniCatalog.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
6 files changed, 154 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/11
--
To view, visit http://gerrit.cloudera.org:8080/18953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
Gerrit-Change-Number: 18953
Gerrit-PatchSet: 11
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11564: For Agg/Scan nodes, increase the Cache of regular expressions to speed up

2022-12-06 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/18987 )

Change subject: IMPALA-11564: For Agg/Scan nodes, increase the Cache of regular 
expressions to speed up
..

IMPALA-11564: For Agg/Scan nodes, increase the Cache of regular expressions to 
speed up

Each time a regular expression is matched, looking it up from the cache
speeds up the computation.
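The backend change itself is C++ (like-predicate*.cc, exec-node-thread-cache.h). To illustrate the general idea of a per-thread cache of compiled regular expressions, here is a Java sketch using ThreadLocal and java.util.regex.Pattern: each thread owns its map, so no locking is needed, and a pattern string is compiled once per thread and then reused. The cache key, the size bound (MAX_ENTRIES) and the crude clear-on-full eviction are assumptions, not details from the patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical per-thread cache of compiled regular expressions; not Impala code.
final class ThreadLocalRegexCache {
  private static final int MAX_ENTRIES = 128;  // assumed bound to cap memory use

  // One map per thread, so lookups and inserts need no synchronization.
  private static final ThreadLocal<Map<String, Pattern>> CACHE =
      ThreadLocal.withInitial(HashMap::new);

  private ThreadLocalRegexCache() {}

  /** Returns the compiled pattern, compiling it only on the first use in this thread. */
  static Pattern get(String regex) {
    Map<String, Pattern> cache = CACHE.get();
    Pattern p = cache.get(regex);
    if (p == null) {
      p = Pattern.compile(regex);
      if (cache.size() >= MAX_ENTRIES) {
        cache.clear();  // crude eviction: start over when the bound is reached
      }
      cache.put(regex, p);
    }
    return p;
  }

  /** Substring-style match, reusing the cached pattern. */
  static boolean matches(String regex, String value) {
    return get(regex).matcher(value).find();
  }
}
```

This mainly helps when the same pattern string recurs across many rows but cannot be compiled once up front, for example when it is not a constant (an assumption about the motivation).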

Change-Id: I68f37303aee4b6a28e560f27548c31472b82048b
---
M be/src/exec/aggregator.cc
M be/src/exec/aggregator.h
A be/src/exec/exec-node-thread-cache.h
M be/src/exec/grouping-aggregator-ir.cc
M be/src/exec/hdfs-columnar-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/kudu/kudu-scanner.cc
M be/src/exec/kudu/kudu-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exprs/agg-fn-evaluator.h
M be/src/exprs/like-predicate-ir.cc
M be/src/exprs/like-predicate.cc
M be/src/exprs/like-predicate.h
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/udf/udf-internal.h
M be/src/udf/udf-ir.cc
M be/src/udf/udf.cc
M be/src/udf/udf.h
A testdata/workloads/functional-query/queries/QueryTest/thread-cache.test
M tests/query_test/test_queries.py
22 files changed, 219 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/18987/5
--
To view, visit http://gerrit.cloudera.org:8080/18987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I68f37303aee4b6a28e560f27548c31472b82048b
Gerrit-Change-Number: 18987
Gerrit-PatchSet: 5
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-11-24 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..


Patch Set 20:

> Change has been successfully rebased and submitted as 
> 2733d039ad4a830a1ea34c1a75d2b666788e39a9 by Quanlong Huang

Thank you for your many rounds of guidance and code review, Quanlong.


--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 20
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Manish Maheshwari 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 25 Nov 2022 06:24:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-11-07 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..


Patch Set 18:

> Patch Set 18:
>
> Can we use "CLUSTER BY" rather than "CLUSTERED BY"? I see Spark also using 
> Cluster by and so does Hive - 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy
> https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-clusterby.html

Hi Manish, glad to see your comment.
In Hive and Spark, "CLUSTERED BY" is used to specify the bucketing columns and
the number of buckets when a table is created. In SELECT syntax, "CLUSTER BY"
ensures each of N reducers gets non-overlapping ranges, then sorts by those
ranges at the reducers.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables
https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-create-table-hiveformat.html
https://stackoverflow.com/questions/34495981/difference-between-cluster-by-and-clustered-by-in-hive


--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 18
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Manish Maheshwari 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 08 Nov 2022 02:29:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-11-02 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..


Patch Set 18:

(2 comments)

Thanks very much.

http://gerrit.cloudera.org:8080/#/c/19055/17//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19055/17//COMMIT_MSG@21
PS17, Line 21: th
> nit: "the"
Done


http://gerrit.cloudera.org:8080/#/c/19055/17//COMMIT_MSG@27
PS17, Line 27: drop
> nit: dropping
Done



--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 18
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 02 Nov 2022 12:14:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-11-02 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#18). ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The syntax in the CREATE TABLE statement is as follows:
 [CLUSTERED BY (column[, column ...]) [SORT BY (column[, column ...])]
 INTO num_buckets BUCKETS]

Example:
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) INTO 24 BUCKETS;
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) SORT BY (s) INTO 24 BUCKETS;

Notes:
1. The bucket partitioning algorithm is the hash function used
  in Hive's bucketed tables;
2. CREATE TABLE statements for bucketed tables currently don't support
  Kudu and Iceberg tables;
3. In the current version, alter operations (add/drop/change/replace
  columns) on bucketed tables are not supported;
4. Dropping bucketed tables is supported.
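The DDL rules above could be validated roughly as in the following Java sketch. It checks that the table is not Kudu or Iceberg, that the bucket count is positive (an assumption based on the `getNum_bucket() <= 0` check quoted in the patch set 16 review), and that every CLUSTERED BY / SORT BY column exists and appears only once. Class names, method names and error messages are hypothetical, not Impala's TableDef code, and the ALTER restriction is not modeled.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical validation of CLUSTERED BY ... [SORT BY ...] INTO n BUCKETS;
// error messages and names are illustrative, not Impala's.
final class BucketSpecValidatorSketch {
  private BucketSpecValidatorSketch() {}

  static void validate(String storageFormat, List<String> tableColumns,
      List<String> clusteredBy, List<String> sortBy, int numBuckets) {
    String format = storageFormat.toUpperCase(Locale.ROOT);
    // Kudu and Iceberg tables cannot be bucketed.
    if (format.equals("KUDU") || format.equals("ICEBERG")) {
      throw new IllegalArgumentException(
          "Bucketed tables are not supported for " + format + " tables");
    }
    if (numBuckets <= 0) {
      throw new IllegalArgumentException(
          "Number of buckets must be greater than 0: " + numBuckets);
    }
    if (clusteredBy.isEmpty()) {
      throw new IllegalArgumentException("CLUSTERED BY needs at least one column");
    }
    Set<String> columns = new HashSet<>();
    for (String c : tableColumns) {
      columns.add(c.toLowerCase(Locale.ROOT));
    }
    checkColumns("CLUSTERED BY", clusteredBy, columns);
    checkColumns("SORT BY", sortBy, columns);
  }

  /** Every listed column must exist in the table and appear only once. */
  private static void checkColumns(String clause, List<String> names, Set<String> columns) {
    Set<String> seen = new HashSet<>();
    for (String name : names) {
      String key = name.toLowerCase(Locale.ROOT);
      if (!columns.contains(key)) {
        throw new IllegalArgumentException(clause + " column not found: " + name);
      }
      if (!seen.add(key)) {
        throw new IllegalArgumentException("Duplicate " + clause + " column: " + name);
      }
    }
  }
}
```

With the first example above, validate("PARQUET", [i, s], [i], [], 24) passes, while the same clause on a Kudu table or with 0 buckets is rejected.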

This commit is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/metadata/test_show_create_table.py
18 files changed, 380 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/18
--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 18
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-11-02 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..


Patch Set 17:

(12 comments)

Hi Quanlong, thanks for your review and comments.
I have addressed your comments. While testing 'show-create-table', I found
and fixed a bug, and I also added support for dropping bucketed tables.

http://gerrit.cloudera.org:8080/#/c/19055/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19055/16//COMMIT_MSG@21
PS16, Line 21: he hash functi
> nit: "the hash function used in Hive's bucketed tables"
Done


http://gerrit.cloudera.org:8080/#/c/19055/16//COMMIT_MSG@22
PS16, Line 22:
> nit: currently don't
Done


http://gerrit.cloudera.org:8080/#/c/19055/16/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/19055/16/common/thrift/CatalogObjects.thrift@155
PS16, Line 155: tion
> nit: "type" ?
Maybe this wording is easier to understand: 'Data distribution method of bucketed table.'


http://gerrit.cloudera.org:8080/#/c/19055/16/common/thrift/CatalogObjects.thrift@194
PS16, Line 194:   2: optional i64 total_file_bytes
  : }
> nit: The variable names are clear enough. We can simplify the comment to so
Done


http://gerrit.cloudera.org:8080/#/c/19055/16/common/thrift/CatalogObjects.thrift@497
PS16, Line 497:  optional TValidWriteIdList
> nit: "Bucket information for HDFS tables"
Done


http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/analysis/TableDef.java@404
PS16, Line 404: isBucketableFormat() {
> nit: it'd be better to rename it to something like "isBucketableFormat"
Great.


http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/analysis/TableDef.java@756
PS16, Line 756: yzeBucketColumns(options_.bucketInfo, getColumnNames(),
> nit: we can skip this check since it's done in the following method.
Done


http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/analysis/TableDef.java@778
PS16, Line 778:   "'%s'", options_.fileFormat));
  : }
  : if (bucketInfo.getNum_bucket() <= 0) {
  :
> nit: kudu is checked in isSupportBucketedTable(). Do we still need this che
Done


http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/util/BucketUtils.java
File fe/src/main/java/org/apache/impala/util/BucketUtils.java:

http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/util/BucketUtils.java@20
PS16, Line 20: import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
> nit: unused import
Done


http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/util/BucketUtils.java@31
PS16, Line 31: mStorageDescriptor(StorageDe
> nit: "StorageDescriptor of the HMS table"
Done


http://gerrit.cloudera.org:8080/#/c/19055/16/testdata/workloads/functional-query/queries/QueryTest/create-table.test
File testdata/workloads/functional-query/queries/QueryTest/create-table.test:

http://gerrit.cloudera.org:8080/#/c/19055/16/testdata/workloads/functional-query/queries/QueryTest/create-table.test@349
PS16, Line 349:  RESULTS: VERIFY_IS_SUBSET
> Can we add the rows of "Num Buckets" and "Bucket Columns" ?
Done


http://gerrit.cloudera.org:8080/#/c/19055/16/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
File testdata/workloads/functional-query/queries/QueryTest/show-create-table.test:

http://gerrit.cloudera.org:8080/#/c/19055/16/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test@1013
PS16, Line 1013:  'engine.hive.enabled'='true', 'table_type'='ICEBERG', 'write.merge.mode'='copy-on-write')
> Could you also add a test for bucket table in this file?
Done



--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 17
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 02 Nov 2022 11:35:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-11-02 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#17). ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The syntax in the CREATE TABLE statement is as follows:
 [CLUSTERED BY (column[, column ...]) [SORT BY (column[, column ...])]
 INTO num_buckets BUCKETS]

Example:
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) INTO 24 BUCKETS;
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) SORT BY (s) INTO 24 BUCKETS;
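The statements above are subject to a few analysis-time restrictions. Below is a minimal Python sketch of such a validation step under a simplified model. validate_bucket_spec, its parameters and its error strings are hypothetical names, not Impala's TableDef or analyzeBucketColumns API. It mirrors the checks discussed in this review: a positive bucket count, clustering columns that exist in the table, and no Kudu or Iceberg tables. Checking the SORT BY columns the same way is an assumption of this sketch.

```python
# Hypothetical sketch, not Impala's TableDef code: validate a
# CLUSTERED BY (...) [SORT BY (...)] INTO n BUCKETS clause.

# Formats the commit message says bucketed tables do not support.
UNSUPPORTED_FORMATS = {"KUDU", "ICEBERG"}


def validate_bucket_spec(table_columns, file_format, bucket_columns,
                         num_buckets, sort_columns=()):
    """Return a list of error strings; an empty list means the spec is valid."""
    errors = []
    if file_format.upper() in UNSUPPORTED_FORMATS:
        errors.append(f"bucketed tables are not supported for {file_format}")
    if num_buckets <= 0:
        errors.append(f"number of buckets must be positive, got {num_buckets}")
    # Impala identifiers are case-insensitive, so compare lower-cased names.
    known = {c.lower() for c in table_columns}
    for col in list(bucket_columns) + list(sort_columns):
        if col.lower() not in known:
            errors.append(f"column '{col}' is not a column of the table")
    return errors


# CREATE TABLE tbl (i int, s string) CLUSTERED BY (i) SORT BY (s) INTO 24 BUCKETS
print(validate_bucket_spec(["i", "s"], "PARQUET", ["i"], 24, ["s"]))  # []
```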

Instructions:
1. The bucket partitioning algorithm is the hash function used
  in Hive's bucketed tables;
2. Create Bucketed Table statements currently don't support Kudu and
  Iceberg tables;
3. In the current version, alter operations (add/drop/change/replace
 columns) on bucketed tables are not supported;
4. Dropping bucketed tables is supported;
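Instruction 1 says rows are assigned to buckets with Hive's hash function. The following is a minimal Python sketch under two assumptions: Hive's legacy bucketing (bucketing_version=1) and INT clustering columns only. In that scheme, per-column hashes are combined as h = 31 * h + column_hash with Java 32-bit overflow, and an INT column is assumed to hash to its own value. The bucket is then (h & Integer.MAX_VALUE) % num_buckets. Newer Hive tables (bucketing_version=2) use a Murmur3 hash instead, which this sketch does not model. The function names are illustrative only.

```python
# Hypothetical sketch of Hive's legacy (bucketing_version=1) bucket
# assignment, restricted to INT clustering columns. Assumption: an INT
# column hashes to its own value, as in Hive's ObjectInspectorUtils.

INT_MAX = 0x7FFFFFFF  # Java Integer.MAX_VALUE


def to_int32(x):
    """Wrap a Python int to a signed 32-bit value, like Java int overflow."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def row_hash(int_values):
    """Combine per-column hashes as h = 31 * h + column_hash."""
    h = 0
    for v in int_values:
        h = to_int32(31 * h + v)
    return h


def bucket_id(int_values, num_buckets):
    """Clear the sign bit, then take the remainder: (h & MAX_VALUE) % n."""
    return (row_hash(int_values) & INT_MAX) % num_buckets


# With CLUSTERED BY (i) INTO 24 BUCKETS:
print(bucket_id([5], 24))   # 5
print(bucket_id([-1], 24))  # 7
```

Masking with Integer.MAX_VALUE rather than taking an absolute value keeps negative hashes (including the minimum int) in the range [0, num_buckets).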

This COMMIT is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/metadata/test_show_create_table.py
18 files changed, 380 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/17
--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 17
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-10-28 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#16). ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The syntax in the CREATE TABLE statement is as follows:
 [CLUSTERED BY (column[, column ...]) [SORT BY (column[, column ...])]
 INTO num_buckets BUCKETS]

Example:
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) INTO 24 BUCKETS;
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) SORT BY (s) INTO 24 BUCKETS;

Instructions:
1. The bucket partitioning algorithm is the hash function used
  in Hive's bucketed tables;
2. Create Bucketed Table statements do not support Kudu and
  Iceberg tables;
3. In the current version, alter operations (add/drop/change/replace
 columns) on bucketed tables are not supported;

This COMMIT is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
14 files changed, 353 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/16
--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 16
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-10-28 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#15). ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The syntax in the CREATE TABLE statement is as follows:
 [CLUSTERED BY (column[, column ...]) [SORT BY (column[, column ...])]
 INTO num_buckets BUCKETS]

Example:
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) INTO 24 BUCKETS;
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) SORT BY (s) INTO 24 BUCKETS;

Instructions:
1. The bucket partitioning algorithm is the hash function used
  in Hive's bucketed tables;
2. Create Bucketed Table statements do not support Kudu and
  Iceberg tables;
3. In the current version, alter operations (add/drop/change/replace
 columns) on bucketed tables are not supported;

This COMMIT is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
14 files changed, 353 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/15
--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 15
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-10-28 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The syntax in the CREATE TABLE statement is as follows:
 [CLUSTERED BY (column[, column ...]) [SORT BY (column[, column ...])]
 INTO num_buckets BUCKETS]

Example:
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) INTO 24 BUCKETS;
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) SORT BY (s) INTO 24 BUCKETS;

Instructions:
1. The bucket partitioning algorithm is the hash function used
  in Hive's bucketed tables;
2. Create Bucketed Table statements do not support Kudu and
  Iceberg tables;
3. In the current version, alter operations (add/drop/change/replace
 columns) on bucketed tables are not supported;

This COMMIT is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
14 files changed, 352 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/14
--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 14
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-10-28 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..


Patch Set 14:

(1 comment)

> Patch Set 13:
>
> (1 comment)

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1636
PS10, Line 1636:   ;
> Yeah, this is just for table creation. For adding write support, we can sup
Done



--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 14
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 28 Oct 2022 09:36:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-10-27 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1636
PS10, Line 1636:   :}
> We already support the SortBy clause. Currently it's independent with the C
OK, I'm going to do that.
Earlier I thought adding the syntax was simple, but the logic needed to
implement inserts and queries is more complex.



--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 13
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 28 Oct 2022 03:22:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-10-27 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..


Patch Set 13:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG@29
PS10, Line 29:
> I see. Previously, CLUSTERED is identified as an IDENTIFIER. Now we define
Wow, this puzzled me for a long time. Thanks very much.


http://gerrit.cloudera.org:8080/#/c/19055/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19055/11//COMMIT_MSG@19
PS11, Line 19: :
> Is RANDOM actually useful in practise? Could you share some use cases?
No, it isn't. RANDOM ensures an even distribution of the data, but it cannot
be used for bucket joins.
Don't worry about that. As discussed, only one hash algorithm is supported.


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1636
PS10, Line 1636:   :}
> I see. I checked the Hive parser and realized that in HiveQL the SortBy cla
Yes, I think so. But I originally intended to add SORT BY in a later version,
because it increases the complexity of the implementation. It should be
done in the future.


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1705
PS10, Line 1705:   {: RESULT = TableDataLayout.createKuduPartitionedLayout(partition_params); :}
> This hasn't been addressed.
Done



--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 13
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 27 Oct 2022 09:27:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-10-27 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The syntax in the CREATE TABLE statement is as follows:
 [CLUSTERED BY (column [, column ...]) [INTO num_buckets BUCKETS]]

Example:
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i) INTO 24 BUCKETS;
CREATE TABLE tbl (i int COMMENT 'hello', s string)
CLUSTERED BY (i);

Instructions:
1. The bucket partitioning algorithm is the hash function used
  in Hive's bucketed tables;
2. INTO num_buckets BUCKETS specifies the number of buckets; the default
  is 16;
3. Create Bucketed Table statements do not support Kudu and
  Iceberg tables;
4. In the current version, alter operations (add/drop/change/replace
 columns) on bucketed tables are not supported;

This COMMIT is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
14 files changed, 350 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/13
--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 13
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-10-27 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The syntax in the CREATE TABLE statement is as follows:
 [BUCKETED BY (column [, column ...]) [INTO num_buckets BUCKETS]]

Example:
CREATE TABLE tbl (i int COMMENT 'hello', s string)
BUCKETED BY (i) INTO 24 BUCKETS;
CREATE TABLE tbl (i int COMMENT 'hello', s string)
BUCKETED BY (i);

Instructions:
1. Hive's CLUSTERED BY syntax is not supported because CLUSTERED is
  already used as a keyword in hints;
2. The bucket partitioning algorithm is the hash function used
  in Hive's bucketed tables;
3. INTO num_buckets BUCKETS specifies the number of buckets; the default
  is 16;
4. Create Bucketed Table statements do not support Kudu and
  Iceberg tables;
5. In the current version, alter operations (add/drop/change/replace
 columns) on bucketed tables are not supported;

This COMMIT is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
14 files changed, 349 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/12
--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 12
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE

2022-10-25 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/18731 )

Change subject: IMPALA-11424: Support pushdown non-equi join predicate from 
OUTER/INNER JOIN to SCANNODE
..

IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to 
SCANNODE

To reduce the amount of data read and transmitted, non-equi join
conditions can be pushed down to the SCAN_NODE.

Current restrictions on pushing down non-equi join conjuncts:
 1. Only LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN and INNER_JOIN are supported;
 2. Only non-equi predicates containing a LiteralExpr are considered,
  for example: slot >= literal, slot IN (literal list);
 3. A complex filter condition is pushed down only if it references
a single column, e.g. cast(A as int) > 10 can be pushed down to the scan;
 4. The supported predicate operators are
EQ, LE, LT, GE and GT;
 5. The supported predicate types are
BinaryPredicate and InPredicate;

Pushdown logic:
 1. Build a mapping from each slot to its list of non-equi conjuncts,
and a mapping from each slot to its list of equi conjuncts;
 2. For a slot that has both equi and non-equi conjuncts,
compute the maximum and minimum values of the equi conjuncts;
 3. Build new BinaryPredicates from these maximum and minimum values
according to the non-equi conjuncts;
 4. Push all the new BinaryPredicates down to the corresponding scan node;

Also add a new query option to enable or disable this feature:
 ENABLE_NONE_EQUAL_PREDICATE_PUSH_DOWN
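One way to picture the qualifications and pushdown steps in this commit message is the following hypothetical Python model. It is not the patch's HashJoinNode or Expr code, and Pred, derive_pushdown and the helper names are made up for illustration. The sketch makes several simplifying assumptions:

- It assumes an INNER JOIN. The outer-join cases the patch also supports need extra checks on which side may be filtered, and these are omitted.
- It turns an IN list into a [min, max] range.
- It copies literal bounds across each a = b equi-join conjunct, one hop only.
- It keeps the tightest lower and upper bound per slot.

```python
from dataclasses import dataclass

# Comparison operators the commit message lists as eligible.
SUPPORTED_OPS = {"EQ", "LE", "LT", "GE", "GT"}


@dataclass(frozen=True)
class Pred:
    slot: str      # column reference, e.g. "t1.a"
    op: str        # one of SUPPORTED_OPS, or "IN"
    value: object  # a literal, or a tuple of literals for "IN"


def expand(preds):
    """Keep supported predicates; turn slot IN (v1..vn) into a [min, max] range."""
    out = []
    for p in preds:
        if p.op == "IN":
            out.append(Pred(p.slot, "GE", min(p.value)))
            out.append(Pred(p.slot, "LE", max(p.value)))
        elif p.op in SUPPORTED_OPS:
            out.append(p)
    return out


def transfer(eq_pairs, preds):
    """Copy each literal predicate onto the slots joined to it by a = b."""
    partners = {}
    for a, b in eq_pairs:
        partners.setdefault(a, []).append(b)
        partners.setdefault(b, []).append(a)
    copies = [Pred(q, p.op, p.value) for p in preds for q in partners.get(p.slot, [])]
    return preds + copies


def tightest_bounds(preds):
    """Collapse predicates per slot into (lower, lower_incl, upper, upper_incl)."""
    bounds = {}
    for p in preds:
        lo, lo_incl, hi, hi_incl = bounds.get(p.slot, (None, True, None, True))
        if p.op in ("GT", "GE", "EQ"):
            incl = p.op != "GT"
            # A larger lower bound, or the same value made exclusive, is tighter.
            if lo is None or p.value > lo or (p.value == lo and not incl):
                lo, lo_incl = p.value, incl
        if p.op in ("LT", "LE", "EQ"):
            incl = p.op != "LT"
            if hi is None or p.value < hi or (p.value == hi and not incl):
                hi, hi_incl = p.value, incl
        bounds[p.slot] = (lo, lo_incl, hi, hi_incl)
    return bounds


def derive_pushdown(eq_pairs, preds):
    """Per-slot bounds that could be pushed to each side's scan (INNER JOIN only)."""
    return tightest_bounds(transfer(eq_pairs, expand(preds)))


# t1 JOIN t2 ON t1.a = t2.b AND t1.a > 10 AND t2.b IN (3, 20, 7)
bounds = derive_pushdown(
    [("t1.a", "t2.b")],
    [Pred("t1.a", "GT", 10), Pred("t2.b", "IN", (3, 20, 7))],
)
print(bounds["t2.b"])  # (10, False, 20, True), i.e. 10 < t2.b <= 20
```

Relaxing an IN list to a range loses some selectivity, but the resulting predicate is still implied by the original one. This makes it a safe filter to evaluate early in a scan.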

Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test
A testdata/workloads/functional-query/queries/QueryTest/none-equal-predicate-push-down.test
A tests/query_test/test_none_equi_predicate_pushdown.py
10 files changed, 1,240 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/18731/14
--
To view, visit http://gerrit.cloudera.org:8080/18731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263
Gerrit-Change-Number: 18731
Gerrit-PatchSet: 14
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE

2022-10-25 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/18731 )

Change subject: IMPALA-11424: Support pushdown non-equi join predicate from 
OUTER/INNER JOIN to SCANNODE
..

IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to 
SCANNODE

To reduce the amount of data read and transmitted, non-equi join
conditions can be pushed down to the SCAN_NODE.

Current restrictions on pushing down non-equi join conjuncts:
 1. Only LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN and INNER_JOIN are supported;
 2. Only non-equi predicates containing a LiteralExpr are considered,
  for example: slot >= literal, slot IN (literal list);
 3. A complex filter condition is pushed down only if it references
a single column, e.g. cast(A as int) > 10 can be pushed down to the scan;
 4. The supported predicate operators are
EQ, LE, LT, GE and GT;
 5. The supported predicate types are
BinaryPredicate and InPredicate;

Pushdown logic:
 1. Build a mapping from each slot to its list of non-equi conjuncts,
and a mapping from each slot to its list of equi conjuncts;
 2. For a slot that has both equi and non-equi conjuncts,
compute the maximum and minimum values of the equi conjuncts;
 3. Build new BinaryPredicates from these maximum and minimum values
according to the non-equi conjuncts;
 4. Push all the new BinaryPredicates down to the corresponding scan node;

Also add a new query option to enable or disable this feature:
 ENABLE_NONE_EQUAL_PREDICATE_PUSH_DOWN

Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test
A testdata/workloads/functional-query/queries/QueryTest/none-equal-predicate-push-down.test
A tests/query_test/test_none_equi_predicate_pushdown.py
10 files changed, 1,239 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/18731/13
--
To view, visit http://gerrit.cloudera.org:8080/18731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263
Gerrit-Change-Number: 18731
Gerrit-PatchSet: 13
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE

2022-10-25 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/18731 )

Change subject: IMPALA-11424: Support pushdown non-equi join predicate from 
OUTER/INNER JOIN to SCANNODE
..

IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to 
SCANNODE

To reduce the amount of data read and transmitted, non-equi join
conditions can be pushed down to the SCAN_NODE.

Current restrictions on pushing down non-equi join conjuncts:
 1. Only LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN and INNER_JOIN are supported;
 2. Only non-equi predicates containing a LiteralExpr are considered,
  for example: slot >= literal, slot IN (literal list);
 3. A complex filter condition is pushed down only if it references
a single column, e.g. cast(A as int) > 10 can be pushed down to the scan;
 4. The supported predicate operators are
EQ, LE, LT, GE and GT;
 5. The supported predicate types are
BinaryPredicate and InPredicate;

Pushdown logic:
 1. Build a mapping from each slot to its list of non-equi conjuncts,
and a mapping from each slot to its list of equi conjuncts;
 2. For a slot that has both equi and non-equi conjuncts,
compute the maximum and minimum values of the equi conjuncts;
 3. Build new BinaryPredicates from these maximum and minimum values
according to the non-equi conjuncts;
 4. Push all the new BinaryPredicates down to the corresponding scan node;

Also add a new query option to enable or disable this feature:
 ENABLE_NONE_EQUAL_PREDICATE_PUSH_DOWN

Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test
A testdata/workloads/functional-query/queries/QueryTest/none-equal-predicate-push-down.test
A tests/query_test/test_none_equi_predicate_pushdown.py
10 files changed, 1,240 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/18731/12
--
To view, visit http://gerrit.cloudera.org:8080/18731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263
Gerrit-Change-Number: 18731
Gerrit-PatchSet: 12
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE

2022-10-24 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18731 )

Change subject: IMPALA-11424: Support pushdown non-equi join predicate from 
OUTER/INNER JOIN to SCANNODE
..


Patch Set 11:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/18731/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18731/11//COMMIT_MSG@13
PS11, Line 13:  1. Only support LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN, INNER_JOIN;
> Is there a specific reason for not supporting cross joins?
Cross joins are rarely used, and the optimization has not been verified for them.
Spark does not support cross joins by default, and Presto/Trino optimizes
queries to eliminate cross joins.
So we are not going to consider cross joins.


http://gerrit.cloudera.org:8080/#/c/18731/11/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/18731/11/common/thrift/Query.thrift@600
PS11, Line 600:   // Whether to enable pushdown under non-equi predicates, default is false
> missing line
Got it.


http://gerrit.cloudera.org:8080/#/c/18731/11/common/thrift/Query.thrift@600
PS11, Line 600:   // Whether to enable pushdown under non-equi predicates, default is false
  :   149: optional bool enable_none_equal_predicate_push_down = false;
> What is the reason for keeping it as default false? My understanding is tha
This change alters the execution plan, and I am worried that it may affect
the results.
Moreover, defaulting to true would require rewriting a lot of existing unit
test data, which may introduce other problems.
Therefore, I want to default to false first, and change it to true after a
period of time.


http://gerrit.cloudera.org:8080/#/c/18731/11/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/18731/11/fe/src/main/java/org/apache/impala/analysis/Expr.java@1933
PS11, Line 1933: weight
> What does removing weight means here?
This is a misstatement. What I meant is "remove duplicates". I will fix it.


http://gerrit.cloudera.org:8080/#/c/18731/9/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

http://gerrit.cloudera.org:8080/#/c/18731/9/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@791
PS9, Line 791:   binaryPredicate = new BinaryPredicate(GE, slotBinding, minValue);
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/18731/11/testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test
File testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test:

http://gerrit.cloudera.org:8080/#/c/18731/11/testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test@5
PS11, Line 5: testtbl
> I think that it would be better to use a non-empty table like functional.al
Yes, that's right. I will add relevant tests. Thanks.


http://gerrit.cloudera.org:8080/#/c/18731/11/testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test@7
PS11, Line 7: 2
> I only see numbers in the tests - does this optimization work with other ty
This optimization applies to any LiteralExpr. I didn't add tests for other
types yet; I will add relevant tests. Thanks.


http://gerrit.cloudera.org:8080/#/c/18731/9/tests/query_test/test_none_equi_predicate_pushdown.py
File tests/query_test/test_none_equi_predicate_pushdown.py:

http://gerrit.cloudera.org:8080/#/c/18731/9/tests/query_test/test_none_equi_predicate_pushdown.py@23
PS9, Line 23:
> flake8: E302 expected 2 blank lines, found 1
Done


http://gerrit.cloudera.org:8080/#/c/18731/11/tests/query_test/test_none_equi_predicate_pushdown.py
File tests/query_test/test_none_equi_predicate_pushdown.py:

http://gerrit.cloudera.org:8080/#/c/18731/11/tests/query_test/test_none_equi_predicate_pushdown.py@32
PS11, Line 32: ENABLE_NONE_EQUAL_PREDICATE_PUSH_DOWN
> Can you also run to same test case with ENABLE_NONE_EQUAL_PREDICATE_PUSH_DO
OK, I will add it.



--
To view, visit http://gerrit.cloudera.org:8080/18731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263
Gerrit-Change-Number: 18731
Gerrit-PatchSet: 11
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 25 Oct 2022 04:06:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-10-24 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..


Patch Set 11:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG@18
PS10, Line 18: CREATE TABLE tbl (i int COMMENT 'hello', s string)
> It'd be better if we can highlight this line since it's the only new part.
Got it.


http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG@29
PS10, Line 29:   the hash partition is equivalent to a bucket,
> Do you mean "CLUSTERED" has been used as a hint so we can't use it as a key
If "CLUSTERED" is added as a keyword in the cup file, executing SQL with
CLUSTERED in a hint reports an error. For example, executing
"create /* +CLUSTERED */ test as select * from tpcds.item;" gives this error:
`
Query: create /* +CLUSTERED */ test
as select * from tpcds.item
Query submitted at: 2022-10-24 09:09:52 (Coordinator: http://d403ca04eda0:25000)
ERROR: ParseException: Syntax error in line 1:
create /* +CLUSTERED */ test
   ^
Encountered: CLUSTERED
Expected: STRAIGHT_JOIN, COMMA, IDENTIFIER

CAUSED BY: Exception: Syntax error
`


http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG@30
PS10, Line 30:   and the optimization rule applies to join query;
> Are these recognized by Hive? i.e. if Hive inserts data into the table, is
If HASH is used, the behavior is the same as Hive's. If another algorithm is
used, the behavior is incompatible with Hive.
If Hive inserts data into the table, the table is treated as HASH bucketed,
which is what we expect.

Multiple bucket hash functions are supported because Hive's bucket hash
algorithm is different from Kudu's. To be compatible with the bucket join
optimization on Kudu tables, KUDU_HASH is provided in addition to HASH. In
other words, HASH mode is not supported for Kudu tables. However, tables
bucketed with KUDU_HASH are not recognized as bucketed by computing engines
other than Impala.
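To illustrate why the hash function matters: bucketing commonly maps a row's hash to a bucket with a masking-and-modulo step, so two tables can only be joined bucket by bucket if they were bucketed with the same hash function. In the Java sketch below, hashA and hashB are hypothetical stand-ins, not Hive's or Kudu's real hash functions.

```java
// Illustrates why both sides of a bucket join must use the same bucket hash
// function. Both hash functions are hypothetical stand-ins, not the real
// Hive or Kudu implementations.
public class BucketHashSketch {
  // Maps a hash to a bucket in [0, numBuckets) after clearing the sign bit.
  static int bucketOf(int hash, int numBuckets) {
    return (hash & Integer.MAX_VALUE) % numBuckets;
  }

  // Stand-in for engine A's hash of an INT key: the value itself.
  static int hashA(int key) {
    return key;
  }

  // Stand-in for engine B's hash: a multiplicative mix (wraps on overflow).
  static int hashB(int key) {
    return key * 0x9E3779B9;
  }

  public static void main(String[] args) {
    int numBuckets = 16;
    for (int key : new int[] {1, 2, 3, 42}) {
      System.out.printf("key=%d bucketA=%d bucketB=%d%n", key,
          bucketOf(hashA(key), numBuckets), bucketOf(hashB(key), numBuckets));
    }
  }
}
```

With 16 buckets, key 1 lands in bucket 1 under hashA but in bucket 9 under hashB, so matching keys from the two tables would not meet in the same bucket pair.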


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1636
PS10, Line 1636:   | opt_bucket_desc:bucket
> I think we don't need two switches here. Like other optional fields, we can
Yes, but adding an empty production to opt_bucket_desc causes a compilation
error, so I took this approach. Do you have any advice?
`
Warning : *** Reduce/Reduce conflict found in state #1587
  between opt_bucket_desc ::= (*)
  and opt_sort_cols ::= (*)
  under symbols: {}
  Resolved in favor of the first production.

Warning : *** Shift/Reduce conflict found in state #1587
  between opt_bucket_desc ::= (*)
  and opt_sort_cols ::= (*) KW_SORT KW_BY KW_ZORDER LPAREN opt_ident_list RPAREN
  and opt_sort_cols ::= (*) KW_SORT KW_BY LPAREN opt_ident_list RPAREN
  and opt_sort_cols ::= (*) KW_SORT KW_BY KW_LEXICAL LPAREN opt_ident_list RPAREN
  under symbol KW_SORT
  Resolved in favor of shifting.

Warning : *** Shift/Reduce conflict found in state #1587
  between opt_sort_cols ::= (*)
  and opt_sort_cols ::= (*) KW_SORT KW_BY KW_ZORDER LPAREN opt_ident_list RPAREN
  and opt_sort_cols ::= (*) KW_SORT KW_BY LPAREN opt_ident_list RPAREN
  and opt_sort_cols ::= (*) KW_SORT KW_BY KW_LEXICAL LPAREN opt_ident_list RPAREN
  under symbol KW_SORT
  Resolved in favor of shifting.
`


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1705
PS10, Line 1705:   {: RESULT = new Pair, TSortingOrder>(null, TSortingOrder.LEXICAL); :}
> nit: Let's skip reformatting unrelated codes.
Got it.


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/jflex/sql-scanner.flex
File fe/src/main/jflex/sql-scanner.flex:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/jflex/sql-scanner.flex@178
PS10, Line 178: kudu_has
> The commit message mentions "kudu_hash".
Done


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2844
PS10, Line 2844: ber mu
> Can we add tests for "HASH" and "KUDUHASH" without the parentheses?
Yes, I can.


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@a4347
PS10, Line 4347:
> Can we keep this since this still doesn't work?
Yes, we can keep this. It was removed when I tried CLUSTERED.


http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@3074
PS10, Line 3074: P

[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-10-24 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The new clause in the CREATE TABLE statement is as follows:
 [BUCKETED BY {HASH([column [, column ...]]) | KUDU_HASH([column [, column ...]]) | RANDOM} INTO 24 BUCKETS]

Example:
CREATE TABLE tbl (i int COMMENT 'hello', s string)
BUCKETED BY HASH(i) INTO 24 BUCKETS;
CREATE TABLE tbl (i int COMMENT 'hello', s string)
BUCKETED BY KUDU_HASH(i) INTO 24 BUCKETS;
CREATE TABLE tbl (i int COMMENT 'hello', s string)
BUCKETED BY RANDOM INTO 24 BUCKETS;

Instructions:
1. Hive's CLUSTERED BY syntax is not supported, because CLUSTERED is
  already a hint keyword;
2. The supported bucketing algorithms are HASH, RANDOM and KUDU_HASH.
  The default is HASH;
3. INTO 24 BUCKETS specifies the number of buckets; the default
  is 16;
4. Creating bucketed Kudu and Iceberg tables is not supported.
  However, for a Kudu table the hash partition is equivalent to a
  bucket, so the optimization rule still applies to join queries;
5. In the current version, ALTER operations (ADD/DROP/CHANGE/REPLACE
  COLUMNS) on bucketed tables are not supported.
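A hypothetical Java model of the clause and the defaults from items 2 and 3 (HASH and 16 buckets) can make the defaulting rules concrete. BucketSpec and toSql() are illustrative names, not classes from the patch, and "not specified" is modeled here as a null algorithm or a non-positive bucket count.

```java
import java.util.List;

// Hypothetical model of the BUCKETED BY clause and its defaults; the names
// are illustrative and are not taken from the patch.
public class BucketSpecSketch {
  enum Algorithm { HASH, KUDU_HASH, RANDOM }

  static final Algorithm DEFAULT_ALGORITHM = Algorithm.HASH;
  static final int DEFAULT_NUM_BUCKETS = 16;

  // A null algorithm or a non-positive numBuckets stands for "not specified".
  record BucketSpec(Algorithm algorithm, List<String> columns, int numBuckets) {
    BucketSpec {
      if (algorithm == null) algorithm = DEFAULT_ALGORITHM;
      if (numBuckets <= 0) numBuckets = DEFAULT_NUM_BUCKETS;
      columns = List.copyOf(columns);
    }

    // Renders the clause as SQL text; RANDOM takes no column list.
    String toSql() {
      StringBuilder sb = new StringBuilder("BUCKETED BY ").append(algorithm);
      if (algorithm != Algorithm.RANDOM) {
        sb.append('(').append(String.join(", ", columns)).append(')');
      }
      return sb.append(" INTO ").append(numBuckets).append(" BUCKETS").toString();
    }
  }

  public static void main(String[] args) {
    System.out.println(new BucketSpec(null, List.of("i"), 0).toSql());
    System.out.println(new BucketSpec(Algorithm.KUDU_HASH, List.of("i"), 24).toSql());
    System.out.println(new BucketSpec(Algorithm.RANDOM, List.of(), 24).toSql());
  }
}
```

For example, new BucketSpec(null, List.of("i"), 0).toSql() yields "BUCKETED BY HASH(i) INTO 16 BUCKETS".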

This commit is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
15 files changed, 439 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/11
--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 11
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES syntax

2022-10-17 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19143 )

Change subject: IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET 
TBLPROPERTIES syntax
..


Patch Set 3:

(3 comments)

I fixed it.
Thanks for your review and comments.

http://gerrit.cloudera.org:8080/#/c/19143/2/docs/topics/impala_alter_view.xml
File docs/topics/impala_alter_view.xml:

http://gerrit.cloudera.org:8080/#/c/19143/2/docs/topics/impala_alter_view.xml@73
PS2, Line 73: 'name' = '
> Could you change this to the following format?
Done


http://gerrit.cloudera.org:8080/#/c/19143/2/docs/topics/impala_alter_view.xml@76
PS2, Line 76: 'nam
> Please wrap this with  and single quotes.
Done


http://gerrit.cloudera.org:8080/#/c/19143/2/docs/topics/impala_create_view.xml
File docs/topics/impala_create_view.xml:

http://gerrit.cloudera.org:8080/#/c/19143/2/docs/topics/impala_create_view.xml@64
PS2, Line 64: 'name' = '<
> Please wrap the two variables with  and single quotes.
Done



--
To view, visit http://gerrit.cloudera.org:8080/19143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief1d6bb525ba85a58b8123a0cb712d83523daaec
Gerrit-Change-Number: 19143
Gerrit-PatchSet: 3
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Shajini Thayasingh 
Gerrit-Comment-Date: Mon, 17 Oct 2022 12:32:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES syntax

2022-10-17 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/19143 )

Change subject: IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET 
TBLPROPERTIES syntax
..

IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES syntax

Update the documentation for
[ CREATE VIEW ... TBLPROPERTIES ('key' = 'value', ...) ]
and
[ ALTER VIEW view_name SET/UNSET TBLPROPERTIES... ] syntax.

Change-Id: Ief1d6bb525ba85a58b8123a0cb712d83523daaec
---
M docs/topics/impala_alter_view.xml
M docs/topics/impala_create_view.xml
2 files changed, 28 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/19143/3
--
To view, visit http://gerrit.cloudera.org:8080/19143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ief1d6bb525ba85a58b8123a0cb712d83523daaec
Gerrit-Change-Number: 19143
Gerrit-PatchSet: 3
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Shajini Thayasingh 


[Impala-ASF-CR] IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES syntax

2022-10-14 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19143 )

Change subject: IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET 
TBLPROPERTIES syntax
..


Patch Set 2:

Hi Quanlong,
https://gerrit.cloudera.org/c/18940/ did not update the documentation, so
this commit updates the documentation accordingly.
Please help review it.
Thanks.


--
To view, visit http://gerrit.cloudera.org:8080/19143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief1d6bb525ba85a58b8123a0cb712d83523daaec
Gerrit-Change-Number: 19143
Gerrit-PatchSet: 2
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 14 Oct 2022 10:45:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES syntax

2022-10-14 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/19143 )

Change subject: IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET 
TBLPROPERTIES syntax
..

IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES syntax

Update the documentation for
[ CREATE VIEW ... TBLPROPERTIES ('key' = 'value', ...) ]
and
[ ALTER VIEW view_name SET/UNSET TBLPROPERTIES... ] syntax.

Change-Id: Ief1d6bb525ba85a58b8123a0cb712d83523daaec
---
M docs/topics/impala_alter_view.xml
M docs/topics/impala_create_view.xml
2 files changed, 28 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/19143/2
--
To view, visit http://gerrit.cloudera.org:8080/19143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ief1d6bb525ba85a58b8123a0cb712d83523daaec
Gerrit-Change-Number: 19143
Gerrit-PatchSet: 2
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES syntax

2022-10-14 Thread Baike Xia (Code Review)
Baike Xia has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/19143


Change subject: IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET 
TBLPROPERTIES syntax
..

IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES syntax

Update the documentation for
[ CREATE VIEW ... TBLPROPERTIES ('key' = 'value', ...) ]
and
[ ALTER VIEW view_name SET/UNSET TBLPROPERTIES... ] syntax.

Change-Id: Ief1d6bb525ba85a58b8123a0cb712d83523daaec
---
M docs/topics/impala_alter_view.xml
M docs/topics/impala_create_view.xml
2 files changed, 29 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/19143/1
--
To view, visit http://gerrit.cloudera.org:8080/19143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ief1d6bb525ba85a58b8123a0cb712d83523daaec
Gerrit-Change-Number: 19143
Gerrit-PatchSet: 1
Gerrit-Owner: Baike Xia 


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-10-13 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The specific syntax is as follows:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name(
   col_name data_type
   [constraint_specification]
   [COMMENT 'col_comment']
   [, ...]
 )
 [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
 [BUCKETED BY {HASH([column [, column ...]]) | KUDU_HASH([column [, column ...]]) | RANDOM} INTO 24 BUCKETS]
 [SORT BY ([column [, column ...]])]
 [COMMENT 'table_comment']
 [ROW FORMAT row_format]
 [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
 [STORED AS file_format]
 [LOCATION 'hdfs_path']
 [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED]
 [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]

Instructions:
1. Hive's CLUSTERED BY syntax is not supported, because CLUSTERED is
  already a hint keyword;
2. The supported bucketing algorithms are HASH, RANDOM and KUDU_HASH.
  The default is HASH;
3. INTO 24 BUCKETS specifies the number of buckets; the default
  is 16;
4. Creating bucketed Kudu and Iceberg tables is not supported.
  However, for a Kudu table the hash partition is equivalent to a
  bucket, so the optimization rule still applies to join queries;
5. In the current version, ALTER operations (ADD/DROP/CHANGE/REPLACE
  COLUMNS) on bucketed tables are not supported.

This commit is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
15 files changed, 420 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/10
--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 10
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] [IMPALA-11625] Support create/drop materialized view syntax on IMPALA

2022-10-13 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19050 )

Change subject: [IMPALA-11625] Support create/drop materialized view syntax on 
IMPALA
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19050/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/19050/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@124
PS6, Line 124: MATERIALIZED_VIEW
This one doesn't seem to be in use. Can we delete it?


http://gerrit.cloudera.org:8080/#/c/19050/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

http://gerrit.cloudera.org:8080/#/c/19050/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@645
PS6, Line 645: if (view.getMetaStoreTable().getTableType().equals(MATERIALIZED_VIEW.toString()))
That doesn't seem right: an enum compared with a String via equals() can
never return true. Should it be written this way?
if (view.getMetaStoreTable().getTableType().name().equals(MATERIALIZED_VIEW.toString()))
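Which form is correct depends on the static type returned by getTableType(), which is worth confirming against the metastore API: if it returns a String, comparing it with MATERIALIZED_VIEW.toString() already works (and .name() would not compile on a String); if it returns an enum, equals() with a String is always false. The Java sketch below uses a local stand-in TableType enum, not Hive's class, to show both cases.

```java
// TableType here is a local stand-in for the metastore enum; only the
// constant names matter for this illustration.
public class TableTypeCompareSketch {
  enum TableType { MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW, MATERIALIZED_VIEW }

  // Correct check when the table type is held as a String.
  static boolean isMaterializedView(String tableType) {
    return TableType.MATERIALIZED_VIEW.toString().equals(tableType);
  }

  // Correct check when the table type is held as the enum.
  static boolean isMaterializedView(TableType tableType) {
    return tableType == TableType.MATERIALIZED_VIEW;
  }

  public static void main(String[] args) {
    Object asString = "MATERIALIZED_VIEW";
    // Pitfall: equals() between an enum constant and a String is always false.
    System.out.println(TableType.MATERIALIZED_VIEW.equals(asString));     // false
    System.out.println(isMaterializedView("MATERIALIZED_VIEW"));          // true
    System.out.println(isMaterializedView(TableType.MATERIALIZED_VIEW));  // true
  }
}
```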



--
To view, visit http://gerrit.cloudera.org:8080/19050
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77fdd34bf04a8994a215170747249356cd40622b
Gerrit-Change-Number: 19050
Gerrit-PatchSet: 7
Gerrit-Owner: pengdou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 13 Oct 2022 12:22:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11485: Pushdown LIMIT through UNION ALL and LEFT/RIGHT OUTER JOIN

2022-10-10 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/18862 )

Change subject: IMPALA-11485: Pushdown LIMIT through UNION ALL and LEFT/RIGHT 
OUTER JOIN
..

IMPALA-11485: Pushdown LIMIT through UNION ALL and LEFT/RIGHT OUTER JOIN

Pushdown LIMIT through UNION ALL:
Transforms:
  - Limit
    - Union
      - relation1
      - relation2
      ..
Into:
  - Limit
    - Union
      - Limit
        - relation1
      - Limit
        - relation2
      ..

Pushdown LIMIT through LEFT/RIGHT OUTER JOIN:
Transforms:
  - Limit
    - Join
      - left source
      - right source
Into:
  - Limit
    - Join
      - Limit (present if Join is left outer)
        - left source
      - Limit (present if Join is right outer)
        - right source
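The two rewrites can be sketched on a toy plan tree in Java. Scan, Limit, Union and Join are hypothetical record types, not Impala's PlanNode classes, and the sketch assumes nothing (such as a filter) sits between the LIMIT and the UNION ALL or join, since a predicate applied above the join could otherwise discard rows the pushed-down LIMIT relied on.

```java
import java.util.List;

// Toy plan tree for the two LIMIT pushdown rewrites. The node types are
// hypothetical and far simpler than Impala's PlanNode hierarchy.
public class LimitPushdownSketch {
  interface Node {}
  record Scan(String name) implements Node {}
  record Limit(long n, Node child) implements Node {}
  record Union(List<Node> children) implements Node {}  // UNION ALL only
  enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER }
  record Join(JoinType type, Node left, Node right) implements Node {}

  // Assumes nothing (e.g. a filter) sits between the Limit and its child.
  static Node pushDown(Node node) {
    if (!(node instanceof Limit limit)) return node;
    long n = limit.n();
    if (limit.child() instanceof Union union) {
      // Each UNION ALL output row comes from exactly one child, so no child
      // has to produce more than n rows.
      List<Node> children = union.children().stream()
          .map(c -> (Node) new Limit(n, c))
          .toList();
      return new Limit(n, new Union(children));
    }
    if (limit.child() instanceof Join join) {
      // An outer join emits at least one row per row of its preserved side,
      // so n rows from that side already yield n output rows.
      return switch (join.type()) {
        case LEFT_OUTER -> new Limit(n,
            new Join(join.type(), new Limit(n, join.left()), join.right()));
        case RIGHT_OUTER -> new Limit(n,
            new Join(join.type(), join.left(), new Limit(n, join.right())));
        default -> node;
      };
    }
    return node;
  }

  public static void main(String[] args) {
    Node plan = new Limit(10, new Union(List.of(new Scan("t1"), new Scan("t2"))));
    System.out.println(pushDown(plan));
  }
}
```

Inner joins are left unchanged, because an inner join may drop rows from either side, so limiting one input could leave fewer than n output rows.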

Change-Id: Ia5d040c0a98e60639d7ce4b25ecf07a859c8a32c
---
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-outer-join.test
A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-union.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/joins.test
A testdata/workloads/functional-query/queries/limit-pushdown-outer-join.test
A testdata/workloads/functional-query/queries/limit-pushdown-union.test
A tests/query_test/test_limit_pushdown.py
M tests/query_test/test_observability.py
14 files changed, 467 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/18862/7
--
To view, visit http://gerrit.cloudera.org:8080/18862
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia5d040c0a98e60639d7ce4b25ecf07a859c8a32c
Gerrit-Change-Number: 18862
Gerrit-PatchSet: 7
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-10-10 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The specific syntax is as follows:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name(
   col_name data_type
   [constraint_specification]
   [COMMENT 'col_comment']
   [, ...]
 )
 [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
 [BUCKETED BY {HASH([column [, column ...]]) | KUDU_HASH([column [, column ...]]) | RANDOM} INTO 24 BUCKETS]
 [SORT BY ([column [, column ...]])]
 [COMMENT 'table_comment']
 [ROW FORMAT row_format]
 [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
 [STORED AS file_format]
 [LOCATION 'hdfs_path']
 [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED]
 [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]

Instructions:
1. Hive's CLUSTERED BY syntax is not supported, because CLUSTERED is
  already a hint keyword;
2. The supported bucketing algorithms are HASH, RANDOM and KUDU_HASH.
  The default is HASH;
3. INTO 24 BUCKETS specifies the number of buckets; the default
  is 16;
4. Creating bucketed Kudu and Iceberg tables is not supported.
  However, for a Kudu table the hash partition is equivalent to a
  bucket, so the optimization rule still applies to join queries;
5. In the current version, ALTER operations (ADD/DROP/CHANGE/REPLACE
  COLUMNS) on bucketed tables are not supported.

This commit is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
15 files changed, 415 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/9
--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 9
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table

2022-10-10 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/18953 )

Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns 
for kudu/iceberg table
..

IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg 
table

IMPALA-7832 added support for IF NOT EXISTS in ALTER TABLE ADD COLUMNS for
general Hive tables, but not for Kudu/Iceberg tables. This patch adds the
same semantics for Kudu/Iceberg tables.
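A minimal Java sketch of the IF NOT EXISTS semantics, assuming that requested columns which already exist are skipped while the remaining ones are added, that column names compare case-insensitively, and that duplicates within the statement itself are always an error. columnsToAdd() is a hypothetical helper, not code from AlterTableAddColsStmt or CatalogOpExecutor.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper illustrating ADD [IF NOT EXISTS] COLUMNS semantics;
// not the code from AlterTableAddColsStmt or CatalogOpExecutor.
public class AddColumnsSketch {
  // Returns the requested columns that should actually be added.
  static List<String> columnsToAdd(List<String> existing, List<String> requested,
      boolean ifNotExists) {
    Set<String> present = new HashSet<>();
    for (String c : existing) present.add(c.toLowerCase(Locale.ROOT));
    Set<String> seen = new HashSet<>();
    List<String> toAdd = new ArrayList<>();
    for (String c : requested) {
      String key = c.toLowerCase(Locale.ROOT);
      // Assumption: duplicates inside the statement itself are always an error.
      if (!seen.add(key)) {
        throw new IllegalArgumentException("Duplicate column in statement: " + c);
      }
      if (present.contains(key)) {
        if (ifNotExists) continue;  // skip the existing column silently
        throw new IllegalArgumentException("Column already exists: " + c);
      }
      toAdd.add(c);
    }
    return toAdd;
  }

  public static void main(String[] args) {
    List<String> existing = List.of("id", "name");
    System.out.println(columnsToAdd(existing, List.of("NAME", "age"), true)); // [age]
  }
}
```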

Testing:
- Updated E2E DDL tests

Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
---
M common/thrift/JniCatalog.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
6 files changed, 154 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/10
--
To view, visit http://gerrit.cloudera.org:8080/18953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
Gerrit-Change-Number: 18953
Gerrit-PatchSet: 10
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11485: Pushdown LIMIT through UNION ALL and LEFT/RIGHT OUTER JOIN

2022-10-10 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/18862 )

Change subject: IMPALA-11485: Pushdown LIMIT through UNION ALL and LEFT/RIGHT 
OUTER JOIN
..

IMPALA-11485: Pushdown LIMIT through UNION ALL and LEFT/RIGHT OUTER JOIN

Pushdown LIMIT through UNION ALL:
Transforms:
  - Limit
    - Union
      - relation1
      - relation2
      ..
Into:
  - Limit
    - Union
      - Limit
        - relation1
      - Limit
        - relation2
      ..

Pushdown LIMIT through LEFT/RIGHT OUTER JOIN:
Transforms:
  - Limit
    - Join
      - left source
      - right source
Into:
  - Limit
    - Join
      - Limit (present if Join is left outer)
        - left source
      - Limit (present if Join is right outer)
        - right source
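The two transformations above can be sketched in Python as follows. This is a minimal illustration, not the SingleNodePlanner code: the PlanNode class is hypothetical, a Limit is modeled as an attribute of the node it sits above, and the sketch assumes there is no OFFSET and no predicate evaluated on the union/join output (either would make the pushdown unsafe as written).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    """Hypothetical plan node; limit=None means "no limit"."""
    kind: str                          # "union_all", "join", "scan", ...
    children: List["PlanNode"] = field(default_factory=list)
    limit: Optional[int] = None
    join_op: Optional[str] = None      # "left_outer", "right_outer", ...


def _push_limit(child: PlanNode, limit: int) -> None:
    # Keep the tighter limit if the child already has one.
    child.limit = limit if child.limit is None else min(child.limit, limit)


def push_down_limit(node: PlanNode) -> PlanNode:
    """Copy node's limit onto the children it can safely be pushed to.

    The original limit stays on node, so it still caps the final output.
    """
    if node.limit is None:
        return node
    if node.kind == "union_all":
        # Any `limit` rows of the union can be drawn from at most
        # `limit` rows of each input.
        for child in node.children:
            _push_limit(child, node.limit)
    elif node.kind == "join" and node.join_op == "left_outer":
        # Every left row yields at least one output row.
        _push_limit(node.children[0], node.limit)
    elif node.kind == "join" and node.join_op == "right_outer":
        # Every right row yields at least one output row.
        _push_limit(node.children[1], node.limit)
    return node
```

Inner joins are left untouched, because an inner join may drop rows from either side.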

Change-Id: Ia5d040c0a98e60639d7ce4b25ecf07a859c8a32c
---
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-outer-join.test
A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-union.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/joins.test
A testdata/workloads/functional-query/queries/limit-pushdown-outer-join.test
A testdata/workloads/functional-query/queries/limit-pushdown-union.test
A tests/query_test/test_limit_pushdown.py
M tests/query_test/test_observability.py
13 files changed, 452 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/18862/6
--
To view, visit http://gerrit.cloudera.org:8080/18862
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia5d040c0a98e60639d7ce4b25ecf07a859c8a32c
Gerrit-Change-Number: 18862
Gerrit-PatchSet: 6
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11564: For Agg/Scan nodes, increase the Cache of regular expressions to speed up

2022-10-09 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18987 )

Change subject: IMPALA-11564: For Agg/Scan nodes, increase the Cache of regular 
expressions to speed up
..


Patch Set 4:

(3 comments)

> Patch Set 3:
>
> (4 comments)

http://gerrit.cloudera.org:8080/#/c/18987/3/be/src/exec/exec-node-thread-cache.h
File be/src/exec/exec-node-thread-cache.h:

http://gerrit.cloudera.org:8080/#/c/18987/3/be/src/exec/exec-node-thread-cache.h@37
PS3, Line 37:   boost::unordered_map regex_cache_key_map_;
> map internally is a BST while unordered_map is a hash table, since we care
OK.


http://gerrit.cloudera.org:8080/#/c/18987/3/be/src/exec/exec-node-thread-cache.h@40
PS3, Line 40:   // 0 indicates the initial status;
:   // 1 indicates successful matching;
:   // 2 indicates failure matching;
:   // 3 indicates null;
> how about using an enum to represent these states?
These states are only used internally and are not exposed to the outside world, 
so there is no need to use enumerations.


http://gerrit.cloudera.org:8080/#/c/18987/3/be/src/exec/exec-node-thread-cache.h@76
PS3, Line 76: int cache_status = regex_cache_[cache_id];
> how about directly using the string `cache_key` to get the cache value? you
In the current implementation, the result state (init, successful, 
failure, null) can be saved, and for now the only caller is LikePredicate. 
Lookups in an unordered_map are O(1) on average. So do we need to change it?
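A minimal Python sketch of the two-level lookup discussed in this thread (string key to dense id, id to cached status), assuming one cache per pattern and per thread. The class and method names are hypothetical, Python's re.search stands in for Impala's regex engine, and a dict plays the role of the unordered_map.

```python
import re

# Status codes mirroring the comments on regex_cache_ quoted above.
INIT, MATCHED, NOT_MATCHED, IS_NULL = 0, 1, 2, 3


class RegexResultCache:
    """Hypothetical per-thread cache of match results for ONE pattern.

    Input strings are mapped to dense integer ids (cf. regex_cache_key_map_)
    and statuses are stored in a list indexed by id (cf. regex_cache_).
    """

    def __init__(self, pattern: str):
        self._regex = re.compile(pattern)
        self._key_to_id = {}   # hash map: average O(1) lookup
        self._status = []      # status per id, starts at INIT

    def cache_id(self, key: str) -> int:
        """Return the id for key, assigning a new one on first use."""
        if key not in self._key_to_id:
            self._key_to_id[key] = len(self._status)
            self._status.append(INIT)
        return self._key_to_id[key]

    def match(self, value):
        """Return MATCHED, NOT_MATCHED or IS_NULL, running the regex at
        most once per distinct input string (NULL is not cached)."""
        if value is None:
            return IS_NULL
        cid = self.cache_id(value)
        if self._status[cid] == INIT:
            found = self._regex.search(value) is not None
            self._status[cid] = MATCHED if found else NOT_MATCHED
        return self._status[cid]
```

Storing the status directly in the dict, keyed by the string, would drop the list and the id indirection.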



--
To view, visit http://gerrit.cloudera.org:8080/18987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I68f37303aee4b6a28e560f27548c31472b82048b
Gerrit-Change-Number: 18987
Gerrit-PatchSet: 4
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Comment-Date: Mon, 10 Oct 2022 06:35:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11564: For Agg/Scan nodes, increase the Cache of regular expressions to speed up

2022-10-09 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/18987 )

Change subject: IMPALA-11564: For Agg/Scan nodes, increase the Cache of regular 
expressions to speed up
..

IMPALA-11564: For Agg/Scan nodes, increase the Cache of regular expressions to 
speed up

Each time a regular expression is evaluated, the result is looked up in
the cache first, which speeds up the computation.

Change-Id: I68f37303aee4b6a28e560f27548c31472b82048b
---
M be/src/exec/aggregator.cc
M be/src/exec/aggregator.h
A be/src/exec/exec-node-thread-cache.h
M be/src/exec/grouping-aggregator-ir.cc
M be/src/exec/hdfs-columnar-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/kudu/kudu-scanner.cc
M be/src/exec/kudu/kudu-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exprs/agg-fn-evaluator.h
M be/src/exprs/like-predicate-ir.cc
M be/src/exprs/like-predicate.cc
M be/src/exprs/like-predicate.h
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/udf/udf-internal.h
M be/src/udf/udf-ir.cc
M be/src/udf/udf.cc
M be/src/udf/udf.h
A testdata/workloads/functional-query/queries/QueryTest/thread-cache.test
M tests/query_test/test_queries.py
22 files changed, 219 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/18987/4
--
To view, visit http://gerrit.cloudera.org:8080/18987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I68f37303aee4b6a28e560f27548c31472b82048b
Gerrit-Change-Number: 18987
Gerrit-PatchSet: 4
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 


[Impala-ASF-CR] IMPALA-11485: Pushdown LIMIT through UNION and LEFT/RIGHT OUTER JOIN

2022-10-09 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18862 )

Change subject: IMPALA-11485: Pushdown LIMIT through UNION and LEFT/RIGHT OUTER 
JOIN
..


Patch Set 5:

> Patch Set 4:
>
> (2 comments)

Yes, you are right. I changed it.


--
To view, visit http://gerrit.cloudera.org:8080/18862
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia5d040c0a98e60639d7ce4b25ecf07a859c8a32c
Gerrit-Change-Number: 18862
Gerrit-PatchSet: 5
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sun, 09 Oct 2022 11:05:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11485: Pushdown LIMIT through UNION and LEFT/RIGHT OUTER JOIN

2022-10-09 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/18862 )

Change subject: IMPALA-11485: Pushdown LIMIT through UNION and LEFT/RIGHT OUTER 
JOIN
..

IMPALA-11485: Pushdown LIMIT through UNION and LEFT/RIGHT OUTER JOIN

Pushdown LIMIT through UNION:
Transforms:
  - Limit
    - Union
      - relation1
      - relation2
      ..
Into:
  - Limit
    - Union
      - Limit
        - relation1
      - Limit
        - relation2
      ..

Pushdown LIMIT through LEFT/RIGHT OUTER JOIN:
Transforms:
  - Limit
    - Join
      - left source
      - right source
Into:
  - Limit
    - Join
      - Limit (present if Join is left outer)
        - left source
      - Limit (present if Join is right outer)
        - right source

Change-Id: Ia5d040c0a98e60639d7ce4b25ecf07a859c8a32c
---
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-outer-join.test
A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-union.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/joins.test
A testdata/workloads/functional-query/queries/limit-pushdown-outer-join.test
A testdata/workloads/functional-query/queries/limit-pushdown-union.test
A tests/query_test/test_limit_pushdown.py
M tests/query_test/test_observability.py
13 files changed, 340 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/18862/5
--
To view, visit http://gerrit.cloudera.org:8080/18862
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia5d040c0a98e60639d7ce4b25ecf07a859c8a32c
Gerrit-Change-Number: 18862
Gerrit-PatchSet: 5
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone

2022-10-09 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18958 )

Change subject: IMPALA-11563: Optimized /etc/sysconfig/clock to find the time 
zone
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18958/1/be/src/exprs/timezone_db.cc
File be/src/exprs/timezone_db.cc:

http://gerrit.cloudera.org:8080/#/c/18958/1/be/src/exprs/timezone_db.cc@183
PS1, Line 183:   if (result.rfind("#", 0) == 0) continue;
> Hi all, what if line start with '\t#'? or even line is 'ZONE=UTC # some com
I think this should be treated as an invalid format, and it should be a 
low-probability case.



--
To view, visit http://gerrit.cloudera.org:8080/18958
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f80fd1817d072f8dadf288025cb9534191ca458
Gerrit-Change-Number: 18958
Gerrit-PatchSet: 4
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Xiang Yang 
Gerrit-Comment-Date: Sun, 09 Oct 2022 09:33:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu table

2022-10-08 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18953 )

Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns 
for kudu table
..


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18953/8/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/18953/8/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java@508
PS8, Line 508: hasColumnsAd
> nit: "hasColumnsAdded" might be better
Done.


http://gerrit.cloudera.org:8080/#/c/18953/8/testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test:

http://gerrit.cloudera.org:8080/#/c/18953/8/testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test@374
PS8, Line 374: ID, NAME, VALI, NEW_COL1, NEW_COL2, NEW_COL3, NEW_COL4, NEW_COL5
> Could you add some tests on DESCRIBE between these ALTER statements?
Done.



--
To view, visit http://gerrit.cloudera.org:8080/18953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
Gerrit-Change-Number: 18953
Gerrit-PatchSet: 9
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sun, 09 Oct 2022 03:38:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu table

2022-10-08 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/18953 )

Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns 
for kudu table
..

IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu table

Impala already supports IF NOT EXISTS in alter table add columns for
general Hive tables (IMPALA-7832), but not for Kudu tables. This patch
adds such semantics for Kudu tables.

Testing:
- Updated E2E DDL tests

Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
3 files changed, 92 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/9
--
To view, visit http://gerrit.cloudera.org:8080/18953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
Gerrit-Change-Number: 18953
Gerrit-PatchSet: 9
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone

2022-10-08 Thread Baike Xia (Code Review)
Hello Jian Zhang, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18958

to look at the new patch set (#4).

Change subject: IMPALA-11563: Optimized /etc/sysconfig/clock to find the time 
zone
..

IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone

When parsing /etc/sysconfig/clock, skip any line that starts with '#',
and simplify how the remaining lines are parsed.
This fixes the parsing problem caused by a commented-out line such as
'# Zone="utc"'.
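A minimal Python sketch of the parsing rule described above, assuming the RHEL-style ZONE="..." key; the function name is hypothetical and this is not the timezone_db.cc code. Unlike a check for '#' at position 0, it strips leading whitespace first, so a line such as '\t# ZONE="UTC"' is also skipped; trailing inline comments are not handled.

```python
from typing import Optional


def parse_sysconfig_clock(text: str) -> Optional[str]:
    """Return the ZONE value from /etc/sysconfig/clock-style text, or None."""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("#"):
            continue  # skip commented-out lines such as '# ZONE="UTC"'
        key, sep, value = line.partition("=")
        if sep and key.strip() == "ZONE":
            # Drop surrounding whitespace and double quotes.
            return value.strip().strip('"')
    return None
```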

Note:
The erase() function modifies the original string instead of creating
a new string.
The substr() function returns a new string with
the specified characters instead of modifying the original string.

Change-Id: I7f80fd1817d072f8dadf288025cb9534191ca458
---
M be/src/exprs/timezone_db.cc
1 file changed, 6 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/18958/4
--
To view, visit http://gerrit.cloudera.org:8080/18958
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f80fd1817d072f8dadf288025cb9534191ca458
Gerrit-Change-Number: 18958
Gerrit-PatchSet: 4
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 


[Impala-ASF-CR] IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone

2022-10-08 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18958 )

Change subject: IMPALA-11563: Optimized /etc/sysconfig/clock to find the time 
zone
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18958/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18958/2//COMMIT_MSG@7
PS2, Line 7: Optimized
> Won't the behavior change if we bump to a commented out zone line? e.g.
Oh, yes, this patch fixes that situation: before the fix, the content 
after '#' was parsed. Thanks.



--
To view, visit http://gerrit.cloudera.org:8080/18958
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f80fd1817d072f8dadf288025cb9534191ca458
Gerrit-Change-Number: 18958
Gerrit-PatchSet: 3
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Comment-Date: Sun, 09 Oct 2022 02:58:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu table

2022-10-08 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18953 )

Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns 
for kudu table
..


Patch Set 8:

(1 comment)

> Patch Set 8: Verified+1

http://gerrit.cloudera.org:8080/#/c/18953/8/fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java
File fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java:

http://gerrit.cloudera.org:8080/#/c/18953/8/fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java@87
PS8, Line 87: throw new AnalysisException("Duplicate column name: " + 
colName);
> should we also ignore this error when `if not exists` is presented?
It's already there, and it works for this case: 'alter table tbl add if not 
exists columns (b int, b int);'. However, this might be debatable if the 
column types were different, e.g.: alter table tbl add if not exists columns 
(b int, b string);



--
To view, visit http://gerrit.cloudera.org:8080/18953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
Gerrit-Change-Number: 18953
Gerrit-PatchSet: 8
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sun, 09 Oct 2022 02:27:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone

2022-10-08 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/18958 )

Change subject: IMPALA-11563: Optimized /etc/sysconfig/clock to find the time 
zone
..

IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone

When parsing /etc/sysconfig/clock, skip any line that starts with '#',
and simplify how the remaining lines are parsed.

Note:
The erase() function modifies the original string instead of creating
a new string.
The substr() function returns a new string with
the specified characters instead of modifying the original string.

Change-Id: I7f80fd1817d072f8dadf288025cb9534191ca458
---
M be/src/exprs/timezone_db.cc
1 file changed, 6 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/18958/3
--
To view, visit http://gerrit.cloudera.org:8080/18958
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f80fd1817d072f8dadf288025cb9534191ca458
Gerrit-Change-Number: 18958
Gerrit-PatchSet: 3
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-10-08 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The specific syntax is as follows:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name(
   col_name data_type
   [constraint_specification]
   [COMMENT 'col_comment']
   [, ...]
 )
 [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
 [BUCKETED BY HASH([column [, column ...]]) | RANDOM INTO 24 BUCKETS]
 [SORT BY ([column [, column ...]])]
 [COMMENT 'table_comment']
 [ROW FORMAT row_format]
 [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
 [STORED AS file_format]
 [LOCATION 'hdfs_path']
 [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED]
 [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]

Instructions:
1. Hive's CLUSTERED BY is not supported, because the keyword is already
  used by hints;
2. The supported bucketing algorithms are HASH, RANDOM and KUDU_HASH.
  The default is HASH;
3. INTO 24 BUCKETS specifies the number of buckets; the default value
  is 16;
4. CREATE TABLE statements with a BUCKETED BY clause do not support Kudu
  or Iceberg tables. However, for a Kudu table, a hash partition is
  equivalent to a bucket, and the optimization rule applies to join
  queries;
5. In the current version, alter operations (add/drop/change/replace
  columns) on bucketed tables are not supported;
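To make items 2 and 3 concrete, here is a minimal Python sketch of assigning a row to one of N buckets under HASH and RANDOM bucketing, with the default of 16 buckets. The function names are hypothetical, CRC32 is only a stand-in for whatever hash function the implementation actually uses, and KUDU_HASH is not modeled.

```python
import random
import zlib
from typing import Optional, Sequence

DEFAULT_NUM_BUCKETS = 16  # default bucket count from item 3


def hash_bucket(values: Sequence[object],
                num_buckets: int = DEFAULT_NUM_BUCKETS) -> int:
    """HASH bucketing: rows with equal bucketing-column values always land
    in the same bucket. CRC32 is a stand-in hash, not Hive's or Impala's."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    # Join string forms with a separator so ("ab", "c") != ("a", "bc").
    key = "\x00".join(str(v) for v in values).encode("utf-8")
    return zlib.crc32(key) % num_buckets


def random_bucket(num_buckets: int = DEFAULT_NUM_BUCKETS,
                  rng: Optional[random.Random] = None) -> int:
    """RANDOM bucketing: rows are spread over buckets regardless of value."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    source = rng if rng is not None else random
    return source.randrange(num_buckets)
```

HASH bucketing is what lets equal join keys meet in matching buckets; RANDOM only balances data across buckets.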

This commit is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
15 files changed, 413 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/8
--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 8
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu table

2022-09-30 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/18953 )

Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns 
for kudu table
..

IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu table

Impala already supports IF NOT EXISTS in alter table add columns for
general Hive tables (IMPALA-7832), but not for Kudu tables. This patch
adds such semantics for Kudu tables.

Testing:
- Updated E2E DDL tests

Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
3 files changed, 64 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/8
--
To view, visit http://gerrit.cloudera.org:8080/18953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
Gerrit-Change-Number: 18953
Gerrit-PatchSet: 8
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-09-30 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The specific syntax is as follows:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name(
   col_name data_type
   [constraint_specification]
   [COMMENT 'col_comment']
   [, ...]
 )
 [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
 [BUCKETED BY HASH([column [, column ...]]) | RANDOM INTO 24 BUCKETS]
 [SORT BY ([column [, column ...]])]
 [COMMENT 'table_comment']
 [ROW FORMAT row_format]
 [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
 [STORED AS file_format]
 [LOCATION 'hdfs_path']
 [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED]
 [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]

Instructions:
1. Hive's CLUSTERED BY is not supported, because the keyword is already
  used by hints;
2. The supported bucketing algorithms are HASH, RANDOM and KUDU_HASH.
  The default is HASH;
3. INTO 24 BUCKETS specifies the number of buckets; the default value
  is 16;
4. CREATE TABLE statements with a BUCKETED BY clause do not support Kudu
  or Iceberg tables. However, for a Kudu table, a hash partition is
  equivalent to a bucket, and the optimization rule applies to join
  queries;
5. In the current version, alter operations (add/drop/change/replace
  columns) on bucketed tables are not supported;

This commit is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
15 files changed, 411 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/7
--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 7
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-09-29 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The specific syntax is as follows:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name(
   col_name data_type
   [constraint_specification]
   [COMMENT 'col_comment']
   [, ...]
 )
 [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
 [BUCKETED BY HASH([column [, column ...]]) | RANDOM INTO 24 BUCKETS]
 [SORT BY ([column [, column ...]])]
 [COMMENT 'table_comment']
 [ROW FORMAT row_format]
 [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
 [STORED AS file_format]
 [LOCATION 'hdfs_path']
 [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED]
 [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]

Instructions:
1. Hive's CLUSTERED BY is not supported, because the keyword is already
  used by hints;
2. The supported bucketing algorithms are HASH, RANDOM and KUDU_HASH.
  The default is HASH;
3. INTO 24 BUCKETS specifies the number of buckets; the default value
  is 16;
4. CREATE TABLE statements with a BUCKETED BY clause do not support Kudu
  or Iceberg tables. However, for a Kudu table, a hash partition is
  equivalent to a bucket, and the optimization rule applies to join
  queries;
5. In the current version, alter operations (add/drop/change/replace
  columns) on bucketed tables are not supported;

This commit is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
15 files changed, 411 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/6
--
To view, visit http://gerrit.cloudera.org:8080/19055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 6
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-09-29 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/19055 )

Change subject: IMPALA-3119: DDL support for bucketed tables
..

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The specific syntax is as follows:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name(
   col_name data_type
   [constraint_specification]
   [COMMENT 'col_comment']
   [, ...]
 )
 [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
 [BUCKETED BY HASH([column [, column ...]]) | RANDOM INTO 24 BUCKETS]
 [SORT BY ([column [, column ...]])]
 [COMMENT 'table_comment']
 [ROW FORMAT row_format]
 [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
 [STORED AS file_format]
 [LOCATION 'hdfs_path']
 [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED]
 [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]

Notes:
1. Hive's CLUSTERED BY is not supported, because CLUSTERED is already
  used as a hint keyword;
2. The supported bucketing algorithms are HASH, RANDOM and KUDU_HASH.
  The default is HASH;
3. INTO 24 BUCKETS specifies the number of buckets; the default
  is 16;
4. BUCKETED BY is not supported in CREATE TABLE statements for Kudu
  and Iceberg tables. For a Kudu table, however, a hash partition is
  equivalent to a bucket, so the optimization rule still applies to
  join queries;
5. ALTER operations (ADD/DROP/CHANGE/REPLACE COLUMNS) on bucketed
  tables are not supported in the current version.
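
As a rough illustration of notes 2 to 4, here is a hypothetical Java sketch of how a parsed BUCKETED BY clause might fall back to its defaults. The names `BucketSpec`, `BucketAlgorithm` and `checkTableFormat` are illustrative assumptions, not the contents of `BucketUtils.java` or any other file in this change, and rejecting a non-positive bucket count is an added assumption.

```java
import java.util.List;

// Hypothetical model of a parsed BUCKETED BY clause; not Impala's actual classes.
enum BucketAlgorithm { HASH, RANDOM, KUDU_HASH }

final class BucketSpec {
  static final BucketAlgorithm DEFAULT_ALGORITHM = BucketAlgorithm.HASH; // note 2
  static final int DEFAULT_NUM_BUCKETS = 16;                              // note 3

  final BucketAlgorithm algorithm;
  final List<String> columns;
  final int numBuckets;

  BucketSpec(BucketAlgorithm algorithm, List<String> columns, Integer numBuckets) {
    // Fall back to the documented defaults when the clause omits a value.
    this.algorithm = algorithm != null ? algorithm : DEFAULT_ALGORITHM;
    this.columns = columns != null ? List.copyOf(columns) : List.of();
    this.numBuckets = numBuckets != null ? numBuckets : DEFAULT_NUM_BUCKETS;
    // Assumption: a non-positive bucket count is rejected.
    if (this.numBuckets <= 0) {
      throw new IllegalArgumentException(
          "Number of buckets must be positive: " + this.numBuckets);
    }
  }

  // Note 4: BUCKETED BY is not accepted for Kudu or Iceberg tables.
  static void checkTableFormat(String format) {
    if (format.equalsIgnoreCase("KUDU") || format.equalsIgnoreCase("ICEBERG")) {
      throw new IllegalArgumentException(
          "BUCKETED BY is not supported for " + format + " tables");
    }
  }
}
```

For example, `new BucketSpec(null, List.of("id"), null)` falls back to HASH with 16 buckets, while `BucketSpec.checkTableFormat("KUDU")` throws.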

This commit is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
15 files changed, 408 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/5
--
To view, visit http://gerrit.cloudera.org:8080/19055

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 5
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables

2022-09-29 Thread Baike Xia (Code Review)
Baike Xia has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/19055 )


Change subject: IMPALA-3119: DDL support for bucketed tables
..

IMPALA-3119: DDL support for bucketed tables

Add syntactic support for creating bucketed tables.
The syntax is as follows:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name(
   col_name data_type
   [constraint_specification]
   [COMMENT 'col_comment']
   [, ...]
 )
 [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
 [BUCKETED BY {HASH([column [, column ...]]) | RANDOM} INTO 24 BUCKETS]
 [SORT BY ([column [, column ...]])]
 [COMMENT 'table_comment']
 [ROW FORMAT row_format]
 [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
 [STORED AS file_format]
 [LOCATION 'hdfs_path']
 [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED]
 [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]

Notes:
1. Hive's CLUSTERED BY is not supported, because CLUSTERED is already
  used as a hint keyword;
2. The supported bucketing algorithms are HASH, RANDOM and KUDU_HASH.
  The default is HASH;
3. INTO 24 BUCKETS specifies the number of buckets; the default
  is 16;
4. BUCKETED BY is not supported in CREATE TABLE statements for Kudu
  and Iceberg tables. For a Kudu table, however, a hash partition is
  equivalent to a bucket, so the optimization rule still applies to
  join queries;
5. ALTER operations (ADD/DROP/CHANGE/REPLACE COLUMNS) on bucketed
  tables are not supported in the current version.

This commit is the first subtask of IMPALA-3118.

Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
---
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/BucketUtils.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
16 files changed, 411 insertions(+), 21 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/4
--
To view, visit http://gerrit.cloudera.org:8080/19055

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e
Gerrit-Change-Number: 19055
Gerrit-PatchSet: 4
Gerrit-Owner: Baike Xia 


[Impala-ASF-CR] IMPALA-11420: Support CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES

2022-09-26 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/18940 )

Change subject: IMPALA-11420: Support CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES
..

IMPALA-11420: Support CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES

Add TBLPROPERTIES support for views. The supported syntax is:

  CREATE VIEW [IF NOT EXISTS] [database_name.]view_name
    [(column_name [COMMENT 'column_comment'][, ...])]
    [COMMENT 'view_comment']
    [TBLPROPERTIES (property_name = property_value, ...)]
    AS select_statement;

  ALTER VIEW [database_name.]view_name SET TBLPROPERTIES
    (property_name = property_value, ...);

  ALTER VIEW [database_name.]view_name UNSET TBLPROPERTIES
    (property_name, ...);
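
A minimal sketch, assuming a view's properties are kept in a plain string map, of how SET and UNSET TBLPROPERTIES could act on them. `ViewProperties` is a hypothetical class, not Impala's catalog code, and silently ignoring unknown keys on UNSET is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a view's TBLPROPERTIES; not Impala's catalog classes.
final class ViewProperties {
  private final Map<String, String> props = new LinkedHashMap<>();

  // ALTER VIEW ... SET TBLPROPERTIES: add new keys and overwrite existing ones.
  void set(Map<String, String> updates) {
    props.putAll(updates);
  }

  // ALTER VIEW ... UNSET TBLPROPERTIES: remove the listed keys.
  // Assumption: keys that are not present are ignored rather than reported.
  void unset(List<String> keys) {
    for (String key : keys) {
      props.remove(key);
    }
  }

  // Immutable snapshot of the current properties.
  Map<String, String> asMap() {
    return Map.copyOf(props);
  }
}
```

Setting `ttl` twice keeps only the latest value, and unsetting a key removes it from the snapshot.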

Change-Id: I8d05bb4ec1f70f5387bb21fbe23f62c05941af18
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AlterViewSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java
A fe/src/main/java/org/apache/impala/analysis/AlterViewUnSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/views-ddl.test
14 files changed, 441 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/18940/12
--
To view, visit http://gerrit.cloudera.org:8080/18940

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8d05bb4ec1f70f5387bb21fbe23f62c05941af18
Gerrit-Change-Number: 18940
Gerrit-PatchSet: 12
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11420: Support CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES

2022-09-26 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/18940 )

Change subject: IMPALA-11420: Support CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES
..

IMPALA-11420: Support CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES

Add TBLPROPERTIES support for views. The supported syntax is:

  CREATE VIEW [IF NOT EXISTS] [database_name.]view_name
    [(column_name [COMMENT 'column_comment'][, ...])]
    [COMMENT 'view_comment']
    [TBLPROPERTIES (property_name = property_value, ...)]
    AS select_statement;

  ALTER VIEW [database_name.]view_name SET TBLPROPERTIES
    (property_name = property_value, ...);

  ALTER VIEW [database_name.]view_name UNSET TBLPROPERTIES
    (property_name, ...);

Change-Id: I8d05bb4ec1f70f5387bb21fbe23f62c05941af18
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AlterViewSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java
A fe/src/main/java/org/apache/impala/analysis/AlterViewUnSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
13 files changed, 372 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/18940/11
--
To view, visit http://gerrit.cloudera.org:8080/18940

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8d05bb4ec1f70f5387bb21fbe23f62c05941af18
Gerrit-Change-Number: 18940
Gerrit-PatchSet: 11
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11420: Support CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES

2022-09-26 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/18940 )

Change subject: IMPALA-11420: Support CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES
..

IMPALA-11420: Support CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES

Add TBLPROPERTIES support for views. The supported syntax is:

  CREATE VIEW [IF NOT EXISTS] [database_name.]view_name
    [(column_name [COMMENT 'column_comment'][, ...])]
    [COMMENT 'view_comment']
    [TBLPROPERTIES (property_name = property_value, ...)]
    AS select_statement;

  ALTER VIEW [database_name.]view_name SET TBLPROPERTIES
    (property_name = property_value, ...);

  ALTER VIEW [database_name.]view_name UNSET TBLPROPERTIES
    (property_name, ...);

Change-Id: I8d05bb4ec1f70f5387bb21fbe23f62c05941af18
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AlterViewSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java
A fe/src/main/java/org/apache/impala/analysis/AlterViewUnSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
13 files changed, 371 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/18940/10
--
To view, visit http://gerrit.cloudera.org:8080/18940

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8d05bb4ec1f70f5387bb21fbe23f62c05941af18
Gerrit-Change-Number: 18940
Gerrit-PatchSet: 10
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11565: For alter table, add column operation is optimized

2022-09-26 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/18953 )

Change subject: IMPALA-11565: For alter table, add column operation is optimized
..

IMPALA-11565: For alter table, add column operation is optimized

For ALTER TABLE ... ADD IF NOT EXISTS COLUMNS, if a column already
exists with the same type, no operation is performed for it; if it
exists with a different type, an error is reported.
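
The check described above mirrors the precondition quoted from `AlterTableAddColsStmt` in the review of this change, and could be sketched roughly as follows. `AddColumnsCheck`, `ColumnDef` and `columnsToAdd` are hypothetical names; column types are plain strings here rather than Impala `Type` objects, and `IllegalArgumentException` stands in for `AnalysisException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the ADD [IF NOT EXISTS] COLUMNS check; not Impala's code.
final class AddColumnsCheck {
  record ColumnDef(String name, String type) {}

  /**
   * Returns the columns that still need to be added.
   * existing maps each column name of the target table to its type.
   */
  static List<ColumnDef> columnsToAdd(Map<String, String> existing,
      List<ColumnDef> newCols, boolean ifNotExists, String tableName) {
    List<ColumnDef> result = new ArrayList<>();
    for (ColumnDef c : newCols) {
      String existingType = existing.get(c.name());
      if (existingType == null) {
        result.add(c);  // Column does not exist yet: add it.
        continue;
      }
      if (!ifNotExists) {
        throw new IllegalArgumentException("Column already exists: " + c.name());
      }
      if (!existingType.equals(c.type())) {
        throw new IllegalArgumentException(String.format(
            "Error adding column %s from table %s: type not match",
            c.name(), tableName));
      }
      // Same name and type with IF NOT EXISTS: skip the column (no-op).
    }
    return result;
  }
}
```

An empty result corresponds to the case raised in the review: when every listed column already exists, the catalog receives an empty column list.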

Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
---
M common/thrift/JniCatalog.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
4 files changed, 40 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/7
--
To view, visit http://gerrit.cloudera.org:8080/18953

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
Gerrit-Change-Number: 18953
Gerrit-PatchSet: 7
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 


[Impala-ASF-CR] IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE

2022-09-26 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/18731 )

Change subject: IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE
..

IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE

To reduce the amount of data read and transmitted, non-equi join
conditions can be pushed down to the SCAN_NODE.

For pushdown of non-equi join conjuncts, the current restrictions are:
 1. Only LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN and INNER_JOIN are supported;
 2. Only non-equi predicates that compare a slot with a literal
  (LiteralExpr) qualify, for example: slot >= literal, slot IN (literal list);
 3. A complex filter condition is pushed down only if it references
  a single column. For example, cast(A as int) > 10 can be pushed down
  to the scan;
 4. Only the operators EQ, LE, LT, GE and GT are supported;
 5. Only BinaryPredicate and InPredicate are supported.
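
The restrictions listed above can be summarized as a single eligibility check. This is a hypothetical sketch: `PushdownEligibility` and its enums are illustrative stand-ins for Impala's `Expr`, `BinaryPredicate` and `InPredicate` classes, and the caller is assumed to have already counted the columns a predicate references and determined whether its other side is a literal.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical eligibility check mirroring restrictions 1-5; names are illustrative.
final class PushdownEligibility {
  enum JoinOp { INNER_JOIN, LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN, FULL_OUTER_JOIN,
                LEFT_SEMI_JOIN, LEFT_ANTI_JOIN }
  enum PredicateKind { BINARY, IN, OTHER }
  enum Operator { EQ, NE, LT, LE, GT, GE }

  static final Set<JoinOp> JOIN_OPS = EnumSet.of(
      JoinOp.INNER_JOIN, JoinOp.LEFT_OUTER_JOIN, JoinOp.RIGHT_OUTER_JOIN);
  static final Set<Operator> OPERATORS = EnumSet.of(
      Operator.EQ, Operator.LE, Operator.LT, Operator.GE, Operator.GT);

  /**
   * @param op comparison operator for BINARY predicates; ignored (may be null) for IN
   * @param numReferencedColumns distinct columns referenced, e.g. 1 for cast(a as int) > 10
   * @param comparesWithLiteral true if the other side is a literal or literal list
   */
  static boolean canPushDown(JoinOp join, PredicateKind kind, Operator op,
      int numReferencedColumns, boolean comparesWithLiteral) {
    if (!JOIN_OPS.contains(join)) return false;    // restriction 1
    if (!comparesWithLiteral) return false;         // restriction 2
    if (numReferencedColumns != 1) return false;    // restriction 3
    if (kind == PredicateKind.OTHER) return false;  // restriction 5
    // Restriction 4 applies to binary predicates; IN predicates carry no operator.
    return kind == PredicateKind.IN || OPERATORS.contains(op);
  }
}
```

For instance, `slot >= 10` under an INNER_JOIN qualifies, while the same predicate under a FULL_OUTER_JOIN, or `slot != 10`, does not.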

Pushdown logic:
 1. Build a map from each slot to its list of non-equi conjuncts,
  and a map from each slot to its list of equi conjuncts;
 2. For each slot that has both equi and non-equi conjuncts,
  compute the minimum and maximum values of its equi conjuncts;
 3. Build new BinaryPredicates from these minimum and maximum values,
  based on the corresponding non-equi conjuncts;
 4. Push all the new BinaryPredicates down to the appropriate scan node.
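
Steps 1 to 3 can be read as deriving range bounds from literal values. The sketch below is one possible interpretation, not the code in `HashJoinNode.java`: it assumes equi conjuncts have the form `slot = v` or `slot IN (v1, v2, ...)` over integer literals, and it renders the new predicates as strings instead of building `BinaryPredicate` objects. It only checks that a slot has non-equi conjuncts and does not model how their form shapes the new predicates. `MinMaxPushdown` and `Conjunct` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of steps 1-3; conjuncts are simplified to (slot, op, literals).
final class MinMaxPushdown {
  record Conjunct(String slot, String op, List<Long> literals) {
    boolean isEqui() { return op.equals("=") || op.equals("IN"); }
  }

  static List<String> derive(List<Conjunct> conjuncts) {
    // Step 1: map each slot to its equi and non-equi conjuncts.
    Map<String, List<Conjunct>> equi = new HashMap<>();
    Map<String, List<Conjunct>> nonEqui = new HashMap<>();
    for (Conjunct c : conjuncts) {
      (c.isEqui() ? equi : nonEqui)
          .computeIfAbsent(c.slot(), k -> new ArrayList<>()).add(c);
    }
    List<String> derived = new ArrayList<>();
    for (Map.Entry<String, List<Conjunct>> e : equi.entrySet()) {
      String slot = e.getKey();
      // Step 2: only slots with both kinds of conjuncts are considered.
      if (!nonEqui.containsKey(slot)) continue;
      long min = Long.MAX_VALUE;
      long max = Long.MIN_VALUE;
      for (Conjunct c : e.getValue()) {
        for (long v : c.literals()) {
          min = Math.min(min, v);
          max = Math.max(max, v);
        }
      }
      if (min > max) continue;  // No literal values were found.
      // Step 3: turn the bounds into new binary predicates (rendered as strings).
      derived.add(slot + " >= " + min);
      derived.add(slot + " <= " + max);
    }
    return derived;
  }
}
```

With `a IN (7, 3, 12)` and `a > 0`, the sketch yields `a >= 3` and `a <= 12`; a slot with only equi conjuncts is skipped. Output order across several slots follows `HashMap` iteration and is not guaranteed.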

A new query option, ENABLE_NONE_EQUAL_PREDICATE_PUSH_DOWN, is added to
enable or disable this feature.

Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test
A testdata/workloads/functional-query/queries/QueryTest/none-equal-predicate-push-down.test
A tests/query_test/test_none_equi_predicate_pushdown.py
10 files changed, 1,132 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/18731/11
--
To view, visit http://gerrit.cloudera.org:8080/18731

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263
Gerrit-Change-Number: 18731
Gerrit-PatchSet: 11
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11565: For alter table, add column operation is optimized

2022-09-26 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/18953 )

Change subject: IMPALA-11565: For alter table, add column operation is optimized
..

IMPALA-11565: For alter table, add column operation is optimized

For ALTER TABLE ... ADD IF NOT EXISTS COLUMNS, if a column already
exists with the same type, no operation is performed for it; if it
exists with a different type, an error is reported.

Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
---
M common/thrift/JniCatalog.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
4 files changed, 40 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/6
--
To view, visit http://gerrit.cloudera.org:8080/18953

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
Gerrit-Change-Number: 18953
Gerrit-PatchSet: 6
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 


[Impala-ASF-CR] IMPALA-11565: For alter table, add column operation is optimized

2022-09-26 Thread Baike Xia (Code Review)
Baike Xia has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18953 )

Change subject: IMPALA-11565: For alter table, add column operation is optimized
..


Patch Set 5:

(2 comments)

I made a new fix.

http://gerrit.cloudera.org:8080/#/c/18953/3/fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java
File fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java:

http://gerrit.cloudera.org:8080/#/c/18953/3/fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java@87
PS3, Line 87:   if (col != null) {
            :     if (!ifNotExists) {
            :       throw new AnalysisException("Column already exists: " + colName);
            :     }
            :
            :     // handle the case that ifNotExists is true
            :     if (!col.getType().equals(c.getType())) {
            :       throw new AnalysisException(String.format("Error adding column %s " +
            :           "from table %s: type not match", colName, t.getName()));
            :     }
            :
            :
> how about simplifying the precondition check to the following to improve th
Yeah, this is better. Thx.


http://gerrit.cloudera.org:8080/#/c/18953/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/18953/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1029
PS3, Line 1029:   // In AlterTableAddColsStmt, there is remove for columns.
              :   // May cause columns to be empty
> could you add a coment about the reason for adding this validation, is it a
I added a comment: AlterTableAddColsStmt may remove columns from the
list, which can leave the column list empty.



--
To view, visit http://gerrit.cloudera.org:8080/18953

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
Gerrit-Change-Number: 18953
Gerrit-PatchSet: 5
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Comment-Date: Mon, 26 Sep 2022 07:54:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11565: For alter table, add column operation is optimized

2022-09-26 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/18953 )

Change subject: IMPALA-11565: For alter table, add column operation is optimized
..

IMPALA-11565: For alter table, add column operation is optimized

For ALTER TABLE ... ADD IF NOT EXISTS COLUMNS, if a column already
exists with the same type, no operation is performed for it; if it
exists with a different type, an error is reported.

Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
---
M common/thrift/JniCatalog.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
4 files changed, 40 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/5
--
To view, visit http://gerrit.cloudera.org:8080/18953

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
Gerrit-Change-Number: 18953
Gerrit-PatchSet: 5
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 


[Impala-ASF-CR] IMPALA-11565: For alter table, add column operation is optimized

2022-09-26 Thread Baike Xia (Code Review)
Baike Xia has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/18953 )

Change subject: IMPALA-11565: For alter table, add column operation is optimized
..

IMPALA-11565: For alter table, add column operation is optimized

For ALTER TABLE ... ADD IF NOT EXISTS COLUMNS, if a column already
exists with the same type, no operation is performed for it; if it
exists with a different type, an error is reported.

Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
---
M common/thrift/JniCatalog.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
4 files changed, 40 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/4
--
To view, visit http://gerrit.cloudera.org:8080/18953

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
Gerrit-Change-Number: 18953
Gerrit-PatchSet: 4
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 

