[Impala-ASF-CR] IMPALA-11898: Add query options in the profile even if the query failed in planning
Baike Xia has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/19517 ) Change subject: IMPALA-11898: Add query options in the profile even if the query failed in planning .. IMPALA-11898: Add query options in the profile even if the query failed in planning Currently, query options are added to the profile in ClientRequestState::Exec(), which is not executed if the query failed in planning or was not admitted (e.g. timeout in queueing or cancelled before execution). This patch moves the logic to where the query options are ready to be added. To be specific, "Query Options (set by configuration)" is available when the client submits the request, so we add it in the constructor of ClientRequestState. "Query Options (set by configuration and planner)" is ready when planning finishes, so it is moved to right after the call to RunFrontendPlanner(). Testing: - Run the query with AnalysisException. - Added a test to make sure "Impala Query State" is populated. Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 --- M be/src/runtime/query-driver.cc M be/src/service/client-request-state.cc M be/src/service/impala-server.cc M tests/query_test/test_observability.py 4 files changed, 23 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/19517/8 -- To view, visit http://gerrit.cloudera.org:8080/19517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 Gerrit-Change-Number: 19517 Gerrit-PatchSet: 8 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
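The ordering described in the commit message can be illustrated with a small sketch. This is not Impala code: RequestState, PlanningError and run_query are hypothetical names, and whether the real patch records the planner options on failure as well as on success is not stated here, so the sketch only records them when planning returns.

```python
class PlanningError(Exception):
    """Stands in for a frontend failure such as an AnalysisException (hypothetical)."""


class RequestState:
    """Hypothetical, heavily simplified stand-in for ClientRequestState."""

    def __init__(self, configured_options):
        self.profile = {}
        # These options are known when the client submits the request,
        # so they are recorded in the constructor, before any planning.
        self.profile["Query Options (set by configuration)"] = dict(configured_options)

    def record_planner_options(self, planner_options):
        # Recorded right after planning, not in an Exec() step that may never run.
        self.profile["Query Options (set by configuration and planner)"] = dict(planner_options)


def run_query(configured_options, planner):
    """Build the profile; 'planner' is any callable that may raise PlanningError."""
    state = RequestState(configured_options)
    try:
        planned_options = planner(configured_options)
    except PlanningError as err:
        # Planning failed, yet the configured options are already in the profile.
        state.profile["Query Status"] = str(err)
        return state
    state.record_planner_options(planned_options)
    return state
```

With this ordering, a query that fails in planning still carries the "set by configuration" entry, which is the behaviour the change is after.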
[Impala-ASF-CR] IMPALA-11898: Add query options in the profile even if the query failed in planning
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19517 ) Change subject: IMPALA-11898: Add query options in the profile even if the query failed in planning .. Patch Set 6: (4 comments) Hi Quanlong and Daniel, I'm back. I have addressed some of the issues; thank you very much for your review. http://gerrit.cloudera.org:8080/#/c/19517/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19517/3//COMMIT_MSG@7 PS3, Line 7: even > "even" ? Done http://gerrit.cloudera.org:8080/#/c/19517/3//COMMIT_MSG@13 PS3, Line 13: This patch moves the logics to where the query options are ready > Please mention the cause of the bug and how the query options are added. E. Done http://gerrit.cloudera.org:8080/#/c/19517/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19517/4//COMMIT_MSG@7 PS4, Line 7: even > Nit: even, not event. Done http://gerrit.cloudera.org:8080/#/c/19517/3/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/19517/3/be/src/service/impala-server.cc@1247 PS3, Line 1247: > Please add a comment, e.g. "Add profile info items that are ready after RunF Done -- To view, visit http://gerrit.cloudera.org:8080/19517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 Gerrit-Change-Number: 19517 Gerrit-PatchSet: 6 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 04 Aug 2023 06:48:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11898: Add query options in the profile even if the query failed in planning
Baike Xia has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/19517 ) Change subject: IMPALA-11898: Add query options in the profile even if the query failed in planning .. IMPALA-11898: Add query options in the profile even if the query failed in planning Currently, query options are added to the profile in ClientRequestState::Exec(), which is not executed if the query failed in planning or was not admitted (e.g. timeout in queueing or cancelled before execution). This patch moves the logic to where the query options are ready to be added. To be specific, "Query Options (set by configuration)" is available when the client submits the request, so we add it in the constructor of ClientRequestState. "Query Options (set by configuration and planner)" is ready when planning finishes, so it is moved to right after the call to RunFrontendPlanner(). Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 --- M be/src/runtime/query-driver.cc M be/src/service/client-request-state.cc M be/src/service/impala-server.cc M tests/query_test/test_observability.py 4 files changed, 23 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/19517/6 -- To view, visit http://gerrit.cloudera.org:8080/19517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 Gerrit-Change-Number: 19517 Gerrit-PatchSet: 6 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE
Baike Xia has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/18731 ) Change subject: IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE .. IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE To reduce the amount of data read and transmitted, non-equi join conditions can be pushed down to the SCAN_NODE. Current restrictions on pushing down non-equi join conjuncts: 1. Only LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN and INNER_JOIN are supported; 2. The non-equi predicate must contain a LiteralExpr, for example: slot >= Literal, slot IN (Literal list); 3. A complex filter condition can be pushed down only if it references a single column, for example cast(A as int) > 10; 4. Only the predicate operators EQ, LE, LT, GE and GT are supported; 5. Only BinaryPredicate and InPredicate are supported. Pushdown logic: 1. Build the mapping from each slot to its list of non-equi conjuncts and the mapping from each slot to its list of equi conjuncts; 2. When a slot has both equi and non-equi conjuncts, calculate the maximum and minimum values of the equi conjuncts; 3. Build new BinaryPredicates from the maximum and minimum values according to the non-equi conjunct; 4. Push all BinaryPredicates down to the specific scan node. A new query option is added as a feature switch: ENABLE_NONE_EQUAL_PREDICATE_PUSH_DOWN Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java A testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test A testdata/workloads/functional-query/queries/QueryTest/none-equal-predicate-push-down.test A tests/query_test/test_none_equi_predicate_pushdown.py 13 files changed, 1,448 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/18731/18 -- To view, visit http://gerrit.cloudera.org:8080/18731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263 Gerrit-Change-Number: 18731 Gerrit-PatchSet: 18 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
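One possible reading of the min/max step in the pushdown logic, as a sketch only. It assumes a join conjunct of the form "a.x OP b.y" together with a known set of literal values for b.y (from an equality or IN predicate), and derives a bound on a.x that a scan could apply. The function name and the tuple representation are hypothetical and are not taken from the patch.

```python
def derive_scan_predicate(op, literal_values):
    """Derive a bound for 'slot OP other_slot' when other_slot is known to
    take one of literal_values.

    Returns (op, bound) meaning 'slot OP bound', or None if nothing can be
    derived. Hypothetical helper, not part of the Impala planner.
    """
    values = list(literal_values)
    if not values:
        return None
    if op in ("<", "<="):
        # slot <= other and other is at most max(values) => slot <= max(values)
        return (op, max(values))
    if op in (">", ">="):
        # slot >= other and other is at least min(values) => slot >= min(values)
        return (op, min(values))
    # Other operators are outside the scope of this sketch.
    return None
```

For example, with "a.x <= b.y" and "b.y IN (1, 5, 9)", the derived predicate on the scan of a would be "a.x <= 9". The derived predicate is weaker than the join conjunct, so the join still has to evaluate the original condition.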
[Impala-ASF-CR] IMPALA-11898: Add query options in the profile event if the query failed in planning
Baike Xia has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/19517 ) Change subject: IMPALA-11898: Add query options in the profile event if the query failed in planning .. IMPALA-11898: Add query options in the profile event if the query failed in planning Query options are normally included in the profile, but when the query fails during planning, query options are missing. After this change, query options are also added to the profile upon planning failure. Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 --- M be/src/service/client-request-state.cc M be/src/service/impala-server.cc 2 files changed, 16 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/19517/4 -- To view, visit http://gerrit.cloudera.org:8080/19517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 Gerrit-Change-Number: 19517 Gerrit-PatchSet: 4 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10861: Optimize the plan for identical predicates
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19511 ) Change subject: IMPALA-10861: Optimize the plan for identical predicates .. Patch Set 3: (7 comments) > Patch Set 2: > > (7 comments) > > Thanks Baike for the fix. Hi Yida, Thanks for your advice and reply. I made some fixes in response to your suggestions. Looking forward to your reply and CR. http://gerrit.cloudera.org:8080/#/c/19511/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19511/2//COMMIT_MSG@9 PS2, Line 9: For the query with two same predicates, duplicated data is deleted > Would be good to elaborate the comment on what the current issue is and how OK, I'm going to optimize this description. http://gerrit.cloudera.org:8080/#/c/19511/2//COMMIT_MSG@9 PS2, Line 9: the > nit. the? Done http://gerrit.cloudera.org:8080/#/c/19511/2//COMMIT_MSG@9 PS2, Line 9: duplicate > nit. duplicated Done http://gerrit.cloudera.org:8080/#/c/19511/2//COMMIT_MSG@12 PS2, Line 12: ing. > Would it also work for below cases? Done http://gerrit.cloudera.org:8080/#/c/19511/2//COMMIT_MSG@12 PS2, Line 12: > should be a.id = b.id? Done http://gerrit.cloudera.org:8080/#/c/19511/2/testdata/workloads/functional-planner/queries/PlannerTest/joins.test File testdata/workloads/functional-planner/queries/PlannerTest/joins.test: http://gerrit.cloudera.org:8080/#/c/19511/2/testdata/workloads/functional-planner/queries/PlannerTest/joins.test@3118 PS2, Line 3118: the > nit. the? Done http://gerrit.cloudera.org:8080/#/c/19511/2/testdata/workloads/functional-planner/queries/PlannerTest/joins.test@3118 PS2, Line 3118: duplicate > nit. duplicated Done -- To view, visit http://gerrit.cloudera.org:8080/19511 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia249c8146215fad602e9310bf922c6bfa050b96b Gerrit-Change-Number: 19511 Gerrit-PatchSet: 3 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Mon, 20 Mar 2023 17:30:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table
Baike Xia has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/18953 ) Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table .. IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table Impala already supports IF NOT EXISTS in alter table add columns for general Hive tables (IMPALA-7832), but not for Kudu/Iceberg tables. This patch adds the same semantics for Kudu/Iceberg tables. Testing: - Updated E2E DDL tests - Added fe tests Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 --- M common/thrift/JniCatalog.thrift M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test 6 files changed, 194 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/16 -- To view, visit http://gerrit.cloudera.org:8080/18953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 Gerrit-Change-Number: 18953 Gerrit-PatchSet: 16 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
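The IF NOT EXISTS semantics for ADD COLUMNS can be sketched as follows. This is an illustration only: columns_to_add is a hypothetical helper, and the case-insensitive name comparison is an assumption of the sketch rather than something stated in the commit message.

```python
def columns_to_add(existing_columns, requested_columns, if_not_exists):
    """Return the requested columns that should actually be added.

    Without IF NOT EXISTS a name clash is an error; with it, columns that
    already exist are silently skipped. Hypothetical helper; comparing
    names case-insensitively is an assumption.
    """
    existing = {name.lower() for name in existing_columns}
    to_add = []
    for name in requested_columns:
        if name.lower() in existing:
            if if_not_exists:
                continue  # already present: skip instead of failing
            raise ValueError(f"Column already exists: {name}")
        to_add.append(name)
    return to_add
```

For a table with columns id and name, requesting name and age with IF NOT EXISTS would add only age, while the same request without the clause would fail on name.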
[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/18953 ) Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table .. Patch Set 15: Hi Quanlong and Penglin, I corrected the suggestion and resolved the conflict. Could you please CR again? -- To view, visit http://gerrit.cloudera.org:8080/18953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 Gerrit-Change-Number: 18953 Gerrit-PatchSet: 15 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 28 Feb 2023 12:00:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table
Baike Xia has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/18953 ) Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table .. IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table Impala already supports IF NOT EXISTS in alter table add columns for general Hive tables (IMPALA-7832), but not for Kudu/Iceberg tables. This patch adds the same semantics for Kudu/Iceberg tables. Testing: - Updated E2E DDL tests Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 --- M common/thrift/JniCatalog.thrift M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test 6 files changed, 162 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/15 -- To view, visit http://gerrit.cloudera.org:8080/18953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 Gerrit-Change-Number: 18953 Gerrit-PatchSet: 15 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11898: Add query options in the profile event if the query failed in planning
Baike Xia has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/19517 ) Change subject: IMPALA-11898: Add query options in the profile event if the query failed in planning .. IMPALA-11898: Add query options in the profile event if the query failed in planning Query options are normally included in the profile, but when the query fails during planning, query options are missing. After this change, query options are also added to the profile upon planning failure. Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 --- M be/src/service/client-request-state.cc M be/src/service/impala-server.cc 2 files changed, 5 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/19517/3 -- To view, visit http://gerrit.cloudera.org:8080/19517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 Gerrit-Change-Number: 19517 Gerrit-PatchSet: 3 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. Patch Set 14: Hi Csaba, I added some tests as you suggested earlier. I added be/src/util/hash-util-test.cc to test whether the Hive and Impala hash methods produce the same results. A test case for the non-partitioned table order has also been added. Thanks again for your reply and suggestions. -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 14 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 28 Feb 2023 10:03:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. Patch Set 14: > Patch Set 13: > > > Patch Set 13: > > > > (1 comment) > > > > > Patch Set 13: > > > > > > (1 comment) > > The problem is that it would not be practical to check the block locations > for potential relocations when doing the query planning. Given N blocks > in a bucket for one table and M blocks for the second table, it would be > O(N+M) time to decide which distribution method to use. This would add up > depending on the number of joins in the query. We really want to 'pin' the > location but AFAIK HDFS does not allow us to do that. Other systems such as > MemSQL that do bucket join don't have to worry about this since the data is > memory resident. Given N blocks in a bucket for one table and M blocks for the second table, we can use the least common divisor of N and M as the number of buckets for the two tables temporarily. I'm not sure I understand what you mean. -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 14 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 28 Feb 2023 09:59:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11898: Add query options in the profile if the query failed in planning
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19517 ) Change subject: IMPALA-11898: Add query options in the profile if the query failed in planning .. Patch Set 2: (3 comments) Hi Daniel, Thank you for your reply and suggestions. http://gerrit.cloudera.org:8080/#/c/19517/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19517/1//COMMIT_MSG@7 PS1, Line 7: i > Nit: unnecessary double space. Done http://gerrit.cloudera.org:8080/#/c/19517/1//COMMIT_MSG@9 PS1, Line 9: Query options should usually be in the profile, > Could you clarify the commit message? Yeah, just like the jira title, query options should usually be in the profile, but when the Query fails during planning, Query options are missing. The RunFrontendPlanner method is used for planning. Failure here should add query options. e.g.: Query Options (set by configuration): TIMEZONE=PRC,CLIENT_IDENTIFIER=impala shell build version not available. If the query fails after planning we should list the query options set by configuration and the planner. e.g.: Query Options (set by configuration): TIMEZONE=PRC,CLIENT_IDENTIFIER=impala shell build version not available Query Options (set by configuration and planner): MT_DOP=0,TIMEZONE=PRC,CLIENT_IDENTIFIER=impala shell build version not available,MINMAX_FILTER_THRESHOLD=0.5,MINMAX_FILTERING_LEVEL=PAGE http://gerrit.cloudera.org:8080/#/c/19517/1//COMMIT_MSG@13 PS1, Line 13: Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 > Can we add tests that verify that the query options are included in the pro Sorry, I can't. I tried to add the relevant tests, but couldn't find where to add them. Can you tell me? 
-- To view, visit http://gerrit.cloudera.org:8080/19517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 Gerrit-Change-Number: 19517 Gerrit-PatchSet: 2 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 24 Feb 2023 08:44:46 + Gerrit-HasComments: Yes
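On the question of how a test could verify the options, one rough approach is to parse the two "Query Options" lines out of the profile text and assert on the result. The sketch below is only an illustration: parse_query_options is a hypothetical helper, it assumes each section sits on a single line in the "label: K=V,K=V" form quoted in the message above, and it assumes option values contain no commas.

```python
def parse_query_options(profile_text):
    """Map each 'Query Options (...)' label to a dict of option name -> value.

    Hypothetical helper. Assumes one section per line and no commas inside
    option values, as in the example quoted in the review comment.
    """
    sections = {}
    for raw_line in profile_text.splitlines():
        line = raw_line.strip()
        if not line.startswith("Query Options ("):
            continue
        label, _, value = line.partition(": ")
        options = {}
        for item in value.split(","):
            key, sep, val = item.partition("=")
            if sep:  # ignore fragments that are not KEY=VALUE
                options[key.strip()] = val.strip()
        sections[label] = options
    return sections
```

A test for a query that fails in planning could then assert that the "Query Options (set by configuration)" section is present in the parsed result.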
[Impala-ASF-CR] IMPALA-11898: Add query options in the profile if the query failed in planning
Baike Xia has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/19517 ) Change subject: IMPALA-11898: Add query options in the profile if the query failed in planning .. IMPALA-11898: Add query options in the profile if the query failed in planning Query options should usually be in the profile, but when the Query fails during planning, Query options are missing. Upon failure, should add query options to the profile. Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 --- M be/src/service/client-request-state.cc M be/src/service/impala-server.cc 2 files changed, 5 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/19517/2 -- To view, visit http://gerrit.cloudera.org:8080/19517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 Gerrit-Change-Number: 19517 Gerrit-PatchSet: 2 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Baike Xia has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. IMPALA-3120: Support Bucket Shuffle Join for bucketed table Bucket Shuffle Join reduces network overhead and provides better performance for some Join queries. There is no mandatory requirement for the data distribution of the table, so it is not easy to cause the problem of data skew. Bucket Shuffle Join takes effect only in scenarios where the Join condition is equal, because it relies on hash to calculate the specified data distribution. The equivalent Join condition contains the Bucket columns of two tables. If the bucket column of the left table is the equivalent Join condition, it will be planned as Bucket Shuffle Join with a high probability. In a join/group operation, the bucket column can be one or multiple. In multi-table join, ensure that the left table is a bucket table. Currently, only tables based on hdfs storage are supported. Only the following node types are supported: ScanNode/UnionNode/HashJoinNode/AggregationNode/AnalyticEvalNode/SortNode. To ensure consistency with hive, the bucket hash is calculated using the same method that hive uses to calculate the hash value of a column. 
Add new query option as a function switch: ENABLE_BUCKET_SHUFFLE FRAGMENT_INSTANCE_BUCKET_NUM BUCKET_EXEC_BACKEND_RATIO Testing: - Add e2e tests - Add fe tests Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/initial-reservations.cc M be/src/runtime/initial-reservations.h M be/src/runtime/krpc-data-stream-sender-ir.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/scheduling/schedule-state.h M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/CMakeLists.txt A be/src/util/hash-util-test.cc M be/src/util/hash-util.h M common/protobuf/admission_control_service.proto M common/protobuf/control_service.proto M common/protobuf/planner.proto M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Partitions.thrift M common/thrift/PlanNodes.thrift M common/thrift/Planner.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M 
fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/java/org/apache/impala/util/MathUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test A tests/query_test/test_bucket_shuffle.py 53 files changed, 2,466 insertions(+), 77 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/14 -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 14 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
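To illustrate why the join condition has to be an equality on the bucket columns, here is a minimal sketch of Hive-style bucket assignment for integer keys. It assumes Hive's older bucketing scheme as I understand it (an int hashes to itself, fields are combined as 31 * h + field_hash in 32-bit arithmetic, and the bucket is (h & Integer.MAX_VALUE) % num_buckets). The hash actually used by the patch lives in be/src/util/hash-util.h and may differ; all function names below are hypothetical.

```python
def to_int32(value):
    """Wrap a Python int to a signed 32-bit value, mimicking Java int overflow."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def bucket_hash(int_fields):
    """Combine integer bucket-column values the way a Java hashCode chain would."""
    h = 0
    for field in int_fields:
        h = to_int32(31 * h + to_int32(field))
    return h


def bucket_id(int_fields, num_buckets):
    """Clear the sign bit, then take the remainder to get a bucket index."""
    return (bucket_hash(int_fields) & 0x7FFFFFFF) % num_buckets
```

Rows from both tables with equal bucket-column values map to the same bucket index when both sides use the same hash and bucket count, which is why such a join can match rows bucket by bucket. A non-equi condition gives no such guarantee.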
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. Patch Set 13: (1 comment) > Patch Set 13: > > (1 comment) http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG@13 PS9, Line 13: > I think my concern is similar to Aman's. Yes, you are right to be concerned. As you said, the remote read cost caused by bucket shuffle join is higher than the cost of shuffle. I think this is an optimization, a CBO rule. The current version does not allow for similar optimizations and degradations, which I think will happen in the future. -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 13 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 22 Feb 2023 03:51:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. Patch Set 13: (6 comments) Hi Csaba and Aman, I was busy with other work for a while and did not have time to reply; I'm sorry about that. Now I'm back and look forward to your reply and suggestions. http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG@10 PS9, Line 10: performance for some Join queries. Th > I still don't get the non-partitioned sort case. Can you give an example que Yes, I'll add it later. http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG@13 PS9, Line 13: > Is there a node where the whole bucket is located? I mean that if there are I don't think I understand what you mean. Can you explain that again? http://gerrit.cloudera.org:8080/#/c/19430/13//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19430/13//COMMIT_MSG@25 PS13, Line 25: based on hdfs storage are supported. > Thanks for the detailed patch. I have a high level question about the phys Hi Aman, thanks for your reply. HDFS rebalancing is not about moving files, it's about moving blocks of data. The underlying block movement does not affect the content and size of the file, so buckets are not broken. http://gerrit.cloudera.org:8080/#/c/19430/13/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/19430/13/be/src/runtime/query-state.h@149 PS13, Line 149: /// Define locks to ensure thread safety when replenishing reserved memory. : std::mutex increase_memory_reservation_mtx_; : : /// Configure a semaphore to control FragmentInstanceState::Exec : /// for each fragment instance that is executed in a bucket. : /// To save memory, only one concurrency is supported in the open phase and beyond, : /// after the completion of prepare. 
: std::unordered_map bucket_fragment_sem_; : : /// Configure a counter for each fragment instance to count the number of fragment : /// instances that have not yet completed execution, to prevent invalid : /// increase_memory_reservation, and to destroy the semaphore after the execution of : /// all instances of the fragment in the bucket has completed. : std::unordered_map bucket_fragment_un_finished_instances_; > I couldn't grasp the changes in query life-cycle yet. Can you give some exp Yes, you are right. In particular, in KrpcDataStreamSender, the hash method is used to send each row of data to the corresponding fragment. In this case, hive hash is used. The reason for controlling the fragmentation of data running at the same time is to prevent concurrency from running out of resources. But this is an internal transformation of our company based on impala 3.2. I'm still wondering if this piece of logic is necessary. Can you give me some good advice? http://gerrit.cloudera.org:8080/#/c/19430/13/be/src/util/hash-util.h File be/src/util/hash-util.h: http://gerrit.cloudera.org:8080/#/c/19430/13/be/src/util/hash-util.h@287 PS13, Line 287: { > Can you add some tests for this in https://github.com/apache/impala/blob/ma Yes, I'll add it later. http://gerrit.cloudera.org:8080/#/c/19430/13/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/19430/13/fe/src/main/java/org/apache/impala/catalog/Table.java@1045 PS13, Line 1045: TBucketType.NONE > This is not from this patch, but I saw that the other value of TBucketType Yeah, i see you. For TBucketType, it is compatible with the existence of multiple bucket partitioning algorithms. NONE indicates that buckets are not divided. HASH indicates that hive hash algorithm is used. Other hash algorithms can be added later, such as icebearg, kudu, etc. 
The HIVE_HASH or HIVE_BUCKET_V2_HASH name is not used here, because it is compatible with hive sql and easier to run hive sql in impala. -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 13 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 21 Feb 2023 08:21:25 + Gerrit-HasComments: Yes
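The reply above says that the HASH bucket type means the Hive hash algorithm decides which bucket a row belongs to. As a rough illustration only, the sketch below assumes Hive's original (version 1) bucketing rule for INT columns: an INT hashes to its own value, several column hashes are combined as 31 * h + field_hash with 32-bit overflow, and the bucket is (hash & Integer.MAX_VALUE) % num_buckets. The function names are hypothetical and this is not code from the patch; Hive's newer bucketing version uses a different hash.

```python
INT_MAX = 0x7FFFFFFF  # Java Integer.MAX_VALUE


def to_int32(value: int) -> int:
    """Wrap a Python int to a signed 32-bit value, mimicking Java int overflow."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def hive_v1_bucket(int_columns, num_buckets: int) -> int:
    """Assumed Hive v1 bucketing rule for INT bucket columns (illustrative only)."""
    h = 0
    for col in int_columns:
        # An INT column hashes to its own value; fields are combined with factor 31.
        h = to_int32(31 * h + to_int32(col))
    # Clear the sign bit before taking the modulo so the bucket id is non-negative.
    return (h & INT_MAX) % num_buckets
```

If the sender and the table writer agree on this rule, a row with id = 5 in a table with 4 buckets would be routed to the fragment instance that owns bucket 1, which is the property the reply relies on.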
[Impala-ASF-CR] IMPALA-11898: Add query options in the profile if the query failed in planning
Baike Xia has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19517 ) Change subject: IMPALA-11898: Add query options in the profile if the query failed in planning .. IMPALA-11898: Add query options in the profile if the query failed in planning Failed to call RunFrontendPlanner, in profile, add: Query Options (set by configuration) Failed after calling RunFrontendPlanner, add: Query Options (set by configuration and planner) Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 --- M be/src/service/client-request-state.cc M be/src/service/impala-server.cc 2 files changed, 5 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/19517/1 -- To view, visit http://gerrit.cloudera.org:8080/19517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I0e9ce62008dd5b1671b09eda5365cbb0940ebe64 Gerrit-Change-Number: 19517 Gerrit-PatchSet: 1 Gerrit-Owner: Baike Xia
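The commit message above describes two profile entries that become available at different points in a query's life. The following is a minimal sketch of that ordering, not the actual ClientRequestState code: `build_profile`, `planner`, and `executor` are hypothetical names, and the profile is modeled as a plain dict.

```python
CONFIG_KEY = "Query Options (set by configuration)"
PLANNER_KEY = "Query Options (set by configuration and planner)"


def format_options(options):
    """Render options as a stable 'K=V,K=V' string."""
    return ",".join(f"{key}={value}" for key, value in sorted(options.items()))


def build_profile(configured, planner, executor):
    """Record each entry as soon as its data is ready, so failures keep it."""
    # Known when the client submits the request, so it is added first.
    profile = {CONFIG_KEY: format_options(configured)}
    try:
        planned = planner(dict(configured))
        # Known once planning has finished, before admission or execution.
        profile[PLANNER_KEY] = format_options(planned)
        executor(planned)
        profile["status"] = "OK"
    except Exception as exc:  # planning or execution failed
        profile["status"] = f"ERROR: {exc}"
    return profile
```

With this ordering, a query that fails in planning still shows the configured options, and a query that fails after planning shows both entries.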
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Baike Xia has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. IMPALA-3120: Support Bucket Shuffle Join for bucketed table Bucket Shuffle Join reduces network overhead and provides better performance for some Join queries. There is no mandatory requirement for the data distribution of the table, so it is not easy to cause the problem of data skew. Bucket Shuffle Join takes effect only in scenarios where the Join condition is equal, because it relies on hash to calculate the specified data distribution. The equivalent Join condition contains the Bucket columns of two tables. If the bucket column of the left table is the equivalent Join condition, it will be planned as Bucket Shuffle Join with a high probability. In a join/group operation, the bucket column can be one or multiple. In multi-table join, ensure that the left table is a bucket table. Currently, only tables based on hdfs storage are supported. Only the following node types are supported: ScanNode/UnionNode/HashJoinNode/AggregationNode/AnalyticEvalNode/SortNode. To ensure consistency with hive, the bucket hash is calculated using the same method that hive uses to calculate the hash value of a column. 
Add new query option as a function switch: ENABLE_BUCKET_SHUFFLE FRAGMENT_INSTANCE_BUCKET_NUM BUCKET_EXEC_BACKEND_RATIO Testing: - Add e2e tests - Add fe tests Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/initial-reservations.cc M be/src/runtime/initial-reservations.h M be/src/runtime/krpc-data-stream-sender-ir.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/scheduling/schedule-state.h M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/hash-util.h M common/protobuf/admission_control_service.proto M common/protobuf/control_service.proto M common/protobuf/planner.proto M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Partitions.thrift M common/thrift/PlanNodes.thrift M common/thrift/Planner.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M 
fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/java/org/apache/impala/util/MathUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test A tests/query_test/test_bucket_shuffle.py 51 files changed, 2,333 insertions(+), 77 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/13 -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 13 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
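The commit message above states that Bucket Shuffle Join applies only to equality join conditions and is likely to be chosen when the left table's bucket columns appear in those conditions. A simplified sketch of that eligibility rule follows; the function name and the tuple representation of predicates are assumptions for illustration, and the real planner in DistributedPlanner.java will consider more than this.

```python
def may_use_bucket_shuffle(join_predicates, left_bucket_cols, enabled=True):
    """Return True if every left-side bucket column appears in an equi-join predicate.

    join_predicates: list of (left_column, operator, right_column) tuples.
    left_bucket_cols: bucket columns of the left (bucketed) table.
    enabled: stands in for the ENABLE_BUCKET_SHUFFLE query option.
    """
    if not enabled or not left_bucket_cols:
        return False
    # Only equality predicates can be mapped onto a hash-based distribution.
    equi_left_cols = {left for (left, op, _right) in join_predicates if op == "="}
    return set(left_bucket_cols) <= equi_left_cols
```

For example, a join on `a.id = b.id` with the left table bucketed by `id` qualifies under this rule, while a join on `a.id < b.id` does not.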
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Baike Xia has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. IMPALA-3120: Support Bucket Shuffle Join for bucketed table Bucket Shuffle Join reduces network overhead and provides better performance for some Join queries. There is no mandatory requirement for the data distribution of the table, so it is not easy to cause the problem of data skew. Bucket Shuffle Join takes effect only in scenarios where the Join condition is equal, because it relies on hash to calculate the specified data distribution. The equivalent Join condition contains the Bucket columns of two tables. If the bucket column of the left table is the equivalent Join condition, it will be planned as Bucket Shuffle Join with a high probability. In a join/group operation, the bucket column can be one or multiple. In multi-table join, ensure that the left table is a bucket table. Currently, only tables based on hdfs storage are supported. Only the following node types are supported: ScanNode/UnionNode/HashJoinNode/AggregationNode/AnalyticEvalNode/SortNode. To ensure consistency with hive, the bucket hash is calculated using the same method that hive uses to calculate the hash value of a column. 
Add new query option as a function switch: ENABLE_BUCKET_SHUFFLE FRAGMENT_INSTANCE_BUCKET_NUM BUCKET_EXEC_BACKEND_RATIO Testing: - Add e2e tests - Add fe tests Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/initial-reservations.cc M be/src/runtime/initial-reservations.h M be/src/runtime/krpc-data-stream-sender-ir.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/scheduling/schedule-state.h M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/hash-util.h M common/protobuf/admission_control_service.proto M common/protobuf/control_service.proto M common/protobuf/planner.proto M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Partitions.thrift M common/thrift/PlanNodes.thrift M common/thrift/Planner.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M 
fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/java/org/apache/impala/util/MathUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test A tests/query_test/test_bucket_shuffle.py 51 files changed, 2,333 insertions(+), 77 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/12 -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 12 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Baike Xia has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. IMPALA-3120: Support Bucket Shuffle Join for bucketed table Bucket Shuffle Join reduces network overhead and provides better performance for some Join queries. There is no mandatory requirement for the data distribution of the table, so it is not easy to cause the problem of data skew. Bucket Shuffle Join takes effect only in scenarios where the Join condition is equal, because it relies on hash to calculate the specified data distribution. The equivalent Join condition contains the Bucket columns of two tables. If the bucket column of the left table is the equivalent Join condition, it will be planned as Bucket Shuffle Join with a high probability. In a join/group operation, the bucket column can be one or multiple. In multi-table join, ensure that the left table is a bucket table. Currently, only tables based on hdfs storage are supported. Only the following node types are supported: ScanNode/UnionNode/HashJoinNode/AggregationNode/AnalyticEvalNode/SortNode. To ensure consistency with hive, the bucket hash is calculated using the same method that hive uses to calculate the hash value of a column. 
Add new query option as a function switch: ENABLE_BUCKET_SHUFFLE FRAGMENT_INSTANCE_BUCKET_NUM BUCKET_EXEC_BACKEND_RATIO Testing: - Add e2e tests - Add fe tests Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/initial-reservations.cc M be/src/runtime/initial-reservations.h M be/src/runtime/krpc-data-stream-sender-ir.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/scheduling/schedule-state.h M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/hash-util.h M common/protobuf/admission_control_service.proto M common/protobuf/control_service.proto M common/protobuf/planner.proto M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Partitions.thrift M common/thrift/PlanNodes.thrift M common/thrift/Planner.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M 
fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/java/org/apache/impala/util/MathUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test A tests/query_test/test_bucket_shuffle.py 51 files changed, 2,279 insertions(+), 77 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/11 -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 11 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. Patch Set 10: (12 comments) Hi Csaba, Thanks for your reply and suggestions. Look forward to your further comments. http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG@9 PS9, Line 9: rovides better > Besides the bucket operations do we also apply predicates to buckets? For e That's right, and I think we can add related optimizations in the future. http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG@10 PS9, Line 10: performance for some Join queries. Th > Can you add more info about these optimizations? Yes, I can. 1. For sort, bucketing not limited to a partitioned analytic function, the aggregate function works as well; 2. It is OK to have multiple bucket columns in one table or multiple bucket columns in multiple bucket table; 3. Yes, support bucket shuffle. http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG@11 PS9, Line 11: the dat > Can you add some info about the tradeoffs? My understanding is that while b 1. For the first question, it might cause a decrease in parallelism, but, if buckets are properly divided, a single bucket is not too large, which does not affect query performance; 2. In scheduling, localization execution is still judged first, and bucket shuffle does not affect localization execution. http://gerrit.cloudera.org:8080/#/c/19430/9//COMMIT_MSG@13 PS9, Line 13: > Can you add some info about the effect on scheduling? The executor is assigned to the node where the bucket is located. 
http://gerrit.cloudera.org:8080/#/c/19430/9/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/19430/9/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@324 PS9, Line 324:* TODO: hbase scans are range-partitioned on the row key :*/ > Todo can be removed Done http://gerrit.cloudera.org:8080/#/c/19430/9/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@510 PS9, Line 510: butionMode() for more details. > Can you mention bucketing join? Done http://gerrit.cloudera.org:8080/#/c/19430/9/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@521 PS9, Line 521: (ctx_.getQueryOptions().isEnable_bucket_shuffle() > Is this always the optimal solution? That's right, for predicate filtering, I want to optimize it in the future. http://gerrit.cloudera.org:8080/#/c/19430/9/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/19430/9/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2539 PS9, Line 2539: > Isn't it Hive hash? Done http://gerrit.cloudera.org:8080/#/c/19430/9/fe/src/main/java/org/apache/impala/planner/SortNode.java File fe/src/main/java/org/apache/impala/planner/SortNode.java: http://gerrit.cloudera.org:8080/#/c/19430/9/fe/src/main/java/org/apache/impala/planner/SortNode.java@387 PS9, Line 387: if (isBucketedNode()) { > Can you translate this to English? Done http://gerrit.cloudera.org:8080/#/c/19430/9/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/19430/9/testdata/datasets/functional/functional_schema_template.sql@3913 PS9, Line 3913: CLUSTERED BY(id) > Can you also add a test table that is bucketed by more than 1 column? OK, that's right. 
http://gerrit.cloudera.org:8080/#/c/19430/9/testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test File testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test: http://gerrit.cloudera.org:8080/#/c/19430/9/testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test@9 PS9, Line 9: 06:AGGREGATE [FINALIZE] : | output: count:merge(b.id), count:merge(b.string_col) : | row-size=16B cardinality=1 : | : 05:EXCHANGE [UNPARTITIONED] > I don't understand this part of the plan - shouldn't be there a pre-aggrega Yes, this is wrong, I fixed it. http://gerrit.cloudera.org:8080/#/c/19430/9/testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test@28 PS9, Line 28: | HDFS partitions=12/24 files=12 size=239.77KB > For bucketed tables it could be useful to add something like buckets=4/4 That's great. -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 G
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Baike Xia has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. IMPALA-3120: Support Bucket Shuffle Join for bucketed table Bucket Shuffle Join reduces network overhead and provides better performance for some Join queries. There is no mandatory requirement for the data distribution of the table, so it is not easy to cause the problem of data skew. Bucket Shuffle Join takes effect only in scenarios where the Join condition is equal, because it relies on hash to calculate the specified data distribution. The equivalent Join condition contains the Bucket columns of two tables. If the bucket column of the left table is the equivalent Join condition, it will be planned as Bucket Shuffle Join with a high probability. In a join/group operation, the bucket column can be one or multiple. In multi-table join, ensure that the left table is a bucket table. Currently, only tables based on hdfs storage are supported. Only the following node types are supported: ScanNode/UnionNode/HashJoinNode/AggregationNode/AnalyticEvalNode/SortNode. To ensure consistency with hive, the bucket hash is calculated using the same method that hive uses to calculate the hash value of a column. 
Add new query option as a function switch: ENABLE_BUCKET_SHUFFLE FRAGMENT_INSTANCE_BUCKET_NUM BUCKET_EXEC_BACKEND_RATIO Testing: - Add e2e tests - Add fe tests Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/initial-reservations.cc M be/src/runtime/initial-reservations.h M be/src/runtime/krpc-data-stream-sender-ir.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/scheduling/schedule-state.h M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/hash-util.h M common/protobuf/admission_control_service.proto M common/protobuf/control_service.proto M common/protobuf/planner.proto M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Partitions.thrift M common/thrift/PlanNodes.thrift M common/thrift/Planner.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M 
fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/java/org/apache/impala/util/MathUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test A tests/query_test/test_bucket_shuffle.py 51 files changed, 2,278 insertions(+), 77 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/10 -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 10 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/18958 ) Change subject: IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/18958/4/be/src/exprs/timezone_db.cc File be/src/exprs/timezone_db.cc: http://gerrit.cloudera.org:8080/#/c/18958/4/be/src/exprs/timezone_db.cc@184 PS4, Line 184: \""); > You could extract this string to a constant (or constexpr) so we wouldn't h Done http://gerrit.cloudera.org:8080/#/c/18958/4/be/src/exprs/timezone_db.cc@187 PS4, Line 187: erase( > Something like 'header_len' would be clearer. Done http://gerrit.cloudera.org:8080/#/c/18958/4/be/src/exprs/timezone_db.cc@188 PS4, Line 188: > Why do you prefer substr() instead of erase()? Because of the assignment we This change was controversial and didn't make much sense, so I went back to the way it was before. http://gerrit.cloudera.org:8080/#/c/18958/4/be/src/exprs/timezone_db.cc@188 PS4, Line 188: > We could extract this to a variable, for example 'result_len'. Done -- To view, visit http://gerrit.cloudera.org:8080/18958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f80fd1817d072f8dadf288025cb9534191ca458 Gerrit-Change-Number: 18958 Gerrit-PatchSet: 7 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Xiang Yang Gerrit-Comment-Date: Mon, 30 Jan 2023 09:55:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone
Baike Xia has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/18958 ) Change subject: IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone .. IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone When parsing /etc/sysconfig/clock, a line that contains a '#' can be skipped, which also simplifies the parsing of the remaining lines. This fixes the parsing problem caused by lines such as '# Zone="utc"'. Change-Id: I7f80fd1817d072f8dadf288025cb9534191ca458 --- M be/src/exprs/timezone_db.cc 1 file changed, 1 insertion(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/18958/7 -- To view, visit http://gerrit.cloudera.org:8080/18958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7f80fd1817d072f8dadf288025cb9534191ca458 Gerrit-Change-Number: 18958 Gerrit-PatchSet: 7 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Xiang Yang
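The commit message above describes skipping any line of /etc/sysconfig/clock that contains a '#', so that a commented-out entry is not mistaken for the real time zone setting. A small sketch of that rule follows, assuming the setting has the form ZONE="Area/City"; the function name and the regular expression are illustrative and are not taken from timezone_db.cc.

```python
import re

# Assumed shape of the setting line, e.g.  ZONE="Europe/Budapest"
ZONE_LINE = re.compile(r'\s*ZONE\s*=\s*"([^"]+)"\s*$')


def find_zone(lines):
    """Return the first time zone name found, ignoring lines that contain '#'."""
    for line in lines:
        if "#" in line:
            # Per the commit message: a line with a '#' is skipped entirely.
            continue
        match = ZONE_LINE.match(line)
        if match:
            return match.group(1)
    return None
```

Reading the file would then look like `find_zone(open("/etc/sysconfig/clock"))` on a system where that file exists; a line such as `# ZONE="utc"` is ignored and the next uncommented ZONE line wins.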
[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table
Baike Xia has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/18953 ) Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table .. IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table Impala already supports IF NOT EXISTS in alter table add columns for general Hive tables in IMPALA-7832, but not for Kudu/Iceberg tables. This patch tries to add such semantics for Kudu/Iceberg tables. Testing: - Updated E2E DDL tests Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 --- M common/thrift/JniCatalog.thrift M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test 6 files changed, 162 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/14 -- To view, visit http://gerrit.cloudera.org:8080/18953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 Gerrit-Change-Number: 18953 Gerrit-PatchSet: 14 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Quanlong Huang
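The commit message above is about giving ALTER TABLE ... ADD IF NOT EXISTS COLUMNS the same meaning for Kudu and Iceberg tables that it already has for Hive tables. The sketch below shows one plausible reading of that semantics: columns that already exist are silently dropped from the request when IF NOT EXISTS is given, and cause an error otherwise. The function name is hypothetical, and the case-insensitive comparison is an assumption of this sketch rather than something stated in the patch.

```python
def columns_to_add(existing, requested, if_not_exists):
    """Return the requested columns that still need to be added.

    existing: column names already in the table.
    requested: column names from the ADD COLUMNS clause.
    if_not_exists: True when the statement carries IF NOT EXISTS.
    """
    existing_lower = {name.lower() for name in existing}
    result = []
    for name in requested:
        if name.lower() in existing_lower:
            if if_not_exists:
                continue  # already present: skip instead of failing
            raise ValueError(f"Column already exists: {name}")
        result.append(name)
    return result
```

If the returned list is empty, the statement becomes a no-op, which is the behavior IF NOT EXISTS is meant to allow.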
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Baike Xia has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. IMPALA-3120: Support Bucket Shuffle Join for bucketed table Query statements that contain bucketed tables and have operations such as join, group by, sort by, etc. can use bucket shuffle join to optimize the execution plan, reduce the amount of data transferred, and reduce query latency. To ensure consistency with Hive, the bucket hash is calculated using the same method that Hive uses to calculate the hash value of a column. Add new query options as feature switches: ENABLE_BUCKET_SHUFFLE, FRAGMENT_INSTANCE_BUCKET_NUM, BUCKET_EXEC_BACKEND_RATIO. Testing: - Add e2e tests - Add fe tests Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/initial-reservations.cc M be/src/runtime/initial-reservations.h M be/src/runtime/krpc-data-stream-sender-ir.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/scheduling/schedule-state.h M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/hash-util.h M common/protobuf/admission_control_service.proto M common/protobuf/control_service.proto M common/protobuf/planner.proto M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Partitions.thrift M common/thrift/PlanNodes.thrift M common/thrift/Planner.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M 
fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/java/org/apache/impala/util/MathUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test A tests/query_test/test_bucket_shuffle.py 51 files changed, 2,208 insertions(+), 65 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/9 -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 9 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
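The commit message above says only that the bucket hash follows the method Hive uses. As a hedged illustration, the sketch below assumes Hive's legacy bucketing scheme, in which the hash of an INT column is the value itself and the bucket is the non-negative hash modulo the bucket count. It is a Python sketch, not the code in BucketUtils.java or hash-util.h, and the function names are hypothetical; the patch may handle other types or a different bucketing version.

```python
INT_MAX = 0x7FFFFFFF  # Java's Integer.MAX_VALUE


def to_int32(value):
    """Wrap a Python int into the signed 32-bit range, like a Java int."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def int_column_hash(value):
    """Assumed hash of an INT column: the 32-bit value itself."""
    return to_int32(value)


def bucket_id(hash_code, num_buckets):
    """Map a hash code to a bucket: clear the sign bit, then take the modulo."""
    return (hash_code & INT_MAX) % num_buckets
```

Under these assumptions, a row with key 10 in a table with 4 buckets lands in bucket 2. Rows with equal join keys land in the same bucket on both sides of a join, which is the property that lets a join avoid reshuffling data that is already bucketed on the join key.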
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Baike Xia has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. IMPALA-3120: Support Bucket Shuffle Join for bucketed table Query statements that contain bucketed tables and have operations such as join, group by, sort by, etc. can use bucket shuffle join to optimize the execution plan, reduce the amount of data transferred, and reduce query latency. To ensure consistency with Hive, the bucket hash is calculated using the same method that Hive uses to calculate the hash value of a column. Add new query options as feature switches: ENABLE_BUCKET_SHUFFLE, FRAGMENT_INSTANCE_BUCKET_NUM, BUCKET_EXEC_BACKEND_RATIO. Testing: - Add e2e tests - Add fe tests Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/initial-reservations.cc M be/src/runtime/initial-reservations.h M be/src/runtime/krpc-data-stream-sender-ir.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/scheduling/schedule-state.h M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/hash-util.h M common/protobuf/admission_control_service.proto M common/protobuf/control_service.proto M common/protobuf/planner.proto M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Partitions.thrift M common/thrift/PlanNodes.thrift M common/thrift/Planner.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M 
fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/java/org/apache/impala/util/MathUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test A tests/query_test/test_bucket_shuffle.py 51 files changed, 2,211 insertions(+), 64 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/8 -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 8 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Baike Xia has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. IMPALA-3120: Support Bucket Shuffle Join for bucketed table Query statements that contain bucketed tables and have operations such as join, group by, sort by, etc. can use bucket shuffle join to optimize the execution plan, reduce the amount of data transferred, and reduce query latency. To ensure consistency with Hive, the bucket hash is calculated using the same method that Hive uses to calculate the hash value of a column. Add new query options as feature switches: ENABLE_BUCKET_SHUFFLE, FRAGMENT_INSTANCE_BUCKET_NUM, BUCKET_EXEC_BACKEND_RATIO. Testing: - Add e2e tests - Add fe tests Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/initial-reservations.cc M be/src/runtime/initial-reservations.h M be/src/runtime/krpc-data-stream-sender-ir.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/scheduling/schedule-state.h M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/hash-util.h M common/protobuf/admission_control_service.proto M common/protobuf/control_service.proto M common/protobuf/planner.proto M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Partitions.thrift M common/thrift/PlanNodes.thrift M common/thrift/Planner.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M 
fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/java/org/apache/impala/util/MathUtil.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test A tests/query_test/test_bucket_shuffle.py 51 files changed, 2,207 insertions(+), 65 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/7 -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 7 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Baike Xia has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. IMPALA-3120: Support Bucket Shuffle Join for bucketed table Query statements that contain bucketed tables and have operations such as join, group by, sort by, etc. can use bucket shuffle join to optimize the execution plan, reduce the amount of data transferred, and reduce query latency. To ensure consistency with Hive, the bucket hash is calculated using the same method that Hive uses to calculate the hash value of a column. Add new query options as feature switches: ENABLE_BUCKET_SHUFFLE, FRAGMENT_INSTANCE_BUCKET_NUM, BUCKET_EXEC_BACKEND_RATIO. Testing: - Add e2e tests - Add fe tests Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/initial-reservations.cc M be/src/runtime/initial-reservations.h M be/src/runtime/krpc-data-stream-sender-ir.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/scheduling/schedule-state.h M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/hash-util.h M common/protobuf/admission_control_service.proto M common/protobuf/control_service.proto M common/protobuf/planner.proto M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Partitions.thrift M common/thrift/PlanNodes.thrift M common/thrift/Planner.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M 
fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/java/org/apache/impala/util/MathUtil.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test A tests/query_test/test_bucket_shuffle.py 52 files changed, 2,212 insertions(+), 70 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/6 -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 6 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Baike Xia has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. IMPALA-3120: Support Bucket Shuffle Join for bucketed table Query statements that contain bucketed tables and have operations such as join, group by, sort by, etc. can use bucket shuffle join to optimize the execution plan, reduce the amount of data transferred, and reduce query latency. To ensure consistency with Hive, the bucket hash is calculated using the same method that Hive uses to calculate the hash value of a column. Add new query options as feature switches: ENABLE_BUCKET_SHUFFLE, FRAGMENT_INSTANCE_BUCKET_NUM, BUCKET_EXEC_BACKEND_RATIO. Testing: - Add e2e tests - Add fe tests Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/initial-reservations.cc M be/src/runtime/initial-reservations.h M be/src/runtime/krpc-data-stream-sender-ir.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/scheduling/schedule-state.h M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/hash-util.h M common/protobuf/admission_control_service.proto M common/protobuf/control_service.proto M common/protobuf/planner.proto M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Partitions.thrift M common/thrift/PlanNodes.thrift M common/thrift/Planner.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M 
fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/java/org/apache/impala/util/MathUtil.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/datasets/functional/functional_schema_template.sql A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test A tests/query_test/test_bucket_shuffle.py 51 files changed, 2,202 insertions(+), 70 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/5 -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 5 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Baike Xia has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. IMPALA-3120: Support Bucket Shuffle Join for bucketed table Query statements that contain bucketed tables and have operations such as join, group by, sort by, etc. can use bucket shuffle join to optimize the execution plan, reduce the amount of data transferred, and reduce query latency. To ensure consistency with Hive, the bucket hash is calculated using the same method that Hive uses to calculate the hash value of a column. Add new query options as feature switches: ENABLE_BUCKET_SHUFFLE, FRAGMENT_INSTANCE_BUCKET_NUM, BUCKET_EXEC_BACKEND_RATIO. Testing: - Add e2e tests - Add fe tests Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/initial-reservations.cc M be/src/runtime/initial-reservations.h M be/src/runtime/krpc-data-stream-sender-ir.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/scheduling/schedule-state.h M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/hash-util.h M common/protobuf/admission_control_service.proto M common/protobuf/control_service.proto M common/protobuf/planner.proto M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Partitions.thrift M common/thrift/PlanNodes.thrift M common/thrift/Planner.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M 
fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/java/org/apache/impala/util/MathUtil.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/datasets/functional/functional_schema_template.sql A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test A tests/query_test/test_bucket_shuffle.py 51 files changed, 2,186 insertions(+), 70 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/4 -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 4 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Baike Xia has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. IMPALA-3120: Support Bucket Shuffle Join for bucketed table Query statements that contain bucketed tables and have operations such as join, group by, sort by, etc. can use bucket shuffle join to optimize the execution plan, reduce the amount of data transferred, and reduce query latency. To ensure consistency with Hive, the bucket hash is calculated using the same method that Hive uses to calculate the hash value of a column. Add new query options as feature switches: ENABLE_BUCKET_SHUFFLE, FRAGMENT_INSTANCE_BUCKET_NUM, BUCKET_EXEC_BACKEND_RATIO. Testing: - Add e2e tests - Add fe tests Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/initial-reservations.cc M be/src/runtime/initial-reservations.h M be/src/runtime/krpc-data-stream-sender-ir.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/scheduling/schedule-state.h M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/hash-util.h M common/protobuf/admission_control_service.proto M common/protobuf/control_service.proto M common/protobuf/planner.proto M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Partitions.thrift M common/thrift/PlanNodes.thrift M common/thrift/Planner.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M 
fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/java/org/apache/impala/util/MathUtil.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/datasets/functional/functional_schema_template.sql A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test A tests/query_test/test_bucket_shuffle.py 51 files changed, 2,186 insertions(+), 70 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/3 -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 3 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Baike Xia has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. IMPALA-3120: Support Bucket Shuffle Join for bucketed table Query statements that contain bucketed tables and have operations such as join, group by, sort by, etc. can use bucket shuffle join to optimize the execution plan, reduce the amount of data transferred, and reduce query latency. To ensure consistency with Hive, the bucket hash is calculated using the same method that Hive uses to calculate the hash value of a column. Add new query options as feature switches: ENABLE_BUCKET_SHUFFLE, FRAGMENT_INSTANCE_BUCKET_NUM, BUCKET_EXEC_BACKEND_RATIO. Testing: - Add e2e tests - Add fe tests Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/initial-reservations.cc M be/src/runtime/initial-reservations.h M be/src/runtime/krpc-data-stream-sender-ir.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/scheduling/schedule-state.h M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/hash-util.h M common/protobuf/admission_control_service.proto M common/protobuf/control_service.proto M common/protobuf/planner.proto M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Partitions.thrift M common/thrift/PlanNodes.thrift M common/thrift/Planner.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M 
fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/java/org/apache/impala/util/MathUtil.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/datasets/functional/functional_schema_template.sql A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test A tests/query_test/test_bucket_shuffle.py 51 files changed, 2,187 insertions(+), 70 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/2 -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 2 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table
Baike Xia has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19430 ) Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table .. IMPALA-3120: Support Bucket Shuffle Join for bucketed table Query statements that contain bucketed tables and have operations such as join, group by, sort by, etc. can use bucket shuffle join to optimize the execution plan, reduce the amount of data transferred, and reduce query latency. To ensure consistency with Hive, the bucket hash is calculated using the same method that Hive uses to calculate the hash value of a column. Add new query options as feature switches: ENABLE_BUCKET_SHUFFLE, FRAGMENT_INSTANCE_BUCKET_NUM, BUCKET_EXEC_BACKEND_RATIO. Testing: - Add e2e tests - Add fe tests Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/data-stream-test.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/initial-reservations.cc M be/src/runtime/initial-reservations.h M be/src/runtime/krpc-data-stream-sender-ir.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/scheduling/schedule-state.h M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/hash-util.h M common/protobuf/admission_control_service.proto M common/protobuf/control_service.proto M common/protobuf/planner.proto M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Partitions.thrift M common/thrift/PlanNodes.thrift M common/thrift/Planner.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M 
fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/java/org/apache/impala/util/MathUtil.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/datasets/functional/functional_schema_template.sql A testdata/workloads/functional-planner/queries/PlannerTest/bucket-shuffle.test A testdata/workloads/functional-query/queries/QueryTest/bucket-shuffle.test A tests/query_test/test_bucket_shuffle.py 51 files changed, 2,176 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/19430/1 -- To view, visit http://gerrit.cloudera.org:8080/19430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316 Gerrit-Change-Number: 19430 Gerrit-PatchSet: 1 Gerrit-Owner: Baike Xia
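As a rough illustration of the Hive-compatible bucket hashing mentioned in the IMPALA-3120 commit message, the sketch below assigns rows to buckets for a single INT column. It assumes Hive's legacy (bucketing version 1) convention, in which an INT value hashes to itself and the bucket is (hash & Integer.MAX_VALUE) % num_buckets. The function names are hypothetical and this is not the code in the patch.

```python
INT_MAX = 0x7FFFFFFF  # Java's Integer.MAX_VALUE


def hive_v1_int_hash(value: int) -> int:
    """Assumed legacy Hive hash for an INT column: the value itself as a 32-bit int."""
    # Wrap into the signed 32-bit range the way a Java int would.
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def bucket_id(value: int, num_buckets: int) -> int:
    """Map a column value to a bucket: clear the sign bit, then take the modulus."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    return (hive_v1_int_hash(value) & INT_MAX) % num_buckets


def co_located(left_key: int, right_key: int, num_buckets: int) -> bool:
    """Rows from two tables bucketed the same way can only match inside the same bucket."""
    return bucket_id(left_key, num_buckets) == bucket_id(right_key, num_buckets)
```

The point of the sketch is that when both join inputs are bucketed on the join key with the same hash and bucket count, equal keys always land in the same bucket, so a bucket-wise join does not need to reshuffle both sides.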
[Impala-ASF-CR] IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE
Hello Quanlong Huang, Aman Sinha, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18731 to look at the new patch set (#17). Change subject: IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE .. IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE To reduce the amount of data read and transmitted, non-equi join conditions can be pushed down to SCAN_NODE. Current restrictions on pushing down non-equi join conjuncts: 1. Only LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN and INNER_JOIN are supported; 2. Only non-equi predicates containing a literalExpr are handled, for example: slot >= Literal, slot in Literal list; 3. A complex filter condition is pushed down only if it contains a single column. For example, cast(A as int) > 10 can be pushed down to SCAN; 4. Only the predicate operators EQ, LE, LT, GE and GT are supported; 5. Only BinaryPredicate and InPredicate are supported. Pushdown logic: 1. Build the mapping from each slot to its list of non-equi conjuncts, and the mapping from each slot to its list of equi conjuncts; 2. When a slot has both equi and non-equi conjuncts, calculate the maximum and minimum values of the equi conjuncts; 3. Build new binaryPredicates from the maximum and minimum values according to the non-equi conjuncts; 4.
Push all binaryPredicates down to the corresponding scan node. A new query option is also added as a feature switch: ENABLE_NONE_EQUAL_PREDICATE_PUSH_DOWN Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test A testdata/workloads/functional-query/queries/QueryTest/none-equal-predicate-push-down.test A tests/query_test/test_none_equi_predicate_pushdown.py 10 files changed, 1,429 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/18731/17 -- To view, visit http://gerrit.cloudera.org:8080/18731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263 Gerrit-Change-Number: 18731 Gerrit-PatchSet: 17 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
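The pushdown logic in the IMPALA-11424 commit message can be read roughly as follows. This is one interpretation, not the patch itself: it assumes an inner join whose left slot is restricted to a known set of literals (for example by left = 5 or left IN (3, 7, 5)) and whose join conjunct is left OP right. The minimum or maximum of those literals then gives a bound that can be applied to the right-hand scan. All names below are hypothetical.

```python
from typing import Iterable, Tuple

# For a join conjunct "left OP right", the operator to apply on the right slot
# and which bound of the known left values makes the derived predicate safe.
_DERIVED = {
    "<": (">", min),    # left < right   =>  right >  min(left values)
    "<=": (">=", min),  # left <= right  =>  right >= min(left values)
    ">": ("<", max),    # left > right   =>  right <  max(left values)
    ">=": ("<=", max),  # left >= right  =>  right <= max(left values)
}


def derive_scan_predicate(left_values: Iterable[int], join_op: str) -> Tuple[str, int]:
    """Return (operator, literal) for a predicate on the right-hand scan."""
    values = list(left_values)
    if not values:
        raise ValueError("need at least one literal for the left slot")
    if join_op not in _DERIVED:
        raise ValueError(f"unsupported join operator: {join_op}")
    right_op, pick_bound = _DERIVED[join_op]
    return right_op, pick_bound(values)
```

For instance, with left IN (3, 7, 5) and a join conjunct left < right, every matching right value must exceed 3, so right > 3 can be evaluated at the scan without changing the inner join result.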
[Impala-ASF-CR] IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE
Baike Xia has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/18731 ) Change subject: IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE .. IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE To reduce the amount of data read and transmitted, non-equi join conditions can be pushed down to SCAN_NODE. Current restrictions on pushing down non-equi join conjuncts: 1. Only LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN and INNER_JOIN are supported; 2. Only non-equi predicates containing a literalExpr are handled, for example: slot >= Literal, slot in Literal list; 3. A complex filter condition is pushed down only if it contains a single column. For example, cast(A as int) > 10 can be pushed down to SCAN; 4. Only the predicate operators EQ, LE, LT, GE and GT are supported; 5. Only BinaryPredicate and InPredicate are supported. Pushdown logic: 1. Build the mapping from each slot to its list of non-equi conjuncts, and the mapping from each slot to its list of equi conjuncts; 2. When a slot has both equi and non-equi conjuncts, calculate the maximum and minimum values of the equi conjuncts; 3. Build new binaryPredicates from the maximum and minimum values according to the non-equi conjuncts; 4.
Push all binaryPredicates down to the corresponding scan node. A new query option is also added as a feature switch: ENABLE_NONE_EQUAL_PREDICATE_PUSH_DOWN Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test A testdata/workloads/functional-query/queries/QueryTest/none-equal-predicate-push-down.test A tests/query_test/test_none_equi_predicate_pushdown.py 10 files changed, 1,429 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/18731/16 -- To view, visit http://gerrit.cloudera.org:8080/18731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263 Gerrit-Change-Number: 18731 Gerrit-PatchSet: 16 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table
Baike Xia has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/18953 ) Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table .. IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table Impala already supports IF NOT EXISTS in ALTER TABLE ADD COLUMNS for regular Hive tables (IMPALA-7832), but not for Kudu/Iceberg tables. This patch adds the same semantics for Kudu/Iceberg tables. Testing: - Updated E2E DDL tests Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 --- M common/thrift/JniCatalog.thrift M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test 6 files changed, 162 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/13 -- To view, visit http://gerrit.cloudera.org:8080/18953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 Gerrit-Change-Number: 18953 Gerrit-PatchSet: 13 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Quanlong Huang
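The IF NOT EXISTS semantics described in the IMPALA-11565 commit message can be sketched as a filter over the requested columns. This is only an illustration under assumptions: the function name is hypothetical, column names are compared case-insensitively, and the real analysis in AlterTableAddColsStmt may differ in details such as type checks.

```python
from typing import Iterable, List


def columns_to_add(existing: Iterable[str], requested: Iterable[str],
                   if_not_exists: bool) -> List[str]:
    """Return the requested columns that still need to be added to the table."""
    # Assumption: column names are matched case-insensitively.
    existing_lower = {name.lower() for name in existing}
    to_add = []
    for name in requested:
        if name.lower() in existing_lower:
            if if_not_exists:
                continue  # silently skip a column that is already present
            raise ValueError(f"Column already exists: {name}")
        to_add.append(name)
    return to_add
```

With IF NOT EXISTS the statement becomes idempotent: rerunning it against a table that already has the columns adds nothing and raises no error.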
[Impala-ASF-CR] IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE
Baike Xia has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/18731 ) Change subject: IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE .. IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE To reduce the amount of data read and transmitted, non-equi join conditions can be pushed down to SCAN_NODE. Current restrictions on pushing down non-equi join conjuncts: 1. Only LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN and INNER_JOIN are supported; 2. Only non-equi predicates containing a literalExpr are handled, for example: slot >= Literal, slot in Literal list; 3. A complex filter condition is pushed down only if it contains a single column. For example, cast(A as int) > 10 can be pushed down to SCAN; 4. Only the predicate operators EQ, LE, LT, GE and GT are supported; 5. Only BinaryPredicate and InPredicate are supported. Pushdown logic: 1. Build the mapping from each slot to its list of non-equi conjuncts, and the mapping from each slot to its list of equi conjuncts; 2. When a slot has both equi and non-equi conjuncts, calculate the maximum and minimum values of the equi conjuncts; 3. Build new binaryPredicates from the maximum and minimum values according to the non-equi conjuncts; 4.
Push all binaryPredicates down to the corresponding scan node. A new query option is also added as a feature switch: ENABLE_NONE_EQUAL_PREDICATE_PUSH_DOWN Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test A testdata/workloads/functional-query/queries/QueryTest/none-equal-predicate-push-down.test A tests/query_test/test_none_equi_predicate_pushdown.py 10 files changed, 1,241 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/18731/15 -- To view, visit http://gerrit.cloudera.org:8080/18731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263 Gerrit-Change-Number: 18731 Gerrit-PatchSet: 15 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11485: Pushdown LIMIT through UNION ALL and LEFT/RIGHT OUTER JOIN
Baike Xia has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/18862 ) Change subject: IMPALA-11485: Pushdown LIMIT through UNION ALL and LEFT/RIGHT OUTER JOIN .. IMPALA-11485: Pushdown LIMIT through UNION ALL and LEFT/RIGHT OUTER JOIN Pushdown LIMIT through UNION ALL: Transforms: - Limit - Union - relation1 - relation2 .. Into: - Limit - Union - Limit - relation1 - Limit - relation2 .. Pushdown LIMIT through LEFT/RIGHT OUTER JOIN: Transforms: - Limit - Join - left source - right source Into: - Limit - Join - Limit (present if Join is left outer) - left source - Limit (present if Join is right outer) - right source Change-Id: Ia5d040c0a98e60639d7ce4b25ecf07a859c8a32c --- M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-outer-join.test A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-union.test M testdata/workloads/functional-planner/queries/PlannerTest/topn.test M testdata/workloads/functional-planner/queries/PlannerTest/union.test M testdata/workloads/functional-query/queries/QueryTest/joins.test A testdata/workloads/functional-query/queries/limit-pushdown-outer-join.test A testdata/workloads/functional-query/queries/limit-pushdown-union.test A tests/query_test/test_limit_pushdown.py M tests/query_test/test_observability.py 14 files changed, 469 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/18862/8 -- To view, visit http://gerrit.cloudera.org:8080/18862 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: 
newpatchset Gerrit-Change-Id: Ia5d040c0a98e60639d7ce4b25ecf07a859c8a32c Gerrit-Change-Number: 18862 Gerrit-PatchSet: 8 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Quanlong Huang
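The two plan rewrites listed in the IMPALA-11485 commit message can be sketched on a toy plan tree. This is a simplified model, not the SingleNodePlanner code: the Node class and the push_limit function are hypothetical, and the sketch ignores OFFSET, ORDER BY and any other condition the real planner would have to check before pushing a limit down.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    kind: str                                   # "limit", "union_all", "join" or "scan"
    children: List["Node"] = field(default_factory=list)
    limit: Optional[int] = None                 # only set on "limit" nodes
    join_type: Optional[str] = None             # only set on "join" nodes


def push_limit(node: Node) -> Node:
    """Copy a top-level LIMIT below a UNION ALL or onto the outer side of an outer join."""
    if node.kind != "limit" or not node.children:
        return node
    child, n = node.children[0], node.limit
    if child.kind == "union_all":
        # Each branch needs to produce at most n rows.
        child.children = [Node("limit", [c], limit=n) for c in child.children]
    elif child.kind == "join":
        left, right = child.children
        if child.join_type == "left_outer":
            # Every left row yields at least one output row, so n left rows suffice.
            child.children = [Node("limit", [left], limit=n), right]
        elif child.join_type == "right_outer":
            child.children = [left, Node("limit", [right], limit=n)]
    return node  # the original top-level limit is kept in all cases
```

The outer limit stays in place because a union of limited branches, or an outer join over a limited input, can still produce more than n rows.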
[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table
Baike Xia has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/18953 ) Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table .. IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table Impala already supports IF NOT EXISTS in ALTER TABLE ADD COLUMNS for regular Hive tables (IMPALA-7832), but not for Kudu/Iceberg tables. This patch adds the same semantics for Kudu/Iceberg tables. Testing: - Updated E2E DDL tests Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 --- M common/thrift/JniCatalog.thrift M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test 6 files changed, 154 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/12 -- To view, visit http://gerrit.cloudera.org:8080/18953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 Gerrit-Change-Number: 18953 Gerrit-PatchSet: 12 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table
Baike Xia has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/18953 ) Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table .. IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table Impala already supports IF NOT EXISTS in ALTER TABLE ADD COLUMNS for regular Hive tables (IMPALA-7832), but not for Kudu/Iceberg tables. This patch adds the same semantics for Kudu/Iceberg tables. Testing: - Updated E2E DDL tests Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 --- M common/thrift/JniCatalog.thrift M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test 6 files changed, 154 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/11 -- To view, visit http://gerrit.cloudera.org:8080/18953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 Gerrit-Change-Number: 18953 Gerrit-PatchSet: 11 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11564: For Agg/Scan nodes, increase the Cache of regular expressions to speed up
Baike Xia has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/18987 ) Change subject: IMPALA-11564: For Agg/Scan nodes, increase the Cache of regular expressions to speed up .. IMPALA-11564: For Agg/Scan nodes, increase the Cache of regular expressions to speed up Each time a regular expression is matched, the compiled expression is looked up in the cache instead of being rebuilt, which speeds up the computation. Change-Id: I68f37303aee4b6a28e560f27548c31472b82048b --- M be/src/exec/aggregator.cc M be/src/exec/aggregator.h A be/src/exec/exec-node-thread-cache.h M be/src/exec/grouping-aggregator-ir.cc M be/src/exec/hdfs-columnar-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/kudu/kudu-scanner.cc M be/src/exec/kudu/kudu-scanner.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exprs/agg-fn-evaluator.h M be/src/exprs/like-predicate-ir.cc M be/src/exprs/like-predicate.cc M be/src/exprs/like-predicate.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/scalar-expr-evaluator.h M be/src/udf/udf-internal.h M be/src/udf/udf-ir.cc M be/src/udf/udf.cc M be/src/udf/udf.h A testdata/workloads/functional-query/queries/QueryTest/thread-cache.test M tests/query_test/test_queries.py 22 files changed, 219 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/18987/5 -- To view, visit http://gerrit.cloudera.org:8080/18987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I68f37303aee4b6a28e560f27548c31472b82048b Gerrit-Change-Number: 18987 Gerrit-PatchSet: 5 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang
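The idea behind the IMPALA-11564 change, reusing a compiled regular expression instead of rebuilding it on every match, can be sketched with a per-thread cache. This is only an analogy in Python: the actual patch is C++ (it adds be/src/exec/exec-node-thread-cache.h), and the names cached_regex and regex_matches are hypothetical.

```python
import re
import threading

# One cache per thread, so lookups need no locking.
_local = threading.local()


def cached_regex(pattern: str) -> re.Pattern:
    """Return a compiled regex, compiling it at most once per thread."""
    cache = getattr(_local, "regex_cache", None)
    if cache is None:
        cache = _local.regex_cache = {}
    compiled = cache.get(pattern)
    if compiled is None:
        compiled = cache[pattern] = re.compile(pattern)
    return compiled


def regex_matches(value: str, pattern: str) -> bool:
    """Match the whole value against the pattern, using the cached compiled regex."""
    return cached_regex(pattern).fullmatch(value) is not None
```

Python's re module already keeps an internal cache of compiled patterns, so the explicit dictionary here exists only to make the technique visible. An unbounded cache like this one would also need an eviction policy if the set of patterns can grow without limit.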
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. Patch Set 20: > Change has been successfully rebased and submitted as > 2733d039ad4a830a1ea34c1a75d2b666788e39a9 by Quanlong Huang Thank you for your repeated guidance and code reviews, Quanlong. -- To view, visit http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 20 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Manish Maheshwari Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 25 Nov 2022 06:24:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. Patch Set 18: > Patch Set 18: > > Can we use "CLUSTER BY" rather than "CLUSTERED BY"? I see Spark also using > Cluster by and so does Hive - > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy > https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-clusterby.html Hi Manish, glad to see your comment. In Hive and Spark, "CLUSTERED BY" is used in CREATE TABLE to specify the bucketing columns and the number of buckets. In SELECT syntax, "CLUSTER BY" ensures each of N reducers gets non-overlapping ranges, then sorts by those ranges at the reducers. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-create-table-hiveformat.html https://stackoverflow.com/questions/34495981/difference-between-cluster-by-and-clustered-by-in-hive -- To view, visit http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 18 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Manish Maheshwari Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 08 Nov 2022 02:29:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. Patch Set 18: (2 comments) Thanks very much. http://gerrit.cloudera.org:8080/#/c/19055/17//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19055/17//COMMIT_MSG@21 PS17, Line 21: th > nit: "the" Done http://gerrit.cloudera.org:8080/#/c/19055/17//COMMIT_MSG@27 PS17, Line 27: drop > nit: dropping Done -- To view, visit http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 18 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 02 Nov 2022 12:14:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. IMPALA-3119: DDL support for bucketed tables Add syntactic support for creating bucketed table. The specific syntax in the create table statement is as follows: [CLUSTERED BY (column[, column ...]) [SORT BY (column[, column ...])] INTO 24 BUCKETS] Example: CREATE TABLE tbl (i int COMMENT 'hello', s string) CLUSTERED BY (i) INTO 24 BUCKETS; CREATE TABLE tbl (i int COMMENT 'hello', s string) CLUSTERED BY (i) SORT BY (s) INTO 24 BUCKETS; Instructions: 1. The bucket partitioning algorithm is the hash function used in Hive's bucketed tables; 2. Create Bucketed Table statements currently don't support Kudu and Iceberg tables; 3. In the current version, alter operations(add/drop/change/replace columns) on bucketed tables are not supported; 4. Support dropping bucketed table; This COMMIT is the first subtask of IMPALA-3118. Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e --- M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/create-table.test M 
testdata/workloads/functional-query/queries/QueryTest/show-create-table.test M tests/metadata/test_show_create_table.py 18 files changed, 380 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/18 -- To view, visit http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 18 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. Patch Set 17: (12 comments) Hi Quanlong, thanks for your review and comments. I have fixed your comments. When testing 'show-create-table', I found a bug, and fixed it, and added support for bucketed table deletion. http://gerrit.cloudera.org:8080/#/c/19055/16//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19055/16//COMMIT_MSG@21 PS16, Line 21: he hash functi > nit: "the hash function used in Hive's bucketed tables" Done http://gerrit.cloudera.org:8080/#/c/19055/16//COMMIT_MSG@22 PS16, Line 22: > nit: currently don't Done http://gerrit.cloudera.org:8080/#/c/19055/16/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: http://gerrit.cloudera.org:8080/#/c/19055/16/common/thrift/CatalogObjects.thrift@155 PS16, Line 155: tion > nit: "type" ? Maybe that makes it easier to understand: 'Data distribution method of bucketed table.' http://gerrit.cloudera.org:8080/#/c/19055/16/common/thrift/CatalogObjects.thrift@194 PS16, Line 194: 2: optional i64 total_file_bytes : } > nit: The variable names are clear enough. We can simplify the comment to so Done http://gerrit.cloudera.org:8080/#/c/19055/16/common/thrift/CatalogObjects.thrift@497 PS16, Line 497: optional TValidWriteIdList > nit: "Bucket information for HDFS tables" Done http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/analysis/TableDef.java@404 PS16, Line 404: isBucketableFormat() { > nit: it'd be better to rename it to something like "isBucketableFormat" Great. 
http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/analysis/TableDef.java@756 PS16, Line 756: yzeBucketColumns(options_.bucketInfo, getColumnNames(), > nit: we can skip this check since it's done in the following method. Done http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/analysis/TableDef.java@778 PS16, Line 778: "'%s'", options_.fileFormat)); : } : if (bucketInfo.getNum_bucket() <= 0) { : > nit: kudu is checked in isSupportBucketedTable(). Do we still need this che Done http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/util/BucketUtils.java File fe/src/main/java/org/apache/impala/util/BucketUtils.java: http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/util/BucketUtils.java@20 PS16, Line 20: import org.apache.hadoop.hive.metastore.api.StorageDescriptor; > nit: unused import Done http://gerrit.cloudera.org:8080/#/c/19055/16/fe/src/main/java/org/apache/impala/util/BucketUtils.java@31 PS16, Line 31: mStorageDescriptor(StorageDe > nit: "StorageDescriptor of the HMS table" Done http://gerrit.cloudera.org:8080/#/c/19055/16/testdata/workloads/functional-query/queries/QueryTest/create-table.test File testdata/workloads/functional-query/queries/QueryTest/create-table.test: http://gerrit.cloudera.org:8080/#/c/19055/16/testdata/workloads/functional-query/queries/QueryTest/create-table.test@349 PS16, Line 349: RESULTS: VERIFY_IS_SUBSET > Can we add the rows of "Num Buckets" and "Bucket Columns" ? 
Done http://gerrit.cloudera.org:8080/#/c/19055/16/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test File testdata/workloads/functional-query/queries/QueryTest/show-create-table.test: http://gerrit.cloudera.org:8080/#/c/19055/16/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test@1013 PS16, Line 1013: 'engine.hive.enabled'='true', 'table_type'='ICEBERG', 'write.merge.mode'='copy-on-write') > Could you also add a test for bucket table in this file? Done -- To view, visit http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 17 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 02 Nov 2022 11:35:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. IMPALA-3119: DDL support for bucketed tables Add syntactic support for creating bucketed table. The specific syntax in the create table statement is as follows: [CLUSTERED BY (column[, column ...]) [SORT BY (column[, column ...])] INTO 24 BUCKETS] Example: CREATE TABLE tbl (i int COMMENT 'hello', s string) CLUSTERED BY (i) INTO 24 BUCKETS; CREATE TABLE tbl (i int COMMENT 'hello', s string) CLUSTERED BY (i) SORT BY (s) INTO 24 BUCKETS; Instructions: 1. The bucket partitioning algorithm is he hash function used in Hive's bucketed tables; 2. Create Bucketed Table statements currently don't support Kudu and Iceberg tables; 3. In the current version, alter operations(add/drop/change/replace columns) on bucketed tables are not supported; 4. Support drop bucketed table; This COMMIT is the first subtask of IMPALA-3118. Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e --- M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/create-table.test M 
testdata/workloads/functional-query/queries/QueryTest/show-create-table.test M tests/metadata/test_show_create_table.py 18 files changed, 380 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/17 -- To view, visit http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 17 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
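Editor's illustration for note 1 of the commit message above (bucketing uses the hash function of Hive's bucketed tables). This is a minimal Python sketch of what that could mean for the INT column `i` in the example. It assumes Hive's legacy scheme, in which an INT hashes to its own 32-bit value, the sign bit is masked off, and the remainder modulo the bucket count is taken. The function name is hypothetical, and newer Hive bucketing versions use a different hash, so this is not a description of the patch's BucketUtils code.

```python
def legacy_hive_int_bucket(value: int, num_buckets: int) -> int:
    """Return the bucket index for an INT value (legacy Hive scheme, assumed)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    # View the value as a 32-bit two's-complement integer, as a Java int would be.
    hash_code = value & 0xFFFFFFFF
    # Mask off the sign bit so the result is non-negative, then take the remainder.
    return (hash_code & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    # With INTO 24 BUCKETS, as in the CREATE TABLE examples above.
    for i in (0, 1, 24, 25, -1):
        print(i, "->", legacy_hive_int_bucket(i, 24))
```

Under these assumptions every row lands in a bucket between 0 and 23, and rows with equal `i` always share a bucket, which is the property that bucket-aware joins rely on.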
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. IMPALA-3119: DDL support for bucketed tables Add syntactic support for creating bucketed tables. The specific syntax in the create table statement is as follows: [CLUSTERED BY (column[, column ...]) [SORT BY (column[, column ...])] INTO 24 BUCKETS] Example: CREATE TABLE tbl (i int COMMENT 'hello', s string) CLUSTERED BY (i) INTO 24 BUCKETS; CREATE TABLE tbl (i int COMMENT 'hello', s string) CLUSTERED BY (i) SORT BY (s) INTO 24 BUCKETS; Notes: 1. The bucket partitioning algorithm is the hash function used by Hive; 2. Create Bucketed Table statements do not support Kudu and Iceberg tables; 3. In the current version, alter operations (add/drop/change/replace columns) on bucketed tables are not supported; This commit is the first subtask of IMPALA-3118. Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e --- M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/create-table.test 14 files changed, 353 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/16 -- To view, visit http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit
http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 16 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. IMPALA-3119: DDL support for bucketed tables Add syntactic support for creating bucketed tables. The specific syntax in the create table statement is as follows: [CLUSTERED BY (column[, column ...]) [SORT BY (column[, column ...])] INTO 24 BUCKETS] Example: CREATE TABLE tbl (i int COMMENT 'hello', s string) CLUSTERED BY (i) INTO 24 BUCKETS; CREATE TABLE tbl (i int COMMENT 'hello', s string) CLUSTERED BY (i) SORT BY (s) INTO 24 BUCKETS; Notes: 1. The bucket partitioning algorithm is the hash function used by Hive; 2. Create Bucketed Table statements do not support Kudu and Iceberg tables; 3. In the current version, alter operations (add/drop/change/replace columns) on bucketed tables are not supported; This commit is the first subtask of IMPALA-3118. Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e --- M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/create-table.test 14 files changed, 353 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/15 -- To view, visit http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit
http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 15 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. IMPALA-3119: DDL support for bucketed tables Add syntactic support for creating bucketed tables. The specific syntax in the create table statement is as follows: [CLUSTERED BY (column[, column ...]) [SORT BY (column[, column ...])] INTO 24 BUCKETS] Example: CREATE TABLE tbl (i int COMMENT 'hello', s string) CLUSTERED BY (i) INTO 24 BUCKETS; CREATE TABLE tbl (i int COMMENT 'hello', s string) CLUSTERED BY (i) SORT BY (s) INTO 24 BUCKETS; Notes: 1. The bucket partitioning algorithm is the hash function used by Hive; 2. Create Bucketed Table statements do not support Kudu and Iceberg tables; 3. In the current version, alter operations (add/drop/change/replace columns) on bucketed tables are not supported; This commit is the first subtask of IMPALA-3118. Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e --- M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/create-table.test 14 files changed, 352 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/14 -- To view, visit http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit
http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 14 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. Patch Set 14: (1 comment) > Patch Set 13: > > (1 comment) http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1636 PS10, Line 1636: ; > Yeah, this is just for table creation. For adding write support, we can sup Done -- To view, visit http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 14 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 28 Oct 2022 09:36:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1636 PS10, Line 1636: :} > We already support the SortBy clause. Currently it's independent with the C OK, I'm going to do that. Previously, my thinking was that adding the syntax is simple, but the logic needed to implement inserts and queries is more complex. -- To view, visit http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 13 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 28 Oct 2022 03:22:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. Patch Set 13: (4 comments) http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG@29 PS10, Line 29: > I see. Previously, CLUSTERED is identified as an IDENTIFIER. Now we define Wow, I was puzzled by this for a long time. Thanks very much. http://gerrit.cloudera.org:8080/#/c/19055/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19055/11//COMMIT_MSG@19 PS11, Line 19: : > Is RANDOM actually useful in practise? Could you share some use cases? No, it isn't. RANDOM ensures an even distribution of the data, but bucket_join does not apply to it. Don't worry about that: as discussed, only one hash algorithm is supported. http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1636 PS10, Line 1636: :} > I see. I checked the Hive parser and realized that in HiveQL the SortBy cla Yes, I think so. The original intention was to add SORT BY in a later version, because it increases the complexity of the implementation. This should be done in the future. http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1705 PS10, Line 1705: {: RESULT = TableDataLayout.createKuduPartitionedLayout(partition_params); :} > This hasn't been addressed.
Done -- To view, visit http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 13 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 27 Oct 2022 09:27:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. IMPALA-3119: DDL support for bucketed tables Add syntactic support for creating bucketed tables. The specific syntax in the create table statement is as follows: [CLUSTERED BY (column[, column ...]) [INTO 24 BUCKETS]] Example: CREATE TABLE tbl (i int COMMENT 'hello', s string) CLUSTERED BY (i) INTO 24 BUCKETS; CREATE TABLE tbl (i int COMMENT 'hello', s string) CLUSTERED BY (i); Notes: 1. The bucket partitioning algorithm is the hash function used by Hive; 2. INTO 24 BUCKETS specifies the number of buckets; the default value is 16; 3. Create Bucketed Table statements do not support Kudu and Iceberg tables; 4. In the current version, alter operations (add/drop/change/replace columns) on bucketed tables are not supported; This commit is the first subtask of IMPALA-3118. Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e --- M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/create-table.test 14 files changed, 350 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/13 -- To view, visit http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit
http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 13 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. IMPALA-3119: DDL support for bucketed tables Add syntactic support for creating bucketed tables. The specific syntax in the create table statement is as follows: [BUCKETED BY (column[, column ...]) [INTO 24 BUCKETS]] Example: CREATE TABLE tbl (i int COMMENT 'hello', s string) BUCKETED BY (i) INTO 24 BUCKETS; CREATE TABLE tbl (i int COMMENT 'hello', s string) BUCKETED BY (i); Notes: 1. Hive's CLUSTERED BY is not supported, because CLUSTERED is already used as a hint keyword; 2. The bucket partitioning algorithm is the hash function used by Hive; 3. INTO 24 BUCKETS specifies the number of buckets; the default value is 16; 4. Create Bucketed Table statements do not support Kudu and Iceberg tables; 5. In the current version, alter operations (add/drop/change/replace columns) on bucketed tables are not supported; This commit is the first subtask of IMPALA-3118. Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e --- M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/create-table.test 14 files changed, 349 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/12 -- To view, visit
http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 12 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE
Baike Xia has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/18731 ) Change subject: IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE .. IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE To reduce the amount of data read and transmitted, non-equi join conditions can be pushed down to SCAN_NODE. Pushdown of join non-equi conjuncts currently has the following restrictions: 1. Only LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN and INNER_JOIN are supported; 2. Only non-equi predicates containing a LiteralExpr are handled, for example: slot >= Literal, slot in Literal list; 3. A complex filter condition is pushed down only if it contains a single column, for example cast(A as int) > 10; 4. Only the predicate operators EQ, LE, LT, GE, GT are supported; 5. Only BinaryPredicate and InPredicate are supported; Pushdown logic: 1. Build the mapping from each slot to its non-equi conjunct list, and the mapping from each slot to its equi conjunct list; 2. When a slot has both equi and non-equi conjuncts, calculate the maximum and minimum values of the equi conjuncts; 3. Build new binaryPredicates from the maximum and minimum values according to the non-equi conjunct; 4.
Push all binaryPredicates down to a specific scan node; And add new query option as a function switch: ENABLE_NONE_EQUAL_PREDICATE_PUSH_DOWN Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test A testdata/workloads/functional-query/queries/QueryTest/none-equal-predicate-push-down.test A tests/query_test/test_none_equi_predicate_pushdown.py 10 files changed, 1,240 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/18731/14 -- To view, visit http://gerrit.cloudera.org:8080/18731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263 Gerrit-Change-Number: 18731 Gerrit-PatchSet: 14 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
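Editor's illustration of one possible reading of pushdown steps 2 and 3 in the commit message above. The assumption, which the message does not spell out, is that a non-equi join conjunct such as `a.x >= b.y` is combined with the literal values that `b.y` is known to take (for example from `b.y IN (3, 5, 7)`), and that the minimum or maximum literal becomes the bound of a new predicate on `a.x`. All names below are hypothetical; this sketch is not the HashJoinNode code from the patch, and it ignores join type.

```python
from typing import Optional, Sequence, Tuple

# Operator names follow the commit message (GE, GT, LE, LT).
LOWER_BOUND_OPS = {"GE", "GT"}  # slot >= other, slot > other
UPPER_BOUND_OPS = {"LE", "LT"}  # slot <= other, slot < other


def derive_scan_predicate(
    slot: str, op: str, other_values: Sequence[int]
) -> Optional[Tuple[str, str, int]]:
    """Derive (slot, op, literal) from `slot op other`, given other's literal values.

    Returns None when no safe bound can be derived.
    """
    if not other_values:
        return None
    if op in LOWER_BOUND_OPS:
        # slot must exceed at least one candidate, so the smallest one is a safe bound.
        return (slot, op, min(other_values))
    if op in UPPER_BOUND_OPS:
        # slot must stay below at least one candidate, so the largest one is a safe bound.
        return (slot, op, max(other_values))
    return None


if __name__ == "__main__":
    # a.x >= b.y with b.y IN (3, 5, 7) gives a.x >= 3 for the scan of a.
    print(derive_scan_predicate("a.x", "GE", [3, 5, 7]))
    # a.x < b.y with b.y IN (3, 5, 7) gives a.x < 7 for the scan of a.
    print(derive_scan_predicate("a.x", "LT", [3, 5, 7]))
```

The derived predicate is weaker than the original join conjunct, so under this reading the join would still need to evaluate the original condition; the scan-side filter only reduces the rows read.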
[Impala-ASF-CR] IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE
Baike Xia has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/18731 ) Change subject: IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE .. IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE To reduce the amount of data read and transmitted, non-equi join conditions can be pushed down to SCAN_NODE. Pushdown of join non-equi conjuncts currently has the following restrictions: 1. Only LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN and INNER_JOIN are supported; 2. Only non-equi predicates containing a LiteralExpr are handled, for example: slot >= Literal, slot in Literal list; 3. A complex filter condition is pushed down only if it contains a single column, for example cast(A as int) > 10; 4. Only the predicate operators EQ, LE, LT, GE, GT are supported; 5. Only BinaryPredicate and InPredicate are supported; Pushdown logic: 1. Build the mapping from each slot to its non-equi conjunct list, and the mapping from each slot to its equi conjunct list; 2. When a slot has both equi and non-equi conjuncts, calculate the maximum and minimum values of the equi conjuncts; 3. Build new binaryPredicates from the maximum and minimum values according to the non-equi conjunct; 4.
Push all binaryPredicates down to a specific scan node; And add new query option as a function switch: ENABLE_NONE_EQUAL_PREDICATE_PUSH_DOWN Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test A testdata/workloads/functional-query/queries/QueryTest/none-equal-predicate-push-down.test A tests/query_test/test_none_equi_predicate_pushdown.py 10 files changed, 1,239 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/18731/13 -- To view, visit http://gerrit.cloudera.org:8080/18731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263 Gerrit-Change-Number: 18731 Gerrit-PatchSet: 13 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE
Baike Xia has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/18731 ) Change subject: IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE .. IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE To reduce the amount of data read and transmitted, non-equi join conditions can be pushed down to SCAN_NODE. Pushdown of join non-equi conjuncts currently has the following restrictions: 1. Only LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN and INNER_JOIN are supported; 2. Only non-equi predicates containing a LiteralExpr are handled, for example: slot >= Literal, slot in Literal list; 3. A complex filter condition is pushed down only if it contains a single column, for example cast(A as int) > 10; 4. Only the predicate operators EQ, LE, LT, GE, GT are supported; 5. Only BinaryPredicate and InPredicate are supported; Pushdown logic: 1. Build the mapping from each slot to its non-equi conjunct list, and the mapping from each slot to its equi conjunct list; 2. When a slot has both equi and non-equi conjuncts, calculate the maximum and minimum values of the equi conjuncts; 3. Build new binaryPredicates from the maximum and minimum values according to the non-equi conjunct; 4.
Push all binaryPredicates down to a specific scan node; And add new query option as a function switch: ENABLE_NONE_EQUAL_PREDICATE_PUSH_DOWN Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test A testdata/workloads/functional-query/queries/QueryTest/none-equal-predicate-push-down.test A tests/query_test/test_none_equi_predicate_pushdown.py 10 files changed, 1,240 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/18731/12 -- To view, visit http://gerrit.cloudera.org:8080/18731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263 Gerrit-Change-Number: 18731 Gerrit-PatchSet: 12 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/18731 ) Change subject: IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE .. Patch Set 11: (9 comments) http://gerrit.cloudera.org:8080/#/c/18731/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18731/11//COMMIT_MSG@13 PS11, Line 13: 1. Only support LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN, INNER_JOIN; > Is there a specific reason for not supporting cross joins? Cross joins are rarely used, and the optimization has not been verified for them. Spark does not support cross joins by default, and Presto/Trino optimizes to eliminate cross joins. So we are not going to consider cross joins. http://gerrit.cloudera.org:8080/#/c/18731/11/common/thrift/Query.thrift File common/thrift/Query.thrift: http://gerrit.cloudera.org:8080/#/c/18731/11/common/thrift/Query.thrift@600 PS11, Line 600: // Whether to enable pushdown under non-equi predicates, default is false > missing line Got it. http://gerrit.cloudera.org:8080/#/c/18731/11/common/thrift/Query.thrift@600 PS11, Line 600: // Whether to enable pushdown under non-equi predicates, default is false : 149: optional bool enable_none_equal_predicate_push_down = false; > What is the reason for keeping it as default false? My understanding is tha This change alters the execution plan, and I am worried that it may affect the results. Moreover, defaulting to true would require rewriting much of the existing unit test data, which may introduce other problems. Therefore, I want to default to false first, and change it to true after a period of time. http://gerrit.cloudera.org:8080/#/c/18731/11/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/18731/11/fe/src/main/java/org/apache/impala/analysis/Expr.java@1933 PS11, Line 1933: weight > What does removing weight means here? This is a misstatement. What I meant to say is "remove duplicates". I will fix it.
http://gerrit.cloudera.org:8080/#/c/18731/9/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: http://gerrit.cloudera.org:8080/#/c/18731/9/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@791 PS9, Line 791: binaryPredicate = new BinaryPredicate(GE, slotBinding, minValue); > line too long (92 > 90) Done http://gerrit.cloudera.org:8080/#/c/18731/11/testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test File testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test: http://gerrit.cloudera.org:8080/#/c/18731/11/testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test@5 PS11, Line 5: testtbl > I think that it would be better to use a non-empty table like functional.al Yes, that's right. I will add relevant tests. Thanks. http://gerrit.cloudera.org:8080/#/c/18731/11/testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test@7 PS11, Line 7: 2 > I only see numbers in the tests - does this optimization work with other ty This optimization is valid for any expr that is a LiteralExpr. I haven't added tests for other types yet; I will add them. Thanks.
http://gerrit.cloudera.org:8080/#/c/18731/9/tests/query_test/test_none_equi_predicate_pushdown.py File tests/query_test/test_none_equi_predicate_pushdown.py: http://gerrit.cloudera.org:8080/#/c/18731/9/tests/query_test/test_none_equi_predicate_pushdown.py@23 PS9, Line 23: > flake8: E302 expected 2 blank lines, found 1 Done http://gerrit.cloudera.org:8080/#/c/18731/11/tests/query_test/test_none_equi_predicate_pushdown.py File tests/query_test/test_none_equi_predicate_pushdown.py: http://gerrit.cloudera.org:8080/#/c/18731/11/tests/query_test/test_none_equi_predicate_pushdown.py@32 PS11, Line 32: ENABLE_NONE_EQUAL_PREDICATE_PUSH_DOWN > Can you also run to same test case with ENABLE_NONE_EQUAL_PREDICATE_PUSH_DO OK, I will add it. -- To view, visit http://gerrit.cloudera.org:8080/18731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263 Gerrit-Change-Number: 18731 Gerrit-PatchSet: 11 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 25 Oct 2022 04:06:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. Patch Set 11: (9 comments) http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG@18 PS10, Line 18: CREATE TABLE tbl (i int COMMENT 'hello', s string) > It'd be better if we can highlight this line since it's the only new part. Got it. http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG@29 PS10, Line 29: the hash partition is equivalent to a bucket, > Do you mean "CLUSTERED" has been used as a hint so we can't use it as a key If CLUSTERED is added as a keyword in the cup file, executing SQL with CLUSTERED in a hint reports an error. E.g. executing "create /* +CLUSTERED */ test as select * from tpcds.item;" gives the error message: ` Query: create /* +CLUSTERED */ test as select * from tpcds.item Query submitted at: 2022-10-24 09:09:52 (Coordinator: http://d403ca04eda0:25000) ERROR: ParseException: Syntax error in line 1: create /* +CLUSTERED */ test ^ Encountered: CLUSTERED Expected: STRAIGHT_JOIN, COMMA, IDENTIFIER CAUSED BY: Exception: Syntax error ` http://gerrit.cloudera.org:8080/#/c/19055/10//COMMIT_MSG@30 PS10, Line 30: and the optimization rule applies to join query; > Are these recognized by Hive? i.e. if Hive inserts data into the table, is If HASH is used, the behavior is the same as Hive. If not, the behavior is incompatible with Hive. If Hive inserts data into the table, it is treated as HASH, which is what we expect. Multiple bucket hash functions are used because Hive's bucket hash algorithm differs from Kudu's bucket hash algorithm, and they are needed to stay compatible with bucket join optimization on Kudu tables. In other words, Kudu tables are not supported in HASH mode. Using KUDU_HASH, however, results in tables that are not recognized by compute engines other than Impala.
http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1636 PS10, Line 1636: | opt_bucket_desc:bucket > I think we don't need two switches here. Like other optional fields, we can Yes, but adding empty to opt_bucket_desc causes a compilation error. So I took this approach. Or, can you give me some advice? ` Warning : *** Reduce/Reduce conflict found in state #1587 between opt_bucket_desc ::= (*) and opt_sort_cols ::= (*) under symbols: {} Resolved in favor of the first production. Warning : *** Shift/Reduce conflict found in state #1587 between opt_bucket_desc ::= (*) and opt_sort_cols ::= (*) KW_SORT KW_BY KW_ZORDER LPAREN opt_ident_list RPAREN and opt_sort_cols ::= (*) KW_SORT KW_BY LPAREN opt_ident_list RPAREN and opt_sort_cols ::= (*) KW_SORT KW_BY KW_LEXICAL LPAREN opt_ident_list RPAREN under symbol KW_SORT Resolved in favor of shifting. Warning : *** Shift/Reduce conflict found in state #1587 between opt_sort_cols ::= (*) and opt_sort_cols ::= (*) KW_SORT KW_BY KW_ZORDER LPAREN opt_ident_list RPAREN and opt_sort_cols ::= (*) KW_SORT KW_BY LPAREN opt_ident_list RPAREN and opt_sort_cols ::= (*) KW_SORT KW_BY KW_LEXICAL LPAREN opt_ident_list RPAREN under symbol KW_SORT Resolved in favor of shifting. ` http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/cup/sql-parser.cup@1705 PS10, Line 1705: {: RESULT = new Pair, TSortingOrder>(null, TSortingOrder.LEXICAL); :} > nit: Let's skip reformatting unrelated codes. I Got. http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/jflex/sql-scanner.flex File fe/src/main/jflex/sql-scanner.flex: http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/main/jflex/sql-scanner.flex@178 PS10, Line 178: kudu_has > The commit message mentions "kudu_hash". 
Done http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2844 PS10, Line 2844: ber mu > Can we add tests for "HASH" and "KUDUHASH" without the parentheses? Yes, I can. http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/ParserTest.java File fe/src/test/java/org/apache/impala/analysis/ParserTest.java: http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@a4347 PS10, Line 4347: > Can we keep this since this still doesn't work? Yes, we can keep this. It was removed when I tried CLUSTERED. http://gerrit.cloudera.org:8080/#/c/19055/10/fe/src/test/java/org/apache/impala/analysis/ParserTest.java@3074 PS10, Line 3074: P
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. IMPALA-3119: DDL support for bucketed tables Add syntactic support for creating bucketed table. The specific syntax in the create table statement is as follows: [BUCKETED BY HASH([column [, column ...]])|RANDOM INTO 24 BUCKETS Example: CREATE TABLE tbl (i int COMMENT 'hello', s string) BUCKETED BY HASH(i) INTO 24 BUCKETS; CREATE TABLE tbl (i int COMMENT 'hello', s string) BUCKETED BY KUDU_HASH(i) INTO 24 BUCKETS; CREATE TABLE tbl (i int COMMENT 'hello', s string) BUCKETED BY RANDOM INTO 24 BUCKETS; Instructions: 1. CLUSTERED BY of Hive is not supported, because HINT has the keyword; 2. The bucket partitioning algorithm contains HASH, RANDOM, KUDU_HASH. The default value is HASH; 3. INTO 24 BUCKETS, specifies the number of buckets, the default value is 16; 4. Create Bucketed Table statements that do not support Kudu and Iceberg tables, but for a Kudu table, the hash partition is equivalent to a bucket, and the optimization rule applies to join query; 5. In the current version, alter operations(add/drop/change/replace columns) on bucketed tables are not supported; This COMMIT is the first subtask of IMPALA-3118. 
Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e --- M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/create-table.test 15 files changed, 439 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/11 -- To view, visit http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 11 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES syntax
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19143 ) Change subject: IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES syntax .. Patch Set 3: (3 comments) I fixed it. Thanks for your review and comments. http://gerrit.cloudera.org:8080/#/c/19143/2/docs/topics/impala_alter_view.xml File docs/topics/impala_alter_view.xml: http://gerrit.cloudera.org:8080/#/c/19143/2/docs/topics/impala_alter_view.xml@73 PS2, Line 73: 'name' = ' Could you change this to the following format? Done http://gerrit.cloudera.org:8080/#/c/19143/2/docs/topics/impala_alter_view.xml@76 PS2, Line 76: 'nam > Please wrap this with and single quotes. Done http://gerrit.cloudera.org:8080/#/c/19143/2/docs/topics/impala_create_view.xml File docs/topics/impala_create_view.xml: http://gerrit.cloudera.org:8080/#/c/19143/2/docs/topics/impala_create_view.xml@64 PS2, Line 64: 'name' = '< > Please wrap the two variables with and single quotes. Done -- To view, visit http://gerrit.cloudera.org:8080/19143 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief1d6bb525ba85a58b8123a0cb712d83523daaec Gerrit-Change-Number: 19143 Gerrit-PatchSet: 3 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shajini Thayasingh Gerrit-Comment-Date: Mon, 17 Oct 2022 12:32:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES syntax
Baike Xia has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/19143 ) Change subject: IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES syntax .. IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES syntax Update document for [ CREATE VIEW ... TBLPROPERTIES ('key' = 'value', ...) ] and [ ALTER VIEW view_name SET/UNSET TBLPROPERTIES... ] syntax. Change-Id: Ief1d6bb525ba85a58b8123a0cb712d83523daaec --- M docs/topics/impala_alter_view.xml M docs/topics/impala_create_view.xml 2 files changed, 28 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/19143/3 -- To view, visit http://gerrit.cloudera.org:8080/19143 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ief1d6bb525ba85a58b8123a0cb712d83523daaec Gerrit-Change-Number: 19143 Gerrit-PatchSet: 3 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Shajini Thayasingh
[Impala-ASF-CR] IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES syntax
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19143 ) Change subject: IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES syntax .. Patch Set 2: Hi Quanlong, https://gerrit.cloudera.org/c/18940/ did not update the documentation; this commit updates the documentation content. Please help review it. Thanks. -- To view, visit http://gerrit.cloudera.org:8080/19143 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief1d6bb525ba85a58b8123a0cb712d83523daaec Gerrit-Change-Number: 19143 Gerrit-PatchSet: 2 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 14 Oct 2022 10:45:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES syntax
Baike Xia has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/19143 ) Change subject: IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES syntax .. IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES syntax Update document for [ CREATE VIEW ... TBLPROPERTIES ('key' = 'value', ...) ] and [ ALTER VIEW view_name SET/UNSET TBLPROPERTIES... ] syntax. Change-Id: Ief1d6bb525ba85a58b8123a0cb712d83523daaec --- M docs/topics/impala_alter_view.xml M docs/topics/impala_create_view.xml 2 files changed, 28 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/19143/2 -- To view, visit http://gerrit.cloudera.org:8080/19143 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ief1d6bb525ba85a58b8123a0cb712d83523daaec Gerrit-Change-Number: 19143 Gerrit-PatchSet: 2 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES syntax
Baike Xia has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19143 ) Change subject: IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES syntax .. IMPALA-11420: [DOCS] Document CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES syntax Update document for [ CREATE VIEW ... TBLPROPERTIES ('key' = 'value', ...) ] and [ ALTER VIEW view_name SET/UNSET TBLPROPERTIES... ] syntax. Change-Id: Ief1d6bb525ba85a58b8123a0cb712d83523daaec --- M docs/topics/impala_alter_view.xml M docs/topics/impala_create_view.xml 2 files changed, 29 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/19143/1 -- To view, visit http://gerrit.cloudera.org:8080/19143 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ief1d6bb525ba85a58b8123a0cb712d83523daaec Gerrit-Change-Number: 19143 Gerrit-PatchSet: 1 Gerrit-Owner: Baike Xia
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. IMPALA-3119: DDL support for bucketed tables Add syntactic support for creating bucketed table. The specific syntax is as follows: CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name( col_name data_type [constraint_specification] [COMMENT 'col_comment'] [, ...] ) [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)] [BUCKETED BY HASH([column [, column ...]])|RANDOM INTO 24 BUCKETS [SORT BY ([column [, column ...]])] [COMMENT 'table_comment'] [ROW FORMAT row_format] [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)] [STORED AS file_format] [LOCATION 'hdfs_path'] [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED] [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)] Instructions: 1. CLUSTERED BY of Hive is not supported, because HINT has the keyword; 2. The bucket partitioning algorithm contains HASH, RANDOM, KUDU_HASH. The default value is HASH; 3. INTO 24 BUCKETS, specifies the number of buckets, the default value is 16; 4. Create Bucketed Table statements that do not support Kudu and Iceberg tables, but for a Kudu table, the hash partition is equivalent to a bucket, and the optimization rule applies to join query; 5. In the current version, alter operations(add/drop/change/replace columns) on bucketed tables are not supported; This COMMIT is the first subtask of IMPALA-3118. 
Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e --- M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/create-table.test 15 files changed, 420 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/10 -- To view, visit http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 10 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] [IMPALA-11625] Support create/drop materialized view syntax on IMPALA
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/19050 ) Change subject: [IMPALA-11625] Support create/drop materialized view syntax on IMPALA .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/19050/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/19050/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@124 PS6, Line 124: MATERIALIZED_VIEW This one doesn't seem to be in use. Can we delete it? http://gerrit.cloudera.org:8080/#/c/19050/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java: http://gerrit.cloudera.org:8080/#/c/19050/6/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@645 PS6, Line 645: if (view.getMetaStoreTable().getTableType().equals(MATERIALIZED_VIEW.toString())) That doesn't seem right. Comparing an enum with a string never yields true. Should it be written this way? if (view.getMetaStoreTable().getTableType().name().equals(MATERIALIZED_VIEW.toString())) -- To view, visit http://gerrit.cloudera.org:8080/19050 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77fdd34bf04a8994a215170747249356cd40622b Gerrit-Change-Number: 19050 Gerrit-PatchSet: 7 Gerrit-Owner: pengdou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 13 Oct 2022 12:22:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11485: Pushdown LIMIT through UNION ALL and LEFT/RIGHT OUTER JOIN
Baike Xia has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/18862 ) Change subject: IMPALA-11485: Pushdown LIMIT through UNION ALL and LEFT/RIGHT OUTER JOIN .. IMPALA-11485: Pushdown LIMIT through UNION ALL and LEFT/RIGHT OUTER JOIN Pushdown LIMIT through UNION ALL: Transforms: - Limit - Union - relation1 - relation2 .. Into: - Limit - Union - Limit - relation1 - Limit - relation2 .. Pushdown LIMIT through LEFT/RIGHT OUTER JOIN: Transforms: - Limit - Join - left source - right source Into: - Limit - Join - Limit (present if Join is left outer) - left source - Limit (present if Join is right outer) - right source Change-Id: Ia5d040c0a98e60639d7ce4b25ecf07a859c8a32c --- M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-outer-join.test A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-union.test M testdata/workloads/functional-planner/queries/PlannerTest/topn.test M testdata/workloads/functional-planner/queries/PlannerTest/union.test M testdata/workloads/functional-query/queries/QueryTest/joins.test A testdata/workloads/functional-query/queries/limit-pushdown-outer-join.test A testdata/workloads/functional-query/queries/limit-pushdown-union.test A tests/query_test/test_limit_pushdown.py M tests/query_test/test_observability.py 14 files changed, 467 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/18862/7 -- To view, visit http://gerrit.cloudera.org:8080/18862 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: 
newpatchset Gerrit-Change-Id: Ia5d040c0a98e60639d7ce4b25ecf07a859c8a32c Gerrit-Change-Number: 18862 Gerrit-PatchSet: 7 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Quanlong Huang
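The two rewrites described in the IMPALA-11485 commit message can be sketched on a toy plan tree. This is only an illustration, not the code in SingleNodePlanner.java: the `Node` class, its field names, and `push_down_limit` are hypothetical, and the sketch assumes a plain LIMIT with no OFFSET and no ORDER BY. The outer Limit is always kept, because the Union or Join can still produce more rows than the limit.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical plan node; 'kind' is one of scan, union_all, join, limit."""
    kind: str
    children: List["Node"] = field(default_factory=list)
    limit: Optional[int] = None       # set only when kind == "limit"
    join_type: Optional[str] = None   # "left_outer", "right_outer", "inner", ...
    name: str = ""


def limit_over(child: Node, n: int) -> Node:
    """Wrap a child in a Limit node."""
    return Node("limit", [child], limit=n)


def push_down_limit(node: Node) -> Node:
    """Return a plan with the Limit copied below UNION ALL / outer joins."""
    if node.kind != "limit" or not node.children:
        return node
    child, n = node.children[0], node.limit
    if child.kind == "union_all":
        # Every branch needs to contribute at most n rows.
        new_children = [limit_over(c, n) for c in child.children]
    elif child.kind == "join" and child.join_type == "left_outer":
        # Each left row yields at least one output row, so n left rows suffice.
        left, right = child.children
        new_children = [limit_over(left, n), right]
    elif child.kind == "join" and child.join_type == "right_outer":
        left, right = child.children
        new_children = [left, limit_over(right, n)]
    else:
        return node  # e.g. inner join: pushing the limit down would be unsafe
    new_child = Node(child.kind, new_children, join_type=child.join_type,
                     name=child.name)
    return Node("limit", [new_child], limit=n)
```

Inner joins are deliberately left untouched in this sketch, since a limited input could drop rows that would have matched.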
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. IMPALA-3119: DDL support for bucketed tables Add syntactic support for creating bucketed table. The specific syntax is as follows: CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name( col_name data_type [constraint_specification] [COMMENT 'col_comment'] [, ...] ) [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)] [BUCKETED BY HASH([column [, column ...]])|RANDOM INTO 24 BUCKETS [SORT BY ([column [, column ...]])] [COMMENT 'table_comment'] [ROW FORMAT row_format] [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)] [STORED AS file_format] [LOCATION 'hdfs_path'] [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED] [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)] Instructions: 1. CLUSTERED BY of Hive is not supported, because HINT has the keyword; 2. The bucket partitioning algorithm contains HASH, RANDOM, KUDU_HASH. The default value is HASH; 3. INTO 24 BUCKETS, specifies the number of buckets, the default value is 16; 4. Create Bucketed Table statements that do not support Kudu and Iceberg tables, but for a Kudu table, the hash partition is equivalent to a bucket, and the optimization rule applies to join query; 5. In the current version, alter operations(add/drop/change/replace columns) on bucketed tables are not supported; This COMMIT is the first subtask of IMPALA-3118. 
Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e --- M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/create-table.test 15 files changed, 415 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/9 -- To view, visit http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 9 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table
Baike Xia has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/18953 ) Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table .. IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu/iceberg table Impala already supports IF NOT EXISTS in alter table add columns for general hive table in IMPALA-7832, but not for kudu/iceberg table. This patch tries to add such semantics for kudu/iceberg table. Testing: - Updated E2E DDL tests Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 --- M common/thrift/JniCatalog.thrift M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test 6 files changed, 154 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/10 -- To view, visit http://gerrit.cloudera.org:8080/18953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 Gerrit-Change-Number: 18953 Gerrit-PatchSet: 10 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Quanlong Huang
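The IF NOT EXISTS semantics for ADD COLUMNS in IMPALA-11565 can be illustrated with a small sketch. It is not the logic in CatalogOpExecutor.java: the function name `columns_to_add`, the error text, and the case-insensitive name comparison are all assumptions made for the example.

```python
from typing import Iterable, List


def columns_to_add(existing: Iterable[str], requested: Iterable[str],
                   if_not_exists: bool) -> List[str]:
    """Return the requested columns that still need to be added.

    With if_not_exists=True, columns that already exist are skipped silently;
    otherwise a duplicate column is an error.
    """
    # Assumption for this sketch: column names compare case-insensitively.
    seen = {c.lower() for c in existing}
    to_add = []
    for col in requested:
        if col.lower() in seen:
            if if_not_exists:
                continue
            raise ValueError(f"Column already exists: {col}")
        seen.add(col.lower())
        to_add.append(col)
    return to_add
```

When the returned list is empty, the ALTER statement would have nothing left to do and could be treated as a no-op.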
[Impala-ASF-CR] IMPALA-11485: Pushdown LIMIT through UNION ALL and LEFT/RIGHT OUTER JOIN
Baike Xia has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/18862 ) Change subject: IMPALA-11485: Pushdown LIMIT through UNION ALL and LEFT/RIGHT OUTER JOIN .. IMPALA-11485: Pushdown LIMIT through UNION ALL and LEFT/RIGHT OUTER JOIN Pushdown LIMIT through UNION ALL: Transforms: - Limit - Union - relation1 - relation2 .. Into: - Limit - Union - Limit - relation1 - Limit - relation2 .. Pushdown LIMIT through LEFT/RIGHT OUTER JOIN: Transforms: - Limit - Join - left source - right source Into: - Limit - Join - Limit (present if Join is left outer) - left source - Limit (present if Join is right outer) - right source Change-Id: Ia5d040c0a98e60639d7ce4b25ecf07a859c8a32c --- M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-outer-join.test A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-union.test M testdata/workloads/functional-planner/queries/PlannerTest/topn.test M testdata/workloads/functional-planner/queries/PlannerTest/union.test M testdata/workloads/functional-query/queries/QueryTest/joins.test A testdata/workloads/functional-query/queries/limit-pushdown-outer-join.test A testdata/workloads/functional-query/queries/limit-pushdown-union.test A tests/query_test/test_limit_pushdown.py M tests/query_test/test_observability.py 13 files changed, 452 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/18862/6 -- To view, visit http://gerrit.cloudera.org:8080/18862 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia5d040c0a98e60639d7ce4b25ecf07a859c8a32c 
Gerrit-Change-Number: 18862 Gerrit-PatchSet: 6 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11564: For Agg/Scan nodes, increase the Cache of regular expressions to speed up
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/18987 ) Change subject: IMPALA-11564: For Agg/Scan nodes, increase the Cache of regular expressions to speed up .. Patch Set 4: (3 comments) > Patch Set 3: > > (4 comments) http://gerrit.cloudera.org:8080/#/c/18987/3/be/src/exec/exec-node-thread-cache.h File be/src/exec/exec-node-thread-cache.h: http://gerrit.cloudera.org:8080/#/c/18987/3/be/src/exec/exec-node-thread-cache.h@37 PS3, Line 37: boost::unordered_map regex_cache_key_map_; > map internally is a BST while unordered_map is a hash table, since we care OK. http://gerrit.cloudera.org:8080/#/c/18987/3/be/src/exec/exec-node-thread-cache.h@40 PS3, Line 40: // 0 indicates the initial status; : // 1 indicates successful matching; : // 2 indicates failure matching; : // 3 indicates null; > how about using an enum to represent these states? These states are only used internally and are not exposed to the outside world, so there is no need to use enumerations. http://gerrit.cloudera.org:8080/#/c/18987/3/be/src/exec/exec-node-thread-cache.h@76 PS3, Line 76: int cache_status = regex_cache_[cache_id]; > how about directly using the string `cache_key` to get the cache value? you In the current implementation, result state can be saved(init, successful, failure, null), and as of now, callers only have LikePredicate. The complexity of using unordered_map is O(1). So I thought, do we need to change it? -- To view, visit http://gerrit.cloudera.org:8080/18987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68f37303aee4b6a28e560f27548c31472b82048b Gerrit-Change-Number: 18987 Gerrit-PatchSet: 4 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Comment-Date: Mon, 10 Oct 2022 06:35:12 + Gerrit-HasComments: Yes
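The two reviewer suggestions in the IMPALA-11564 thread (an enum for the four result states, and looking the cached value up directly by the string key) can be sketched together. This is an assumption-laden illustration, not the contents of exec-node-thread-cache.h: `MatchState`, `RegexResultCache`, and the `evaluations` counter are hypothetical names, and the sketch assumes one cache per thread so no locking is shown.

```python
import re
from enum import Enum
from typing import Dict, Optional


class MatchState(Enum):
    """Named versions of the integer states 0..3 listed in the review comment."""
    INIT = 0         # no result cached yet
    MATCHED = 1      # the pattern matched
    NOT_MATCHED = 2  # the pattern did not match
    NULL = 3         # the input was NULL


class RegexResultCache:
    """Caches match results for one pattern, keyed directly by the input string."""

    def __init__(self, pattern: str) -> None:
        self._regex = re.compile(pattern)
        self._results: Dict[Optional[str], MatchState] = {}
        self.evaluations = 0  # how many times the regex actually ran

    def match(self, value: Optional[str]) -> MatchState:
        state = self._results.get(value, MatchState.INIT)
        if state is not MatchState.INIT:
            return state  # cache hit: skip the regex entirely
        if value is None:
            state = MatchState.NULL
        else:
            self.evaluations += 1
            found = self._regex.search(value) is not None
            state = MatchState.MATCHED if found else MatchState.NOT_MATCHED
        self._results[value] = state
        return state
```

A hash-map lookup by string is O(1) on average, as the reply notes for unordered_map; the enum only adds readable names for states that the patch stores as integers.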
[Impala-ASF-CR] IMPALA-11564: For Agg/Scan nodes, increase the Cache of regular expressions to speed up
Baike Xia has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/18987 ) Change subject: IMPALA-11564: For Agg/Scan nodes, increase the Cache of regular expressions to speed up .. IMPALA-11564: For Agg/Scan nodes, increase the Cache of regular expressions to speed up Each time the RE matches, the query from the cache will speed up the computation. Change-Id: I68f37303aee4b6a28e560f27548c31472b82048b --- M be/src/exec/aggregator.cc M be/src/exec/aggregator.h A be/src/exec/exec-node-thread-cache.h M be/src/exec/grouping-aggregator-ir.cc M be/src/exec/hdfs-columnar-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/kudu/kudu-scanner.cc M be/src/exec/kudu/kudu-scanner.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exprs/agg-fn-evaluator.h M be/src/exprs/like-predicate-ir.cc M be/src/exprs/like-predicate.cc M be/src/exprs/like-predicate.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/scalar-expr-evaluator.h M be/src/udf/udf-internal.h M be/src/udf/udf-ir.cc M be/src/udf/udf.cc M be/src/udf/udf.h A testdata/workloads/functional-query/queries/QueryTest/thread-cache.test M tests/query_test/test_queries.py 22 files changed, 219 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/18987/4 -- To view, visit http://gerrit.cloudera.org:8080/18987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I68f37303aee4b6a28e560f27548c31472b82048b Gerrit-Change-Number: 18987 Gerrit-PatchSet: 4 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang
[Impala-ASF-CR] IMPALA-11485: Pushdown LIMIT through UNION and LEFT/RIGHT OUTER JOIN
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/18862 ) Change subject: IMPALA-11485: Pushdown LIMIT through UNION and LEFT/RIGHT OUTER JOIN .. Patch Set 5: > Patch Set 4: > > (2 comments) Yes, you are right. I changed it. -- To view, visit http://gerrit.cloudera.org:8080/18862 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia5d040c0a98e60639d7ce4b25ecf07a859c8a32c Gerrit-Change-Number: 18862 Gerrit-PatchSet: 5 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sun, 09 Oct 2022 11:05:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11485: Pushdown LIMIT through UNION and LEFT/RIGHT OUTER JOIN
Baike Xia has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/18862 ) Change subject: IMPALA-11485: Pushdown LIMIT through UNION and LEFT/RIGHT OUTER JOIN .. IMPALA-11485: Pushdown LIMIT through UNION and LEFT/RIGHT OUTER JOIN Pushdown LIMIT through UNION: Transforms: - Limit - Union - relation1 - relation2 .. Into: - Limit - Union - Limit - relation1 - Limit - relation2 .. Pushdown LIMIT through LEFT/RIGHT OUTER JOIN: Transforms: - Limit - Join - left source - right source Into: - Limit - Join - Limit (present if Join is left outer) - left source - Limit (present if Join is right outer) - right source Change-Id: Ia5d040c0a98e60639d7ce4b25ecf07a859c8a32c --- M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-outer-join.test A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-union.test M testdata/workloads/functional-planner/queries/PlannerTest/topn.test M testdata/workloads/functional-planner/queries/PlannerTest/union.test M testdata/workloads/functional-query/queries/QueryTest/joins.test A testdata/workloads/functional-query/queries/limit-pushdown-outer-join.test A testdata/workloads/functional-query/queries/limit-pushdown-union.test A tests/query_test/test_limit_pushdown.py M tests/query_test/test_observability.py 13 files changed, 340 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/18862/5 -- To view, visit http://gerrit.cloudera.org:8080/18862 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia5d040c0a98e60639d7ce4b25ecf07a859c8a32c Gerrit-Change-Number: 18862 
Gerrit-PatchSet: 5 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Quanlong Huang
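The two transformations in the commit message above can be sketched as a small tree rewrite. This is only an illustration, not the code in SingleNodePlanner.java: the PlanNode class, with_limit() and push_down_limit() are hypothetical names, and the sketch assumes UNION ALL semantics (no duplicate elimination) and no ORDER BY or extra predicates between the Limit and its child.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    """Hypothetical plan node for illustration; not Impala's PlanNode."""
    kind: str                                   # "scan", "union", "join" or "limit"
    children: List["PlanNode"] = field(default_factory=list)
    join_type: Optional[str] = None             # e.g. "left_outer", "right_outer"
    limit: Optional[int] = None                 # set only when kind == "limit"


def with_limit(node: PlanNode, n: int) -> PlanNode:
    """Wrap node in a Limit unless it already carries one at least as tight."""
    if node.kind == "limit" and node.limit is not None and node.limit <= n:
        return node
    return PlanNode("limit", [node], limit=n)


def push_down_limit(node: PlanNode) -> PlanNode:
    """Copy a Limit below a Union, or onto the preserved side of an outer join.

    The original Limit stays in place; the copies only reduce the rows that
    the children produce.
    """
    if node.kind != "limit" or not node.children or node.limit is None:
        return node
    child, n = node.children[0], node.limit
    if child.kind == "union":
        child.children = [with_limit(c, n) for c in child.children]
    elif child.kind == "join" and child.join_type == "left_outer":
        child.children[0] = with_limit(child.children[0], n)
    elif child.kind == "join" and child.join_type == "right_outer":
        child.children[1] = with_limit(child.children[1], n)
    return node
```

The Limit is pushed only to the preserved side of an outer join because every row on that side produces at least one output row, so n input rows are enough to yield n output rows.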
[Impala-ASF-CR] IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/18958 ) Change subject: IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/18958/1/be/src/exprs/timezone_db.cc File be/src/exprs/timezone_db.cc: http://gerrit.cloudera.org:8080/#/c/18958/1/be/src/exprs/timezone_db.cc@183 PS1, Line 183: if (result.rfind("#", 0) == 0) continue; > Hi all, what if line start with '\t#'? or even line is 'ZONE=UTC # some com I think this should be treated as a malformed line, and it should be a rare case. -- To view, visit http://gerrit.cloudera.org:8080/18958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f80fd1817d072f8dadf288025cb9534191ca458 Gerrit-Change-Number: 18958 Gerrit-PatchSet: 4 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Xiang Yang Gerrit-Comment-Date: Sun, 09 Oct 2022 09:33:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu table
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/18953 ) Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu table .. Patch Set 9: (2 comments) http://gerrit.cloudera.org:8080/#/c/18953/8/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/18953/8/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java@508 PS8, Line 508: hasColumnsAd > nit: "hasColumnsAdded" might be better Done. http://gerrit.cloudera.org:8080/#/c/18953/8/testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test File testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test: http://gerrit.cloudera.org:8080/#/c/18953/8/testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test@374 PS8, Line 374: ID, NAME, VALI, NEW_COL1, NEW_COL2, NEW_COL3, NEW_COL4, NEW_COL5 > Could you add some tests on DESCRIBE between these ALTER statements? Done. -- To view, visit http://gerrit.cloudera.org:8080/18953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 Gerrit-Change-Number: 18953 Gerrit-PatchSet: 9 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sun, 09 Oct 2022 03:38:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu table
Baike Xia has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/18953 ) Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu table .. IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu table Impala already supports IF NOT EXISTS in ALTER TABLE ADD COLUMNS for general Hive tables (IMPALA-7832), but not for Kudu tables. This patch adds the same semantics for Kudu tables. Testing: - Updated E2E DDL tests Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 --- M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test 3 files changed, 92 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/9 -- To view, visit http://gerrit.cloudera.org:8080/18953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 Gerrit-Change-Number: 18953 Gerrit-PatchSet: 9 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone
Hello Jian Zhang, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18958 to look at the new patch set (#4). Change subject: IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone .. IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone For /etc/sysconfig/clock, when a row has a '#', we can skip that row, and optimize the content of parsing lines. This will fix the parsing problem caused by the '# Zone="utc"'. Note: The erase() function modifies the original string instead of creating a new string. The substr() function returns a new string with the specified characters instead of modifying the original string. Change-Id: I7f80fd1817d072f8dadf288025cb9534191ca458 --- M be/src/exprs/timezone_db.cc 1 file changed, 6 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/18958/4 -- To view, visit http://gerrit.cloudera.org:8080/18958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7f80fd1817d072f8dadf288025cb9534191ca458 Gerrit-Change-Number: 18958 Gerrit-PatchSet: 4 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang
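As a rough illustration of the behavior described above, the sketch below skips comment lines before looking for the zone entry. It is written in Python rather than the C++ of timezone_db.cc, and find_zone() and the regular expression are assumptions made for this example, not the patch's actual code. The startswith("#") check mirrors the C++ idiom result.rfind("#", 0) == 0 quoted in the review thread.

```python
import re
from typing import Iterable, Optional

# Hypothetical pattern: accepts ZONE="Europe/Budapest" or ZONE=UTC anywhere in
# the line. It is deliberately unanchored, so without the comment check a line
# such as '# ZONE="utc"' would also match.
_ZONE_RE = re.compile(r'ZONE\s*=\s*"?([^"\s#]+)"?')


def find_zone(lines: Iterable[str]) -> Optional[str]:
    """Return the first zone name found in non-comment lines, or None."""
    for line in lines:
        # Same test as `result.rfind("#", 0) == 0` in C++: the line starts with '#'.
        if line.startswith("#"):
            continue
        match = _ZONE_RE.search(line)
        if match:
            return match.group(1)
    return None
```

Like the check discussed in the review, this sketch does not treat a line starting with whitespace followed by '#' as a comment.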
[Impala-ASF-CR] IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/18958 ) Change subject: IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/18958/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18958/2//COMMIT_MSG@7 PS2, Line 7: Optimized > Won't the behavior change if we bump to a commented out zone line? e.g. Oh, yes, this patch fixes that situation. Before the fix, the parser resolved to the content after '#'. Thanks. -- To view, visit http://gerrit.cloudera.org:8080/18958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f80fd1817d072f8dadf288025cb9534191ca458 Gerrit-Change-Number: 18958 Gerrit-PatchSet: 3 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Comment-Date: Sun, 09 Oct 2022 02:58:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu table
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/18953 ) Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu table .. Patch Set 8: (1 comment) > Patch Set 8: Verified+1 http://gerrit.cloudera.org:8080/#/c/18953/8/fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java: http://gerrit.cloudera.org:8080/#/c/18953/8/fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java@87 PS8, Line 87: throw new AnalysisException("Duplicate column name: " + colName); > should we also ignore this error when `if not exists` is presented? It's already there, and it works for this situation: 'alter table tbl add if not exists columns (b int, b int);'. However, this might be debatable if the field types are different, e.g.: alter table tbl add if not exists columns (b int, b string); -- To view, visit http://gerrit.cloudera.org:8080/18953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 Gerrit-Change-Number: 18953 Gerrit-PatchSet: 8 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sun, 09 Oct 2022 02:27:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone
Baike Xia has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/18958 ) Change subject: IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone .. IMPALA-11563: Optimized /etc/sysconfig/clock to find the time zone For /etc/sysconfig/clock, when a row has a '#', we can skip that row, and optimize the content of parsing lines. Note: The erase() function modifies the original string instead of creating a new string. The substr() function returns a new string with the specified characters instead of modifying the original string. Change-Id: I7f80fd1817d072f8dadf288025cb9534191ca458 --- M be/src/exprs/timezone_db.cc 1 file changed, 6 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/18958/3 -- To view, visit http://gerrit.cloudera.org:8080/18958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7f80fd1817d072f8dadf288025cb9534191ca458 Gerrit-Change-Number: 18958 Gerrit-PatchSet: 3 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. IMPALA-3119: DDL support for bucketed tables Add syntactic support for creating bucketed tables. The specific syntax is as follows: CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name( col_name data_type [constraint_specification] [COMMENT 'col_comment'] [, ...] ) [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)] [BUCKETED BY HASH([column [, column ...]])|RANDOM INTO 24 BUCKETS [SORT BY ([column [, column ...]])] [COMMENT 'table_comment'] [ROW FORMAT row_format] [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)] [STORED AS file_format] [LOCATION 'hdfs_path'] [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED] [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)] Notes: 1. Hive's CLUSTERED BY is not supported, because the keyword is already used by hints; 2. The bucket partitioning algorithms are HASH, RANDOM, and KUDU_HASH. The default is HASH; 3. INTO 24 BUCKETS specifies the number of buckets; the default is 16; 4. Creating bucketed tables is not supported for Kudu and Iceberg tables, but for a Kudu table the hash partition is equivalent to a bucket, and the optimization rule applies to join queries; 5. In the current version, ALTER operations (add/drop/change/replace columns) on bucketed tables are not supported. This commit is the first subtask of IMPALA-3118. 
Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e --- M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/create-table.test 15 files changed, 413 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/8 -- To view, visit http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 8 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Impala Public Jenkins
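To make the BUCKETED BY clause from the commit message above concrete, here is a small sketch that renders the clause text from its parts. bucketed_by_clause() is a hypothetical helper written for this illustration and is not part of Impala; the defaults (HASH, 16 buckets) are taken from the notes in the commit message, and the final syntax accepted by the parser may differ from this patch set.

```python
from typing import Sequence


def bucketed_by_clause(columns: Sequence[str] = (),
                       algorithm: str = "HASH",
                       num_buckets: int = 16,
                       sort_by: Sequence[str] = ()) -> str:
    """Render a BUCKETED BY clause following the syntax in the commit message.

    Defaults follow the commit message: HASH algorithm and 16 buckets.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    algorithm = algorithm.upper()
    if algorithm == "HASH":
        spec = "HASH(" + ", ".join(columns) + ")"
    elif algorithm == "RANDOM":
        if columns:
            raise ValueError("RANDOM bucketing takes no columns")
        spec = "RANDOM"
    else:
        # KUDU_HASH is listed as an algorithm but does not appear in the DDL
        # syntax shown in the message, so this sketch rejects it.
        raise ValueError("unsupported algorithm in this sketch: " + algorithm)
    clause = f"BUCKETED BY {spec} INTO {num_buckets} BUCKETS"
    if sort_by:
        clause += " SORT BY (" + ", ".join(sort_by) + ")"
    return clause
```

For example, bucketed_by_clause(["id"], num_buckets=24) yields the text BUCKETED BY HASH(id) INTO 24 BUCKETS, which would be placed after the PARTITIONED BY clause of a CREATE TABLE statement.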
[Impala-ASF-CR] IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu table
Baike Xia has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/18953 ) Change subject: IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu table .. IMPALA-11565: Support IF NOT EXISTS in alter table add columns for kudu table Impala already supports IF NOT EXISTS in ALTER TABLE ADD COLUMNS for general Hive tables (IMPALA-7832), but not for Kudu tables. This patch adds the same semantics for Kudu tables. Testing: - Updated E2E DDL tests Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 --- M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test 3 files changed, 64 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/8 -- To view, visit http://gerrit.cloudera.org:8080/18953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 Gerrit-Change-Number: 18953 Gerrit-PatchSet: 8 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. IMPALA-3119: DDL support for bucketed tables Add syntactic support for creating bucketed tables. The specific syntax is as follows: CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name( col_name data_type [constraint_specification] [COMMENT 'col_comment'] [, ...] ) [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)] [BUCKETED BY HASH([column [, column ...]])|RANDOM INTO 24 BUCKETS [SORT BY ([column [, column ...]])] [COMMENT 'table_comment'] [ROW FORMAT row_format] [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)] [STORED AS file_format] [LOCATION 'hdfs_path'] [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED] [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)] Notes: 1. Hive's CLUSTERED BY is not supported, because the keyword is already used by hints; 2. The bucket partitioning algorithms are HASH, RANDOM, and KUDU_HASH. The default is HASH; 3. INTO 24 BUCKETS specifies the number of buckets; the default is 16; 4. Creating bucketed tables is not supported for Kudu and Iceberg tables, but for a Kudu table the hash partition is equivalent to a bucket, and the optimization rule applies to join queries; 5. In the current version, ALTER operations (add/drop/change/replace columns) on bucketed tables are not supported. This commit is the first subtask of IMPALA-3118. 
Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e --- M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/create-table.test 15 files changed, 411 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/7 -- To view, visit http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 7 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. IMPALA-3119: DDL support for bucketed tables Add syntactic support for creating bucketed tables. The specific syntax is as follows: CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name( col_name data_type [constraint_specification] [COMMENT 'col_comment'] [, ...] ) [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)] [BUCKETED BY HASH([column [, column ...]])|RANDOM INTO 24 BUCKETS [SORT BY ([column [, column ...]])] [COMMENT 'table_comment'] [ROW FORMAT row_format] [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)] [STORED AS file_format] [LOCATION 'hdfs_path'] [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED] [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)] Notes: 1. Hive's CLUSTERED BY is not supported, because the keyword is already used by hints; 2. The bucket partitioning algorithms are HASH, RANDOM, and KUDU_HASH. The default is HASH; 3. INTO 24 BUCKETS specifies the number of buckets; the default is 16; 4. Creating bucketed tables is not supported for Kudu and Iceberg tables, but for a Kudu table the hash partition is equivalent to a bucket, and the optimization rule applies to join queries; 5. In the current version, ALTER operations (add/drop/change/replace columns) on bucketed tables are not supported. This commit is the first subtask of IMPALA-3118. 
Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e --- M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/create-table.test 15 files changed, 411 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/6 -- To view, visit http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 6 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/19055 ) Change subject: IMPALA-3119: DDL support for bucketed tables .. IMPALA-3119: DDL support for bucketed tables Add syntactic support for creating bucketed tables. The specific syntax is as follows: CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name( col_name data_type [constraint_specification] [COMMENT 'col_comment'] [, ...] ) [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)] [BUCKETED BY HASH([column [, column ...]])|RANDOM INTO 24 BUCKETS [SORT BY ([column [, column ...]])] [COMMENT 'table_comment'] [ROW FORMAT row_format] [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)] [STORED AS file_format] [LOCATION 'hdfs_path'] [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED] [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)] Notes: 1. Hive's CLUSTERED BY is not supported, because the keyword is already used by hints; 2. The bucket partitioning algorithms are HASH, RANDOM, and KUDU_HASH. The default is HASH; 3. INTO 24 BUCKETS specifies the number of buckets; the default is 16; 4. Creating bucketed tables is not supported for Kudu and Iceberg tables, but for a Kudu table the hash partition is equivalent to a bucket, and the optimization rule applies to join queries; 5. In the current version, ALTER operations (add/drop/change/replace columns) on bucketed tables are not supported. This commit is the first subtask of IMPALA-3118. 
Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e --- M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/create-table.test 15 files changed, 408 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/5 -- To view, visit http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 5 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-3119: DDL support for bucketed tables
Baike Xia has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19055 Change subject: IMPALA-3119: DDL support for bucketed tables .. IMPALA-3119: DDL support for bucketed tables Add syntactic support for creating bucketed tables. The specific syntax is as follows: CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name( col_name data_type [constraint_specification] [COMMENT 'col_comment'] [, ...] ) [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)] [BUCKETED BY HASH([column [, column ...]])|RANDOM INTO 24 BUCKETS [SORT BY ([column [, column ...]])] [COMMENT 'table_comment'] [ROW FORMAT row_format] [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)] [STORED AS file_format] [LOCATION 'hdfs_path'] [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED] [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)] Notes: 1. Hive's CLUSTERED BY is not supported, because the keyword is already used by hints; 2. The bucket partitioning algorithms are HASH, RANDOM, and KUDU_HASH. The default is HASH; 3. INTO 24 BUCKETS specifies the number of buckets; the default is 16; 4. Creating bucketed tables is not supported for Kudu and Iceberg tables, but for a Kudu table the hash partition is equivalent to a bucket, and the optimization rule applies to join queries; 5. In the current version, ALTER operations (add/drop/change/replace columns) on bucketed tables are not supported. This commit is the first subtask of IMPALA-3118. 
Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e --- M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java A fe/src/main/java/org/apache/impala/util/BucketUtils.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/create-table.test 16 files changed, 411 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19055/4 -- To view, visit http://gerrit.cloudera.org:8080/19055 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I919b4d4139bc3a7784fa6fdb6f064e25666d548e Gerrit-Change-Number: 19055 Gerrit-PatchSet: 4 Gerrit-Owner: Baike Xia
[Impala-ASF-CR] IMPALA-11420: Support CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES
Baike Xia has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/18940 ) Change subject: IMPALA-11420: Support CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES .. IMPALA-11420: Support CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES Add TBLPROPERTIES support to the view, here are some examples: CREATE VIEW [IF NOT EXISTS] [database_name.]view_name [(column_name [COMMENT 'column_comment'][, ...])] [COMMENT 'view_comment'] [TBLPROPERTIES (property_name = property_value, ...)] AS select_statement; ALTER VIEW [database_name.]view_name SET TBLPROPERTIES (property_name = property_value, ...); ALTER VIEW [database_name.]view_name UNSET TBLPROPERTIES (property_name, ...); Change-Id: I8d05bb4ec1f70f5387bb21fbe23f62c05941af18 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AlterViewSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java A fe/src/main/java/org/apache/impala/analysis/AlterViewUnSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/views-ddl.test 14 files changed, 441 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/18940/12 -- To view, visit http://gerrit.cloudera.org:8080/18940 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: 
I8d05bb4ec1f70f5387bb21fbe23f62c05941af18 Gerrit-Change-Number: 18940 Gerrit-PatchSet: 12 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11420: Support CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES
Baike Xia has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/18940 ) Change subject: IMPALA-11420: Support CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES .. IMPALA-11420: Support CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES Add TBLPROPERTIES support to the view, here are some examples: CREATE VIEW [IF NOT EXISTS] [database_name.]view_name [(column_name [COMMENT 'column_comment'][, ...])] [COMMENT 'view_comment'] [TBLPROPERTIES (property_name = property_value, ...)] AS select_statement; ALTER VIEW [database_name.]view_name SET TBLPROPERTIES (property_name = property_value, ...); ALTER VIEW [database_name.]view_name UNSET TBLPROPERTIES (property_name, ...); Change-Id: I8d05bb4ec1f70f5387bb21fbe23f62c05941af18 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AlterViewSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java A fe/src/main/java/org/apache/impala/analysis/AlterViewUnSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java 13 files changed, 372 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/18940/11 -- To view, visit http://gerrit.cloudera.org:8080/18940 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8d05bb4ec1f70f5387bb21fbe23f62c05941af18 Gerrit-Change-Number: 18940 Gerrit-PatchSet: 11 Gerrit-Owner: Baike 
Xia Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11420: Support CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES
Baike Xia has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/18940 )

Change subject: IMPALA-11420: Support CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES
..

IMPALA-11420: Support CREATE/ALTER VIEW SET/UNSET TBLPROPERTIES

Add TBLPROPERTIES support to the view, here are some examples:

CREATE VIEW [IF NOT EXISTS] [database_name.]view_name
  [(column_name [COMMENT 'column_comment'][, ...])]
  [COMMENT 'view_comment']
  [TBLPROPERTIES (property_name = property_value, ...)]
AS select_statement;

ALTER VIEW [database_name.]view_name SET TBLPROPERTIES (property_name = property_value, ...);

ALTER VIEW [database_name.]view_name UNSET TBLPROPERTIES (property_name, ...);

Change-Id: I8d05bb4ec1f70f5387bb21fbe23f62c05941af18
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AlterViewSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java
A fe/src/main/java/org/apache/impala/analysis/AlterViewUnSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
13 files changed, 371 insertions(+), 15 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/18940/10
--
To view, visit http://gerrit.cloudera.org:8080/18940
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8d05bb4ec1f70f5387bb21fbe23f62c05941af18
Gerrit-Change-Number: 18940
Gerrit-PatchSet: 10
Gerrit-Owner: Baike Xia
Gerrit-Reviewer: Aman Sinha
Gerrit-Reviewer: Baike Xia
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
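As a rough illustration of what SET and UNSET TBLPROPERTIES mean for a view's property map, here is a minimal Java sketch. It is not taken from the patch: the class ViewPropertiesSketch and its methods are hypothetical, and the choice to silently ignore an UNSET of a key that is not present is an assumption, since the message above does not say how that case is handled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: models a view's TBLPROPERTIES as a plain map.
// This is not Impala code and does not reflect the patch's implementation.
final class ViewPropertiesSketch {

  // ALTER VIEW ... SET TBLPROPERTIES (k = v, ...):
  // adds new keys and overwrites the values of existing keys.
  static Map<String, String> set(Map<String, String> current,
      Map<String, String> updates) {
    Map<String, String> result = new LinkedHashMap<>(current);
    result.putAll(updates);
    return result;
  }

  // ALTER VIEW ... UNSET TBLPROPERTIES (k, ...):
  // removes the listed keys. Assumption: a key that is absent is ignored.
  static Map<String, String> unset(Map<String, String> current,
      List<String> keys) {
    Map<String, String> result = new LinkedHashMap<>(current);
    for (String key : keys) {
      result.remove(key);
    }
    return result;
  }
}
```

Both methods return a new map and leave the input untouched, which keeps the sketch easy to reason about; a real catalog operation would update the stored metadata instead.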
[Impala-ASF-CR] IMPALA-11565: For alter table, add column operation is optimized
Baike Xia has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/18953 ) Change subject: IMPALA-11565: For alter table, add column operation is optimized .. IMPALA-11565: For alter table, add column operation is optimized For alter table, add if not exists column operation, if the columns already exist and are of the same type, no operation is performed; If the type is different, an error is reported. Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 --- M common/thrift/JniCatalog.thrift M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 4 files changed, 40 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/7 -- To view, visit http://gerrit.cloudera.org:8080/18953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 Gerrit-Change-Number: 18953 Gerrit-PatchSet: 7 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang
[Impala-ASF-CR] IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE
Baike Xia has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/18731 )

Change subject: IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE
..

IMPALA-11424: Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE

To reduce the amount of data read and transmitted, the non-equi conjuncts of a join can be pushed down to the SCAN_NODE.

Current restrictions on pushing down join non-equi conjuncts:
1. Only LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN and INNER_JOIN are supported.
2. The non-equi predicate must contain a LiteralExpr, for example: slot >= Literal, slot in Literal list.
3. A complex filter condition is pushed down only if it contains a single column. For example, cast(A as int) > 10 can be pushed down to the scan.
4. Only the predicate operators EQ, LE, LT, GE and GT are supported.
5. Only BinaryPredicate and InPredicate are supported.

Pushdown logic:
1. Build the mapping from each slot to its non-equi conjunct list, and the mapping from each slot to its equi conjunct list.
2. For a slot that has both equi and non-equi conjuncts, calculate the maximum and minimum values of the equi conjuncts.
3. Build new BinaryPredicates from the maximum and minimum values, according to the non-equi conjunct.
4. Push all BinaryPredicates down to the specific scan node.

A new query option is added as a feature switch: ENABLE_NONE_EQUAL_PREDICATE_PUSH_DOWN

Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/none-equal-predicate-push-down.test
A testdata/workloads/functional-query/queries/QueryTest/none-equal-predicate-push-down.test
A tests/query_test/test_none_equi_predicate_pushdown.py
10 files changed, 1,132 insertions(+), 6 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/18731/11
--
To view, visit http://gerrit.cloudera.org:8080/18731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3ce23cbd7522a209c830504f329b972d67bc263
Gerrit-Change-Number: 18731
Gerrit-PatchSet: 11
Gerrit-Owner: Baike Xia
Gerrit-Reviewer: Aman Sinha
Gerrit-Reviewer: Baike Xia
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
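The qualification rules listed in the IMPALA-11424 message can be read as a single eligibility check. The Java sketch below is only an illustration of that reading: the class PushdownEligibilitySketch, its enums and the isCandidate() signature are hypothetical and are not the code in HashJoinNode.java. FULL_OUTER_JOIN, NE and OTHER exist here purely to show rejected cases.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical illustration of the qualification rules in the commit message.
// None of these names are taken from the Impala code base.
final class PushdownEligibilitySketch {
  enum JoinType { INNER_JOIN, LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN, FULL_OUTER_JOIN }
  enum PredicateKind { BINARY, IN, OTHER }
  enum Op { EQ, NE, LT, LE, GT, GE }

  // Rule 1: only these join types qualify.
  private static final Set<JoinType> SUPPORTED_JOINS = EnumSet.of(
      JoinType.INNER_JOIN, JoinType.LEFT_OUTER_JOIN, JoinType.RIGHT_OUTER_JOIN);

  // Rule 4: only these operators qualify for a binary predicate.
  private static final Set<Op> SUPPORTED_OPS =
      EnumSet.of(Op.EQ, Op.LE, Op.LT, Op.GE, Op.GT);

  // 'op' is only consulted for BINARY predicates and may be null for IN.
  static boolean isCandidate(JoinType join, PredicateKind kind, Op op,
      int referencedColumns, boolean comparesToLiteral) {
    if (!SUPPORTED_JOINS.contains(join)) return false;
    // Rules 2 and 3: exactly one column, compared against literal(s).
    if (referencedColumns != 1 || !comparesToLiteral) return false;
    // Rule 5: only binary and IN predicates are considered.
    if (kind == PredicateKind.IN) return true;
    return kind == PredicateKind.BINARY && op != null
        && SUPPORTED_OPS.contains(op);
  }
}
```

Under these assumptions, a predicate such as cast(A as int) > 10 in a LEFT_OUTER_JOIN would be modelled as isCandidate(LEFT_OUTER_JOIN, BINARY, GT, 1, true). The sketch covers only the eligibility rules, not the min/max rewrite described under "Pushdown logic".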
[Impala-ASF-CR] IMPALA-11565: For alter table, add column operation is optimized
Baike Xia has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/18953 ) Change subject: IMPALA-11565: For alter table, add column operation is optimized .. IMPALA-11565: For alter table, add column operation is optimized For alter table, add if not exists column operation, if the columns already exist and are of the same type, no operation is performed; If the type is different, an error is reported. Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 --- M common/thrift/JniCatalog.thrift M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 4 files changed, 40 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/6 -- To view, visit http://gerrit.cloudera.org:8080/18953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 Gerrit-Change-Number: 18953 Gerrit-PatchSet: 6 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang
[Impala-ASF-CR] IMPALA-11565: For alter table, add column operation is optimized
Baike Xia has posted comments on this change. ( http://gerrit.cloudera.org:8080/18953 )

Change subject: IMPALA-11565: For alter table, add column operation is optimized
..

Patch Set 5:

(2 comments)

I made a new fix.

http://gerrit.cloudera.org:8080/#/c/18953/3/fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java
File fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java:

http://gerrit.cloudera.org:8080/#/c/18953/3/fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java@87
PS3, Line 87: if (col != null) {
            :   if (!ifNotExists) {
            :     throw new AnalysisException("Column already exists: " + colName);
            :   }
            :
            :   // handle the case that ifNotExists is true
            :   if (!col.getType().equals(c.getType())) {
            :     throw new AnalysisException(String.format("Error adding column %s " +
            :         "from table %s: type not match", colName, t.getName()));
            :   }
            :
            :
> how about simplifying the precondition check to the following to improve th
Yeah, this is better. Thx.

http://gerrit.cloudera.org:8080/#/c/18953/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/18953/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1029
PS3, Line 1029: // In AlterTableAddColsStmt, there is remove for columns.
              : // May cause columns to be empty
> could you add a comment about the reason for adding this validation, is it a
I added a comment. AlterTableAddColsStmt removes columns from the list, which may leave the list empty.

--
To view, visit http://gerrit.cloudera.org:8080/18953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517
Gerrit-Change-Number: 18953
Gerrit-PatchSet: 5
Gerrit-Owner: Baike Xia
Gerrit-Reviewer: Baike Xia
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jian Zhang
Gerrit-Comment-Date: Mon, 26 Sep 2022 07:54:06 +
Gerrit-HasComments: Yes
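The ADD IF NOT EXISTS behaviour discussed in this review (skip a column that already exists with the same type, report an error when the type differs) can be sketched in isolation as follows. This is a simplified stand-in and not the code in AlterTableAddColsStmt: the class AddColumnsSketch is hypothetical, column types are modelled as plain strings, and IllegalStateException stands in for Impala's AnalysisException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the analysis step; not Impala's AlterTableAddColsStmt.
final class AddColumnsSketch {

  // Returns the names of the requested columns that still have to be added.
  // 'existing' and 'requested' map column name -> type name.
  static List<String> columnsToAdd(Map<String, String> existing,
      Map<String, String> requested, boolean ifNotExists) {
    List<String> toAdd = new ArrayList<>();
    for (Map.Entry<String, String> entry : requested.entrySet()) {
      String name = entry.getKey();
      String existingType = existing.get(name);
      if (existingType == null) {
        // New column: keep it in the list of columns to add.
        toAdd.add(name);
        continue;
      }
      if (!ifNotExists) {
        throw new IllegalStateException("Column already exists: " + name);
      }
      if (!existingType.equals(entry.getValue())) {
        throw new IllegalStateException(
            "Error adding column " + name + ": type not match");
      }
      // Same name and same type with IF NOT EXISTS: nothing to do.
    }
    return toAdd;
  }
}
```

Note that the returned list is empty when every requested column already exists with a matching type, which is the situation the comment added in CatalogOpExecutor refers to ("May cause columns to be empty").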
[Impala-ASF-CR] IMPALA-11565: For alter table, add column operation is optimized
Baike Xia has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/18953 ) Change subject: IMPALA-11565: For alter table, add column operation is optimized .. IMPALA-11565: For alter table, add column operation is optimized For alter table, add if not exists column operation, if the columns already exist and are of the same type, no operation is performed; If the type is different, an error is reported. Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 --- M common/thrift/JniCatalog.thrift M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 4 files changed, 40 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/5 -- To view, visit http://gerrit.cloudera.org:8080/18953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 Gerrit-Change-Number: 18953 Gerrit-PatchSet: 5 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang
[Impala-ASF-CR] IMPALA-11565: For alter table, add column operation is optimized
Baike Xia has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/18953 ) Change subject: IMPALA-11565: For alter table, add column operation is optimized .. IMPALA-11565: For alter table, add column operation is optimized For alter table, add if not exists column operation, if the columns already exist and are of the same type, no operation is performed; If the type is different, an error is reported. Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 --- M common/thrift/JniCatalog.thrift M fe/src/main/java/org/apache/impala/analysis/AlterTableAddColsStmt.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 4 files changed, 40 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18953/4 -- To view, visit http://gerrit.cloudera.org:8080/18953 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I82590e5372e881f2e81d4ed3dd0d32a2d3ddb517 Gerrit-Change-Number: 18953 Gerrit-PatchSet: 4 Gerrit-Owner: Baike Xia Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang