[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 9: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 9 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 28 Sep 2023 15:31:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE If an Iceberg table is frequently updated/written to in small batches, a lot of small files are created. This decreases read performance. Similarly, frequent row-level deletes contribute to this problem by creating delete files, which have to be merged on read. So far INSERT OVERWRITE (rewriting the table with itself) has been used to compact Iceberg tables. However, it comes with some RESTRICTIONS: - The table should not have multiple partition specs/partition evolution. - The table should not contain complex types. The OPTIMIZE statement offers a new syntax and a solution limited to Iceberg tables to enhance read performance for subsequent operations. See IMPALA-12293 for details. Syntax: OPTIMIZE TABLE ; This first patch introduces the new syntax, temporarily as an alias for INSERT OVERWRITE. Note that executing OPTIMIZE TABLE requires ALL privileges. Testing: - negative tests - FE planner test - Ranger test - E2E tests Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Reviewed-on: http://gerrit.cloudera.org:8080/20405 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java A fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java M fe/src/main/java/org/apache/impala/planner/PlannerContext.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test A testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test M tests/authorization/test_ranger.py M tests/query_test/test_iceberg.py 14 files changed, 356 insertions(+), 11 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 10 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 9 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 28 Sep 2023 09:18:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 8 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 28 Sep 2023 09:18:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9773/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 9 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 28 Sep 2023 09:18:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 8: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 8 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 26 Sep 2023 10:05:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 8 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 26 Sep 2023 00:46:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9760/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 8 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 25 Sep 2023 20:17:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 8: Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9756/ -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 8 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 25 Sep 2023 17:56:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9756/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 8 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 25 Sep 2023 13:28:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 8: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9754/ -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 8 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 25 Sep 2023 13:05:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14068/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 8 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 25 Sep 2023 09:09:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Noemi Pap-Takacs has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE If an Iceberg table is frequently updated/written to in small batches, a lot of small files are created. This decreases read performance. Similarly, frequent row-level deletes contribute to this problem by creating delete files, which have to be merged on read. So far INSERT OVERWRITE (rewriting the table with itself) has been used to compact Iceberg tables. However, it comes with some RESTRICTIONS: - The table should not have multiple partition specs/partition evolution. - The table should not contain complex types. The OPTIMIZE statement offers a new syntax and a solution limited to Iceberg tables to enhance read performance for subsequent operations. See IMPALA-12293 for details. Syntax: OPTIMIZE TABLE ; This first patch introduces the new syntax, temporarily as an alias for INSERT OVERWRITE. Note that executing OPTIMIZE TABLE requires ALL privileges. Testing: - negative tests - FE planner test - Ranger test - E2E tests Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java A fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java M fe/src/main/java/org/apache/impala/planner/PlannerContext.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test A testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test M tests/authorization/test_ranger.py M tests/query_test/test_iceberg.py 14 files changed, 356 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/20405/8 -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 8 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9754/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 8 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 25 Sep 2023 08:43:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Noemi Pap-Takacs has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java: http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@39 PS2, Line 39: > FeTable has a function that returns TableName: getTableName() This member 'tableName_' is required anyway, since DmlStatementBase takes no parameters in its constructor, therefore 'table_' is null after the constructor. 'tableName_' is used to resolve the reference and assign the value to 'table_' in analyze() using 'analyzer.getTable(tableName_, )'. It cannot be done earlier in the constructor, because the analyzer is needed to resolve the tableref. See PS7 line 71. As I see other classes that inherit from DmlStatementBase have a similar solution. The order of resolving tablerefs is name->table. FeTable.getTableName() is the other way around and would be redundant here. http://gerrit.cloudera.org:8080/#/c/20405/7/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java: http://gerrit.cloudera.org:8080/#/c/20405/7/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@38 PS7, Line 38: public class OptimizeStmt extends DmlStatementBase { > I have the impression that there is a lot of functionality here that we sim For now, inheritance might seem a practical solution. However, it does not express semantically the relationship between the two classes: OPTIMIZE TABLE does not aim to extend the functionality of INSERT, it only uses INSERT to execute its task, therefore the composition. See CreateTableAsSelectStmt for a similar example. INSERT OVERWRITE provides only limited functionality for rewriting Iceberg tables. It should be replaced in the long term. This patch uses InsertStmt but at the same time serves as a basis for a next milestone with no InsertStmt, that can handle use cases that INSERT OVERWRITE can not - e.g. partition evolution. Composition provides more flexibility and makes this change easier to refactor for the future changes. -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 7 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Sun, 24 Sep 2023 20:20:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java: http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@39 PS2, Line 39: > The TableName is returned by the parser and is needed to resolve the table FeTable has a function that returns TableName: getTableName() http://gerrit.cloudera.org:8080/#/c/20405/7/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java: http://gerrit.cloudera.org:8080/#/c/20405/7/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@38 PS7, Line 38: public class OptimizeStmt extends DmlStatementBase { I have the impression that there is a lot of functionality here that we simply delegate to 'insertStmt_'. So I'm wondering if we could derive this class from InsertStmt instead and then we could get rid of most of the logic here. It also has 'originalTableName_' and 'targetTableName_' members too -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 7 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 21 Sep 2023 14:12:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 7: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9738/ -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 7 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 20 Sep 2023 18:04:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/20405/6/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java: http://gerrit.cloudera.org:8080/#/c/20405/6/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@44 PS6, Line 44: qualified > I think 'qualified' here refers to concatenating the db name if it was not Ok, thanks. -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 7 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 20 Sep 2023 14:18:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 7: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 7 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 20 Sep 2023 14:18:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14039/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 6 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 20 Sep 2023 11:41:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 6: (1 comment) Just a nit http://gerrit.cloudera.org:8080/#/c/20405/6/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java: http://gerrit.cloudera.org:8080/#/c/20405/6/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@44 PS6, Line 44: qualified Shouldn't it be 'modified'? -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 6 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 20 Sep 2023 11:34:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9738/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 7 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 20 Sep 2023 13:41:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Noemi Pap-Takacs has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE If an Iceberg table is frequently updated/written to in small batches, a lot of small files are created. This decreases read performance. Similarly, frequent row-level deletes contribute to this problem by creating delete files, which have to be merged on read. So far INSERT OVERWRITE (rewriting the table with itself) has been used to compact Iceberg tables. However, it comes with some RESTRICTIONS: - The table should not have multiple partition specs/partition evolution. - The table should not contain complex types. The OPTIMIZE statement offers a new syntax and a solution limited to Iceberg tables to enhance read performance for subsequent operations. See IMPALA-12293 for details. Syntax: OPTIMIZE TABLE ; This first patch introduces the new syntax, temporarily as an alias for INSERT OVERWRITE. Note that executing OPTIMIZE TABLE requires ALL privileges. Testing: - negative tests - FE planner test - Ranger test - E2E tests Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java A fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java M fe/src/main/java/org/apache/impala/planner/PlannerContext.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/jflex/sql-scanner.flex M testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test A testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test M tests/authorization/test_ranger.py M tests/query_test/test_iceberg.py 13 files changed, 352 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/20405/6 -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 6 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Noemi Pap-Takacs has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/20405/6/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java: http://gerrit.cloudera.org:8080/#/c/20405/6/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@44 PS6, Line 44: qualified > Shouldn't it be 'modified'? I think 'qualified' here refers to concatenating the db name if it was not explicitly written in cases like 'use ;' or using the default db. (ex.: tbl1 -> default.tbl1) The table name itself is not modified, only qualified. -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 6 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 20 Sep 2023 11:53:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13969/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 3 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 11 Sep 2023 14:21:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Noemi Pap-Takacs has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE If an Iceberg table is frequently updated/written to in small batches, a lot of small files are created. This decreases read performance. Similarly, frequent row-level deletes contribute to this problem by creating delete files which have to be merged on read. So far INSERT OVERWRITE (rewriting the table with itself) has been used to compact Iceberg tables. However, it comes with some RESTRICTIONS: - The table should not have multiple partition specs/partition evolution. - The table should not contain complex types. The OPTIMIZE statement offers a new syntax and an Iceberg specific solution to this problem. This patch introduces the new syntax as an alias for INSERT OVERWRITE as a temporary solution. See IMPALA-12293 for further information. Note that executing OPTIMIZE TABLE requires ALL privileges. Testing: - negative tests - FE planner test - Ranger test - E2E tests Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java A fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java M fe/src/main/java/org/apache/impala/planner/PlannerContext.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/jflex/sql-scanner.flex M testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test A testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test M tests/query_test/test_iceberg.py 12 files changed, 353 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/20405/3 -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 3 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/20405/3/tests/query_test/test_iceberg.py File tests/query_test/test_iceberg.py: http://gerrit.cloudera.org:8080/#/c/20405/3/tests/query_test/test_iceberg.py@1300 PS3, Line 1300: t flake8: E122 continuation line missing indentation or outdented -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 3 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 11 Sep 2023 13:55:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 2: (13 comments) http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG@22 PS2, Line 22: tables with :parttition evolution I'd emphasise this restriction a bit more in the commit message, e.g. with a "RESTRICTIONS" section. Also if there are more restrictions, would be nice to see them listed in the message. http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/cup/sql-parser.cup@997 PS2, Line 997: // Optimize statement works only for Iceberg tables. I don't think this comment belongs here. http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/cup/sql-parser.cup@999 PS2, Line 999: KW_OPTIMIZE opt_kw_table table_name:table Dremio uses "OPTIMIZE TABLE". I just want to be sure that the choice of the command in Impala is something agreed by other query engines too (like Hive). Would this also work both with OPTIMIZE and OPTIMIZE TABLE? Could you add test coverage for both if haven't done so already http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java: http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@31 PS2, Line 31: * Representation of an OPTIMIZE statement used to execute table maintenance tasks in I think it's beneficial for the reader to elaborate more on what maintenance tasks are possible here. It's just INSERT OVERWRITE now, but worth mentioning, and later on we can expand the comment with further functionalities. http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@36 PS2, Line 36: // INSERT OVERWRITE statement I don't see any value of this comment. It would be more informative that "INSERT OVERWRITE statement that this OPTIMIZE statement is translated into" or something similar. http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@39 PS2, Line 39: private TableName targetTableName_; This member made me wondering that the parent class, DmlStatementBase has an FeTable member called 'table_'. Shouldn't we initialise that somehow? Then the name of the table will be in the parent class and wouldn't be needed here. I'd check other classes derived from that base class how they handle that member. http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@48 PS2, Line 48: selectListItems.add(SelectListItem.createStarItem(null)); > We'll have to be careful with complex types. Normally they are not included Shouldn't we return an AnalysisException here when the table contains nested type columns? If the EXPAND... option is true, then at least the error message is relevant but if it's off, then "column number mismatch" error is not that usefule here for the users. I'd rather see a "we can't write complex types" error. http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@62 PS2, Line 62: insertStmt_.analyze(analyzer); I guess the insert statement itself would also check privileges on the target table, but to have coverage I'd add a simple Ranger test to see if the user is rejected if does not have enough privileges. http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test: http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@738 PS2, Line 738: Consider using TRUNCATE and INSERT INTO to overwrite your table. If you suggest the user to use "TRUNCATE+INSERT INTO" instead of OPTIMIZE then what happens if they just run these commands on the table? Wouldn't they wipe out all of their data from the table with TRUNCATE? http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test: http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test@4 PS2, Line 4: CREATE TABLE ice_optimize (i int, s string) All these tables in this test are non-partitioned tables. Could you please add some coverage for partitioned tables as well? I guess the expectation there is that we would have one file per partition after running OPTIMIZE. http:/
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 2: (9 comments) Left a few small comments, but the code looks good overall. http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG@24 PS2, Line 24: - E2E: normal table, table with delete files, schema evolution We could also have a planner test with table 'functional_parquet.iceberg_partition_transforms_zorder'. It has sorting columns, so we could verify that the INSERT OVERWRITE respects the sorting properties. http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java: http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@38 PS2, Line 38: private QueryStmt queryStmt_ Do we need this member? We can also retrieve it from the insertStmt.getQueryStmt() http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@64 PS2, Line 64: table_ == null Shouldn't it be always non-null at this point? insertStmt.analyze() doesn't always set it? http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@69 PS2, Line 69: table_ = analyzer.getTable(targetTableName_, Privilege.ALL); Is the insert stmt in a valid state at this point? I mean table_ was null and we might just set the targetTableName_ to a valid table name. http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@71 PS2, Line 71: targetTableName_ = new TableName(table_.getDb().getName(), table_.getName()); Why do we need to set it here? The constructor got it as a parameter, and we already created the table ref and the insert stmt at this point. http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@72 PS2, Line 72: analyzer.registerPrivReq(builder -> : builder.onTable(table_).allOf(Privilege.ALL).build()); Should we move this out from the if-else stmt? http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/planner/PlannerContext.java File fe/src/main/java/org/apache/impala/planner/PlannerContext.java: http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/planner/PlannerContext.java@105 PS2, Line 105: isInsertOrCtas Now the name doesn't reflect the definition. Either rename this or keep the original definition and add call isOptimizeStmt() as well at the call sites. http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test: http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test@29 PS2, Line 29: VERIFY_IS_SUBSET Because of VERIFY_IS_SUBSET, this will pass even if there are more data files in the table http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test@59 PS2, Line 59: VERIFY_IS_SUBSET Because of VERIFY_IS_SUBSET, this will pass even if there are delete files in the table. Please not there can be multiple RESULTS sections, e.g.: RESULTS: VERIFY_IS_EQUAL row_regex:'$NAMENODE/test-warehouse/$DATABASE.db/ice_optimize/data/.*.0.parq','.*','','$ERASURECODE_POLICY' RESULTS: VERIFY_IS_NOT_IN row_regex:'$NAMENODE/test-warehouse/$DATABASE.db/ice_optimize/data/delete-.*parq','.*','','$ERASURECODE_POLICY' Though VERIFY_IS_EQUAL should be enough for this. -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 2 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 29 Aug 2023 13:39:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 2: (11 comments) http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG@14 PS2, Line 14: Currently INSERT OVERWRITE is used as a workaround to rewrite and You could describe what the insert overwrite statemtnt is, i.e. that it overwrites the table with itself. http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG@17 PS2, Line 17: OPTIMIZE Nit: "the OPTIMIZE statement"? http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG@19 PS2, Line 19: This patch introduces the new syntax as an alias for INSERT OVERWRITE. Will OPTIMIZE always be an alias for INSERT OVERWRITE or is it planned to be changed later? http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG@23 PS2, Line 23: t Nit: extra t. http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java: http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@33 PS2, Line 33: */ It would be good to describe what kind of insert overwrite statement exactly it will be, e.g. OPTIMIZE TABLE tbl; -->> INSERT OVERWRITE TABLE tbl (SELECT * FROM tbl) http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@34 PS2, Line 34: { Nit: add space before {. http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@45 PS2, Line 45: targetTable Wouldn't 'targetTableName' be better? http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@48 PS2, Line 48: selectListItems.add(SelectListItem.createStarItem(null)); We'll have to be careful with complex types. Normally they are not included in "select *" queries but if EXPAND_COMPLEX_TYPES is set then they are, see https://gerrit.cloudera.org/#/c/18863. We will fail in either case: if they are not included then the number of columns won't match, if they are included we get an error because we can't write complex types. http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@81 PS2, Line 81: public void reset() { 'targetTableName_' is possibly modified during analyze(). Shouldn't it also be reset to its original value (the one set in the constructor)? http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@90 PS2, Line 90: @Override Nit: add an empty line before this one. http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@94 PS2, Line 94: @Override Nit: add an empty line before this one. -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 2 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 29 Aug 2023 12:02:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13857/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 2 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 28 Aug 2023 17:10:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 ) Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/20405/2/tests/query_test/test_iceberg.py File tests/query_test/test_iceberg.py: http://gerrit.cloudera.org:8080/#/c/20405/2/tests/query_test/test_iceberg.py@1279 PS2, Line 1279: flake8: W292 no newline at end of file -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 2 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 28 Aug 2023 16:45:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE
Noemi Pap-Takacs has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20405 Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE .. IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE If an Iceberg table is frequently updated/written to in small batches, a lot of small files are created. This decreases read performance. Similarly, frequent row-level deletes contribute to this problem by creating delete files which have to be merged on read. Currently INSERT OVERWRITE is used as a workaround to rewrite and compact Iceberg tables. OPTIMIZE statement offers a new syntax and an Iceberg specific solution to this problem. This patch introduces the new syntax as an alias for INSERT OVERWRITE. Testing: - FE negative test for non-Iceberg tables and tables with parttition evolution - E2E: normal table, table with delete files, schema evolution Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java A fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java M fe/src/main/java/org/apache/impala/planner/PlannerContext.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/jflex/sql-scanner.flex M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test A testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test M tests/query_test/test_iceberg.py 9 files changed, 240 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/20405/2 -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 2 Gerrit-Owner: Noemi Pap-Takacs Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Noemi Pap-Takacs Gerrit-Reviewer: Tamas Mate Gerrit-Reviewer: Zoltan Borok-Nagy