Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/20405 )
Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE ...................................................................... Patch Set 2: (9 comments) Left a few small comments, but the code looks good overall. http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG@24 PS2, Line 24: - E2E: normal table, table with delete files, schema evolution We could also have a planner test with table 'functional_parquet.iceberg_partition_transforms_zorder'. It has sorting columns, so we could verify that the INSERT OVERWRITE respects the sorting properties. http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java: http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@38 PS2, Line 38: private QueryStmt queryStmt_ Do we need this member? We can also retrieve it from the insertStmt.getQueryStmt() http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@64 PS2, Line 64: table_ == null Shouldn't it be always non-null at this point? insertStmt.analyze() doesn't always set it? http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@69 PS2, Line 69: table_ = analyzer.getTable(targetTableName_, Privilege.ALL); Is the insert stmt in a valid state at this point? I mean table_ was null and we might just set the targetTableName_ to a valid table name. http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@71 PS2, Line 71: targetTableName_ = new TableName(table_.getDb().getName(), table_.getName()); Why do we need to set it here? The constructor got it as a parameter, and we already created the table ref and the insert stmt at this point. http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@72 PS2, Line 72: analyzer.registerPrivReq(builder -> : builder.onTable(table_).allOf(Privilege.ALL).build()); Should we move this out from the if-else stmt? http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/planner/PlannerContext.java File fe/src/main/java/org/apache/impala/planner/PlannerContext.java: http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/planner/PlannerContext.java@105 PS2, Line 105: isInsertOrCtas Now the name doesn't reflect the definition. Either rename this or keep the original definition and add call isOptimizeStmt() as well at the call sites. http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test: http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test@29 PS2, Line 29: VERIFY_IS_SUBSET Because of VERIFY_IS_SUBSET, this will pass even if there are more data files in the table http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test@59 PS2, Line 59: VERIFY_IS_SUBSET Because of VERIFY_IS_SUBSET, this will pass even if there are delete files in the table. Please not there can be multiple RESULTS sections, e.g.: ---- RESULTS: VERIFY_IS_EQUAL row_regex:'$NAMENODE/test-warehouse/$DATABASE.db/ice_optimize/data/.*.0.parq','.*','','$ERASURECODE_POLICY' ---- RESULTS: VERIFY_IS_NOT_IN row_regex:'$NAMENODE/test-warehouse/$DATABASE.db/ice_optimize/data/delete-.*parq','.*','','$ERASURECODE_POLICY' Though VERIFY_IS_EQUAL should be enough for this. -- To view, visit http://gerrit.cloudera.org:8080/20405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7 Gerrit-Change-Number: 20405 Gerrit-PatchSet: 2 Gerrit-Owner: Noemi Pap-Takacs <npaptak...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Noemi Pap-Takacs <npaptak...@cloudera.com> Gerrit-Reviewer: Tamas Mate <tma...@apache.org> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Tue, 29 Aug 2023 13:39:02 +0000 Gerrit-HasComments: Yes