Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
......................................................................


Patch Set 2:

(9 comments)

Left a few small comments, but the code looks good overall.

http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG@24
PS2, Line 24:  - E2E: normal table, table with delete files, schema evolution
We could also have a planner test with table 
'functional_parquet.iceberg_partition_transforms_zorder'. It has sorting 
columns, so we could verify that the INSERT OVERWRITE respects the sorting 
properties.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java:

http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@38
PS2, Line 38: private QueryStmt queryStmt_
Do we need this member? We can also retrieve it from the 
insertStmt.getQueryStmt()


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@64
PS2, Line 64: table_ == null
Shouldn't it be always non-null at this point? insertStmt.analyze() doesn't 
always set it?


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@69
PS2, Line 69:       table_ = analyzer.getTable(targetTableName_, Privilege.ALL);
Is the insert stmt in a valid state at this point? I mean table_ was null and 
we might just set the targetTableName_ to a valid table name.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@71
PS2, Line 71:       targetTableName_ = new TableName(table_.getDb().getName(), 
table_.getName());
Why do we need to set it here? The constructor got it as a parameter, and we 
already created the table ref and the insert stmt at this point.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@72
PS2, Line 72:       analyzer.registerPrivReq(builder ->
            :           builder.onTable(table_).allOf(Privilege.ALL).build());
Should we move this out from the if-else stmt?


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/planner/PlannerContext.java
File fe/src/main/java/org/apache/impala/planner/PlannerContext.java:

http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/planner/PlannerContext.java@105
PS2, Line 105: isInsertOrCtas
Now the name doesn't reflect the definition. Either rename this or keep the 
original definition and add call isOptimizeStmt() as well at the call sites.


http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test:

http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test@29
PS2, Line 29: VERIFY_IS_SUBSET
Because of VERIFY_IS_SUBSET, this will pass even if there are more data files 
in the table


http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test@59
PS2, Line 59: VERIFY_IS_SUBSET
Because of VERIFY_IS_SUBSET, this will pass even if there are delete files in 
the table.

Please not there can be multiple RESULTS sections, e.g.:

 ---- RESULTS: VERIFY_IS_EQUAL
 
row_regex:'$NAMENODE/test-warehouse/$DATABASE.db/ice_optimize/data/.*.0.parq','.*','','$ERASURECODE_POLICY'
 ---- RESULTS: VERIFY_IS_NOT_IN
 
row_regex:'$NAMENODE/test-warehouse/$DATABASE.db/ice_optimize/data/delete-.*parq','.*','','$ERASURECODE_POLICY'

Though VERIFY_IS_EQUAL should be enough for this.



--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 2
Gerrit-Owner: Noemi Pap-Takacs <npaptak...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <npaptak...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Tue, 29 Aug 2023 13:39:02 +0000
Gerrit-HasComments: Yes

Reply via email to