Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
......................................................................


Patch Set 2:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG@22
PS2, Line 22: tables with
            :    parttition evolution
I'd emphasise this restriction a bit more in the commit message, e.g. with a 
"RESTRICTIONS" section. Also if there are more restrictions, would be nice to 
see them listed in the message.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/cup/sql-parser.cup@997
PS2, Line 997: // Optimize statement works only for Iceberg tables.
I don't think this comment belongs here.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/cup/sql-parser.cup@999
PS2, Line 999:   KW_OPTIMIZE opt_kw_table table_name:table
Dremio uses "OPTIMIZE TABLE". I just want to be sure that the choice of the 
command in Impala is something agreed by other query engines too (like Hive).
Would this also work both with OPTIMIZE and OPTIMIZE TABLE? Could you add test 
coverage for both if haven't done so already


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java:

http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@31
PS2, Line 31:  * Representation of an OPTIMIZE statement used to execute table 
maintenance tasks in
I think it's beneficial for the reader to elaborate more on what maintenance 
tasks are possible here. It's just INSERT OVERWRITE now, but worth mentioning, 
and later on we can expand the comment with further functionalities.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@36
PS2, Line 36:   // INSERT OVERWRITE statement
I don't see any value of this comment. It would be more informative that 
"INSERT OVERWRITE statement that this OPTIMIZE statement is translated into" or 
something similar.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@39
PS2, Line 39:   private TableName targetTableName_;
This member made me wondering that the parent class, DmlStatementBase has an 
FeTable member called 'table_'. Shouldn't we initialise that somehow? Then the 
name of the table will be in the parent class and wouldn't be needed here. I'd 
check other classes derived from that base class how they handle that member.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@48
PS2, Line 48:     selectListItems.add(SelectListItem.createStarItem(null));
> We'll have to be careful with complex types. Normally they are not included
Shouldn't we return an AnalysisException here when the table contains nested 
type columns? If the EXPAND... option is true, then at least the error message 
is relevant but if it's off, then "column number mismatch" error is not that 
usefule here for the users. I'd rather see a "we can't write complex types" 
error.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@62
PS2, Line 62:     insertStmt_.analyze(analyzer);
I guess the insert statement itself would also check privileges on the target 
table, but to have coverage I'd add a simple Ranger test to see if the user is 
rejected if does not have enough privileges.


http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test:

http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@738
PS2, Line 738: Consider using TRUNCATE and INSERT INTO to overwrite your table.
If you suggest the user to use "TRUNCATE+INSERT INTO" instead of OPTIMIZE then 
what happens if they just run these commands on the table? Wouldn't they wipe 
out all of their data from the table with TRUNCATE?


http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test:

http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test@4
PS2, Line 4: CREATE TABLE ice_optimize (i int, s string)
All these tables in this test are non-partitioned tables. Could you please add 
some coverage for partitioned tables as well? I guess the expectation there is 
that we would have one file per partition after running OPTIMIZE.


http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test@6
PS2, Line 6: TBLPROPERTIES ('format-version'='2');
Does the table have to be V2 so that we can run OPTIMIZE? If yes, could you 
mention this in the commit msg? If not, then could you add some test coverage? 
I wouldn't expect much difference though, just to be on the safe side.


http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test@72
PS2, Line 72: ---- QUERY
With this test case the purpose was to see if the delete file is gone after 
running OPTIMIZE, right? Could you also VERIFY the list of files then?


http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test@84
PS2, Line 84: ====
What I'd also test here is, assuming that OPTIMIZE makes a new snapshot for the 
table, to check the table history for the list of snapshots. Also I think time 
travelling to the pre-optimized table should still be possible, so some test 
coverage would be nice to see that an older snapshot is still readable.



--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 2
Gerrit-Owner: Noemi Pap-Takacs <npaptak...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <npaptak...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Tue, 29 Aug 2023 14:46:01 +0000
Gerrit-HasComments: Yes

Reply via email to