[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 9: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 9
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 28 Sep 2023 15:31:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..

IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

If an Iceberg table is frequently updated/written to in small batches,
a lot of small files are created. This decreases read performance.
Similarly, frequent row-level deletes contribute to this problem
by creating delete files, which have to be merged on read.

So far INSERT OVERWRITE (rewriting the table with itself) has been used
to compact Iceberg tables.
However, it comes with some RESTRICTIONS:
- The table should not have multiple partition specs/partition evolution.
- The table should not contain complex types.

The OPTIMIZE statement offers a new syntax and a solution limited to
Iceberg tables to enhance read performance for subsequent operations.
See IMPALA-12293 for details.

Syntax: OPTIMIZE TABLE ;

This first patch introduces the new syntax, temporarily as an alias
for INSERT OVERWRITE.

Note that executing OPTIMIZE TABLE requires ALL privileges.

Testing:
 - negative tests
 - FE planner test
 - Ranger test
 - E2E tests

Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Reviewed-on: http://gerrit.cloudera.org:8080/20405
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test
M 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
M tests/authorization/test_ranger.py
M tests/query_test/test_iceberg.py
14 files changed, 356 insertions(+), 11 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 10
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 9: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 9
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 28 Sep 2023 09:18:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-28 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 8: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 8
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 28 Sep 2023 09:18:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9773/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 9
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 28 Sep 2023 09:18:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-26 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 8: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 8
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 26 Sep 2023 10:05:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 8: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 8
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 26 Sep 2023 00:46:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9760/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 8
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 25 Sep 2023 20:17:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 8:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9756/


--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 8
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 25 Sep 2023 17:56:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9756/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 8
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 25 Sep 2023 13:28:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 8: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9754/


--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 8
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 25 Sep 2023 13:05:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14068/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 8
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 25 Sep 2023 09:09:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-25 Thread Noemi Pap-Takacs (Code Review)
Noemi Pap-Takacs has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..

IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

If an Iceberg table is frequently updated/written to in small batches,
a lot of small files are created. This decreases read performance.
Similarly, frequent row-level deletes contribute to this problem
by creating delete files, which have to be merged on read.

So far INSERT OVERWRITE (rewriting the table with itself) has been used
to compact Iceberg tables.
However, it comes with some RESTRICTIONS:
- The table should not have multiple partition specs/partition evolution.
- The table should not contain complex types.

The OPTIMIZE statement offers a new syntax and a solution limited to
Iceberg tables to enhance read performance for subsequent operations.
See IMPALA-12293 for details.

Syntax: OPTIMIZE TABLE ;

This first patch introduces the new syntax, temporarily as an alias
for INSERT OVERWRITE.

Note that executing OPTIMIZE TABLE requires ALL privileges.

Testing:
 - negative tests
 - FE planner test
 - Ranger test
 - E2E tests

Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test
M 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
M tests/authorization/test_ranger.py
M tests/query_test/test_iceberg.py
14 files changed, 356 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/20405/8
--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 8
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9754/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 8
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 25 Sep 2023 08:43:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-24 Thread Noemi Pap-Takacs (Code Review)
Noemi Pap-Takacs has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java:

http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@39
PS2, Line 39:
> FeTable has a function that returns TableName: getTableName()
This member 'tableName_' is required anyway, since DmlStatementBase takes no 
parameters in its constructor, therefore 'table_' is null after the 
constructor. 'tableName_' is used to resolve the reference and assign the value 
to 'table_' in analyze() using 'analyzer.getTable(tableName_, )'. It 
cannot be done earlier in the constructor, because the analyzer is needed to 
resolve the tableref. See PS7 line 71.
As I see other classes that inherit from DmlStatementBase have a similar 
solution. The order of resolving tablerefs is name->table. 
FeTable.getTableName() is the other way around and would be redundant here.


http://gerrit.cloudera.org:8080/#/c/20405/7/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java:

http://gerrit.cloudera.org:8080/#/c/20405/7/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@38
PS7, Line 38: public class OptimizeStmt extends DmlStatementBase {
> I have the impression that there is a lot of functionality here that we sim
For now, inheritance might seem a practical solution. However, it does not 
express semantically the relationship between the two classes: OPTIMIZE TABLE 
does not aim to extend the functionality of INSERT, it only uses INSERT to 
execute its task, therefore the composition. See CreateTableAsSelectStmt for a 
similar example. INSERT OVERWRITE provides only limited functionality for 
rewriting Iceberg tables. It should be replaced in the long term.
This patch uses InsertStmt but at the same time serves as a basis for a next 
milestone with no InsertStmt, that can handle use cases that INSERT OVERWRITE 
can not - e.g. partition evolution. Composition provides more flexibility and 
makes this change easier to refactor for the future changes.



--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 7
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Sun, 24 Sep 2023 20:20:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-21 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java:

http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@39
PS2, Line 39:
> The TableName is returned by the parser and is needed to resolve the table
FeTable has a function that returns TableName: getTableName()


http://gerrit.cloudera.org:8080/#/c/20405/7/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java:

http://gerrit.cloudera.org:8080/#/c/20405/7/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@38
PS7, Line 38: public class OptimizeStmt extends DmlStatementBase {
I have the impression that there is a lot of functionality here that we simply 
delegate to 'insertStmt_'. So I'm wondering if we could derive this class from 
InsertStmt instead and then we could get rid of most of the logic here. It also 
has 'originalTableName_' and 'targetTableName_' members too



--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 7
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 21 Sep 2023 14:12:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9738/


--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 7
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 20 Sep 2023 18:04:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-20 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20405/6/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java:

http://gerrit.cloudera.org:8080/#/c/20405/6/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@44
PS6, Line 44: qualified
> I think 'qualified' here refers to concatenating the db name if it was not
Ok, thanks.



--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 7
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 20 Sep 2023 14:18:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-20 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 7: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 7
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 20 Sep 2023 14:18:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14039/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 6
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 20 Sep 2023 11:41:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-20 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 6:

(1 comment)

Just a nit

http://gerrit.cloudera.org:8080/#/c/20405/6/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java:

http://gerrit.cloudera.org:8080/#/c/20405/6/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@44
PS6, Line 44: qualified
Shouldn't it be 'modified'?



--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 6
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 20 Sep 2023 11:34:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9738/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 7
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 20 Sep 2023 13:41:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-20 Thread Noemi Pap-Takacs (Code Review)
Noemi Pap-Takacs has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..

IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

If an Iceberg table is frequently updated/written to in small batches,
a lot of small files are created. This decreases read performance.
Similarly, frequent row-level deletes contribute to this problem
by creating delete files, which have to be merged on read.

So far INSERT OVERWRITE (rewriting the table with itself) has been used
to compact Iceberg tables.
However, it comes with some RESTRICTIONS:
- The table should not have multiple partition specs/partition evolution.
- The table should not contain complex types.

The OPTIMIZE statement offers a new syntax and a solution limited to
Iceberg tables to enhance read performance for subsequent operations.
See IMPALA-12293 for details.

Syntax: OPTIMIZE TABLE ;

This first patch introduces the new syntax, temporarily as an alias
for INSERT OVERWRITE.

Note that executing OPTIMIZE TABLE requires ALL privileges.

Testing:
 - negative tests
 - FE planner test
 - Ranger test
 - E2E tests

Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/jflex/sql-scanner.flex
M 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test
M 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
M tests/authorization/test_ranger.py
M tests/query_test/test_iceberg.py
13 files changed, 352 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/20405/6
--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 6
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-20 Thread Noemi Pap-Takacs (Code Review)
Noemi Pap-Takacs has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20405/6/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java:

http://gerrit.cloudera.org:8080/#/c/20405/6/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@44
PS6, Line 44: qualified
> Shouldn't it be 'modified'?
I think 'qualified' here refers to concatenating the db name if it was not 
explicitly written in cases like 'use ;' or using the default db. 
(ex.: tbl1 -> default.tbl1)
The table name itself is not modified, only qualified.



--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 6
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 20 Sep 2023 11:53:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13969/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 3
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 11 Sep 2023 14:21:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-11 Thread Noemi Pap-Takacs (Code Review)
Noemi Pap-Takacs has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..

IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

If an Iceberg table is frequently updated/written to in small batches,
a lot of small files are created. This decreases read performance.
Similarly, frequent row-level deletes contribute to this problem
by creating delete files which have to be merged on read.

So far INSERT OVERWRITE (rewriting the table with itself) has been used
to compact Iceberg tables.
However, it comes with some RESTRICTIONS:
- The table should not have multiple partition specs/partition evolution.
- The table should not contain complex types.

The OPTIMIZE statement offers a new syntax and an Iceberg specific
solution to this problem.

This patch introduces the new syntax as an alias for INSERT OVERWRITE
as a temporary solution. See IMPALA-12293 for further information.

Note that executing OPTIMIZE TABLE requires ALL privileges.

Testing:
 - negative tests
 - FE planner test
 - Ranger test
 - E2E tests

Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/jflex/sql-scanner.flex
M 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by-zorder.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test
M 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
M tests/query_test/test_iceberg.py
12 files changed, 353 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/20405/3
--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 3
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20405/3/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/20405/3/tests/query_test/test_iceberg.py@1300
PS3, Line 1300: t
flake8: E122 continuation line missing indentation or outdented



--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 3
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 11 Sep 2023 13:55:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-08-29 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 2:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG@22
PS2, Line 22: tables with
:parttition evolution
I'd emphasise this restriction a bit more in the commit message, e.g. with a 
"RESTRICTIONS" section. Also if there are more restrictions, would be nice to 
see them listed in the message.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/cup/sql-parser.cup@997
PS2, Line 997: // Optimize statement works only for Iceberg tables.
I don't think this comment belongs here.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/cup/sql-parser.cup@999
PS2, Line 999:   KW_OPTIMIZE opt_kw_table table_name:table
Dremio uses "OPTIMIZE TABLE". I just want to be sure that the choice of the 
command in Impala is something agreed by other query engines too (like Hive).
Would this also work both with OPTIMIZE and OPTIMIZE TABLE? Could you add test 
coverage for both if haven't done so already


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java:

http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@31
PS2, Line 31:  * Representation of an OPTIMIZE statement used to execute table 
maintenance tasks in
I think it's beneficial for the reader to elaborate more on what maintenance 
tasks are possible here. It's just INSERT OVERWRITE now, but worth mentioning, 
and later on we can expand the comment with further functionalities.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@36
PS2, Line 36:   // INSERT OVERWRITE statement
I don't see any value of this comment. It would be more informative that 
"INSERT OVERWRITE statement that this OPTIMIZE statement is translated into" or 
something similar.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@39
PS2, Line 39:   private TableName targetTableName_;
This member made me wondering that the parent class, DmlStatementBase has an 
FeTable member called 'table_'. Shouldn't we initialise that somehow? Then the 
name of the table will be in the parent class and wouldn't be needed here. I'd 
check other classes derived from that base class how they handle that member.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@48
PS2, Line 48: selectListItems.add(SelectListItem.createStarItem(null));
> We'll have to be careful with complex types. Normally they are not included
Shouldn't we return an AnalysisException here when the table contains nested 
type columns? If the EXPAND... option is true, then at least the error message 
is relevant but if it's off, then "column number mismatch" error is not that 
usefule here for the users. I'd rather see a "we can't write complex types" 
error.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@62
PS2, Line 62: insertStmt_.analyze(analyzer);
I guess the insert statement itself would also check privileges on the target 
table, but to have coverage I'd add a simple Ranger test to see if the user is 
rejected if does not have enough privileges.


http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test:

http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@738
PS2, Line 738: Consider using TRUNCATE and INSERT INTO to overwrite your table.
If you suggest the user to use "TRUNCATE+INSERT INTO" instead of OPTIMIZE then 
what happens if they just run these commands on the table? Wouldn't they wipe 
out all of their data from the table with TRUNCATE?


http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test:

http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test@4
PS2, Line 4: CREATE TABLE ice_optimize (i int, s string)
All these tables in this test are non-partitioned tables. Could you please add 
some coverage for partitioned tables as well? I guess the expectation there is 
that we would have one file per partition after running OPTIMIZE.


http:/

[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-08-29 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 2:

(9 comments)

Left a few small comments, but the code looks good overall.

http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG@24
PS2, Line 24:  - E2E: normal table, table with delete files, schema evolution
We could also have a planner test with table 
'functional_parquet.iceberg_partition_transforms_zorder'. It has sorting 
columns, so we could verify that the INSERT OVERWRITE respects the sorting 
properties.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java:

http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@38
PS2, Line 38: private QueryStmt queryStmt_
Do we need this member? We can also retrieve it from the 
insertStmt.getQueryStmt()


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@64
PS2, Line 64: table_ == null
Shouldn't it be always non-null at this point? insertStmt.analyze() doesn't 
always set it?


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@69
PS2, Line 69:   table_ = analyzer.getTable(targetTableName_, Privilege.ALL);
Is the insert stmt in a valid state at this point? I mean table_ was null and 
we might just set the targetTableName_ to a valid table name.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@71
PS2, Line 71:   targetTableName_ = new TableName(table_.getDb().getName(), 
table_.getName());
Why do we need to set it here? The constructor got it as a parameter, and we 
already created the table ref and the insert stmt at this point.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@72
PS2, Line 72:   analyzer.registerPrivReq(builder ->
:   builder.onTable(table_).allOf(Privilege.ALL).build());
Should we move this out from the if-else stmt?


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/planner/PlannerContext.java
File fe/src/main/java/org/apache/impala/planner/PlannerContext.java:

http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/planner/PlannerContext.java@105
PS2, Line 105: isInsertOrCtas
Now the name doesn't reflect the definition. Either rename this or keep the 
original definition and add call isOptimizeStmt() as well at the call sites.


http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test:

http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test@29
PS2, Line 29: VERIFY_IS_SUBSET
Because of VERIFY_IS_SUBSET, this will pass even if there are more data files 
in the table


http://gerrit.cloudera.org:8080/#/c/20405/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test@59
PS2, Line 59: VERIFY_IS_SUBSET
Because of VERIFY_IS_SUBSET, this will pass even if there are delete files in 
the table.

Please not there can be multiple RESULTS sections, e.g.:

  RESULTS: VERIFY_IS_EQUAL
 
row_regex:'$NAMENODE/test-warehouse/$DATABASE.db/ice_optimize/data/.*.0.parq','.*','','$ERASURECODE_POLICY'
  RESULTS: VERIFY_IS_NOT_IN
 
row_regex:'$NAMENODE/test-warehouse/$DATABASE.db/ice_optimize/data/delete-.*parq','.*','','$ERASURECODE_POLICY'

Though VERIFY_IS_EQUAL should be enough for this.



--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 2
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 29 Aug 2023 13:39:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-08-29 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 2:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG@14
PS2, Line 14: Currently INSERT OVERWRITE is used as a workaround to rewrite and
You could describe what the insert overwrite statemtnt is, i.e. that it 
overwrites the table with itself.


http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG@17
PS2, Line 17: OPTIMIZE
Nit: "the OPTIMIZE statement"?


http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG@19
PS2, Line 19: This patch introduces the new syntax as an alias for INSERT 
OVERWRITE.
Will OPTIMIZE always be an alias for INSERT OVERWRITE or is it planned to be 
changed later?


http://gerrit.cloudera.org:8080/#/c/20405/2//COMMIT_MSG@23
PS2, Line 23: t
Nit: extra t.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
File fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java:

http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@33
PS2, Line 33:  */
It would be good to describe what kind of insert overwrite statement exactly it 
will be, e.g.

OPTIMIZE TABLE tbl;
-->>
INSERT OVERWRITE TABLE tbl (SELECT * FROM tbl)


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@34
PS2, Line 34: {
Nit: add space before {.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@45
PS2, Line 45: targetTable
Wouldn't 'targetTableName' be better?


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@48
PS2, Line 48: selectListItems.add(SelectListItem.createStarItem(null));
We'll have to be careful with complex types. Normally they are not included in 
"select *" queries but if EXPAND_COMPLEX_TYPES is set then they are, see 
https://gerrit.cloudera.org/#/c/18863.
We will fail in either case: if they are not included then the number of 
columns won't match, if they are included we get an error because we can't 
write complex types.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@81
PS2, Line 81:   public void reset() {
'targetTableName_' is possibly modified during analyze(). Shouldn't it also be 
reset to its original value (the one set in the constructor)?


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@90
PS2, Line 90:   @Override
Nit: add an empty line before this one.


http://gerrit.cloudera.org:8080/#/c/20405/2/fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java@94
PS2, Line 94:   @Override
Nit: add an empty line before this one.



--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 2
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 29 Aug 2023 12:02:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-08-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13857/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 2
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 28 Aug 2023 17:10:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-08-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20405 )

Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20405/2/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/20405/2/tests/query_test/test_iceberg.py@1279
PS2, Line 1279:
flake8: W292 no newline at end of file



--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 2
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 28 Aug 2023 16:45:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-08-28 Thread Noemi Pap-Takacs (Code Review)
Noemi Pap-Takacs has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20405


Change subject: IMPALA-12406: OPTIMIZE statement as an alias for INSERT 
OVERWRITE
..

IMPALA-12406: OPTIMIZE statement as an alias for INSERT OVERWRITE

If an Iceberg table is frequently updated/written to in small batches,
a lot of small files are created. This decreases read performance.
Similarly, frequent row-level deletes contribute to this problem
by creating delete files which have to be merged on read.

Currently INSERT OVERWRITE is used as a workaround to rewrite and
compact Iceberg tables.

OPTIMIZE statement offers a new syntax and an Iceberg specific
solution to this problem.
This patch introduces the new syntax as an alias for INSERT OVERWRITE.

Testing:
 - FE negative test for non-Iceberg tables and tables with
   parttition evolution
 - E2E: normal table, table with delete files, schema evolution

Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
A fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/jflex/sql-scanner.flex
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test
M tests/query_test/test_iceberg.py
9 files changed, 240 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/20405/2
--
To view, visit http://gerrit.cloudera.org:8080/20405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ief42537499ffe64fafdefe25c8d175539234c4e7
Gerrit-Change-Number: 20405
Gerrit-PatchSet: 2
Gerrit-Owner: Noemi Pap-Takacs 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy