[Impala-ASF-CR] IMPALA-10215: Implement INSERT INTO for non-partitioned Iceberg tables (Parquet)

Csaba Ringhofer (Code Review) Thu, 08 Oct 2020 13:17:00 -0700

Csaba Ringhofer has posted comments on this change.
( http://gerrit.cloudera.org:8080/16545 )


Change subject: IMPALA-10215: Implement INSERT INTO for non-partitioned Iceberg 
tables (Parquet)
......................................................................


Patch Set 1:

(5 comments)

I have some high-level questions and nits. I plan to do another pass on the 
rest of the code soon.

http://gerrit.cloudera.org:8080/#/c/16545/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16545/1//COMMIT_MSG@30
PS1, Line 30: Testing:
It would be good to have some kind of parallel stress testing - the main 
questions are: do other impalads see the writes sooner or later? Do concurrent 
inserts work reasonably?

My understanding is that Iceberg supports concurrent inserts using optimistic 
locking, but I don't know where exactly the atomic operation happens that 
enables this.
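
For illustration only, here is a minimal sketch of the optimistic pattern in 
question, together with the kind of concurrent-insert check a stress test 
could make. It is not taken from Iceberg or Impala: the class and method names 
(OptimisticCommitSketch, appendFiles, runConcurrentInserts) are hypothetical, 
and an in-memory AtomicReference stands in for whatever atomic pointer swap 
the real table metadata relies on.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of optimistic commits; not Iceberg's or Impala's API. */
final class OptimisticCommitSketch {
  /** Immutable snapshot of the table: the complete list of data files. */
  static final class Snapshot {
    final List<String> files;
    Snapshot(List<String> files) {
      this.files = Collections.unmodifiableList(new ArrayList<>(files));
    }
  }

  // Assumption: stands in for the table's "current metadata" pointer.
  private final AtomicReference<Snapshot> current =
      new AtomicReference<>(new Snapshot(Collections.<String>emptyList()));

  /** Appends files, retrying when another writer committed first. Returns the retry count. */
  int appendFiles(List<String> newFiles) {
    int retries = 0;
    while (true) {
      Snapshot base = current.get();
      List<String> merged = new ArrayList<>(base.files);
      merged.addAll(newFiles);
      // The only atomic step: swap the pointer if nobody changed it since we read it.
      if (current.compareAndSet(base, new Snapshot(merged))) return retries;
      retries++;
    }
  }

  int fileCount() {
    return current.get().files.size();
  }

  /** Starts `writers` threads that each commit `insertsPerWriter` single-file appends. */
  static int runConcurrentInserts(int writers, int insertsPerWriter) {
    final OptimisticCommitSketch table = new OptimisticCommitSketch();
    List<Thread> threads = new ArrayList<>();
    for (int w = 0; w < writers; w++) {
      final int id = w;
      Thread t = new Thread(() -> {
        for (int i = 0; i < insertsPerWriter; i++) {
          table.appendFiles(
              Collections.singletonList("writer" + id + "-file" + i + ".parq"));
        }
      });
      threads.add(t);
      t.start();
    }
    for (Thread t : threads) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for writers", e);
      }
    }
    // If no commit was lost, this equals writers * insertsPerWriter.
    return table.fileCount();
  }
}
```

A stress test in this spirit would run many concurrent INSERTs and then check 
that the number of visible data files (or rows) matches the number of 
successful commits, from more than one impalad.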


http://gerrit.cloudera.org:8080/#/c/16545/1/be/src/exec/parquet/hdfs-parquet-table-writer.h
File be/src/exec/parquet/hdfs-parquet-table-writer.h:

http://gerrit.cloudera.org:8080/#/c/16545/1/be/src/exec/parquet/hdfs-parquet-table-writer.h@225
PS1, Line 225: an Iceberg data files
nit: an vs files


http://gerrit.cloudera.org:8080/#/c/16545/1/be/src/exec/parquet/hdfs-parquet-table-writer.h@226
PS1, Line 226: fill
nit: fills


http://gerrit.cloudera.org:8080/#/c/16545/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/16545/1/be/src/runtime/coordinator.cc@803
PS1, Line 803:       return_status =
             :           parent_request_state_->frontend_->AppendIcebergDataFiles(appendFiles);
I am wondering which is the better place to call this: here, or in 
CatalogOpExecutor.updateCatalog(). My first impression was that 
AppendIcebergDataFiles is the "commit", which is done in updateCatalog() for 
ACID tables. Doing it there has both the advantage and the disadvantage of doing the 
commit while the table is locked in the catalog, which may be a problem for 
performance but would make it easier to reason about concurrent DML/DDL 
statements on the table.


http://gerrit.cloudera.org:8080/#/c/16545/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16545/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4660
PS1, Line 4660:       ((HdfsTable) table).getPartitionsForNames(partitionFilesMapBeforeInsert.keySet());
nit: +4 indentation



--
To view, visit http://gerrit.cloudera.org:8080/16545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5690fb6c2cc51f0033fa26caf8597c80a11bcd8e
Gerrit-Change-Number: 16545
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: wangsheng <sky...@163.com>
Gerrit-Comment-Date: Thu, 08 Oct 2020 20:16:31 +0000
Gerrit-HasComments: Yes
