Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16545 )
Change subject: IMPALA-10215: Implement INSERT INTO for non-partitioned Iceberg tables (Parquet)
......................................................................

Patch Set 1:

(5 comments)

I have some high-level questions and nits; I plan to do another pass on the rest of the code soon.

http://gerrit.cloudera.org:8080/#/c/16545/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16545/1//COMMIT_MSG@30
PS1, Line 30: Testing:
It would be good to have some kind of parallel stress testing. The main questions are: do other impalads see the writes sooner or later? Do concurrent inserts work reasonably?

My understanding is that Iceberg supports concurrent inserts using optimistic locking, but I don't know where exactly the atomic operation that enables this happens.

http://gerrit.cloudera.org:8080/#/c/16545/1/be/src/exec/parquet/hdfs-parquet-table-writer.h
File be/src/exec/parquet/hdfs-parquet-table-writer.h:

http://gerrit.cloudera.org:8080/#/c/16545/1/be/src/exec/parquet/hdfs-parquet-table-writer.h@225
PS1, Line 225: an Iceberg data files
nit: an vs files

http://gerrit.cloudera.org:8080/#/c/16545/1/be/src/exec/parquet/hdfs-parquet-table-writer.h@226
PS1, Line 226: fill
nit: fills

http://gerrit.cloudera.org:8080/#/c/16545/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/16545/1/be/src/runtime/coordinator.cc@803
PS1, Line 803: return_status =
             : parent_request_state_->frontend_->AppendIcebergDataFiles(appendFiles);
I am wondering which is the better place to call this: here, or in CatalogOpExecutor.updateCatalog(). My first impression was that AppendIcebergDataFiles is the "commit", which is done in updateCatalog() for ACID tables.

Doing it there has both the advantage and the disadvantage of committing while the table is locked in the catalog: this may hurt performance, but it would make it easier to reason about concurrent DML/DDL statements on the table.

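As a rough illustration of the optimistic-locking pattern raised in the commit-message and coordinator.cc comments, here is a minimal Java sketch. It is not Iceberg's or Impala's code: OptimisticAppendSketch, Snapshot, and appendFiles are hypothetical names, and an in-memory AtomicReference stands in for whatever atomic pointer swap the real catalog provides. Each writer builds a new snapshot from the one it read and publishes it only if nobody else committed in between; otherwise it re-reads and retries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a table whose current snapshot is one atomic pointer. */
public class OptimisticAppendSketch {
  /** Immutable snapshot: the complete list of data files visible to readers. */
  static final class Snapshot {
    final long id;
    final List<String> dataFiles;

    Snapshot(long id, List<String> dataFiles) {
      this.id = id;
      this.dataFiles = Collections.unmodifiableList(new ArrayList<>(dataFiles));
    }
  }

  // Assumption: the "commit" is a single compare-and-swap on this reference.
  private final AtomicReference<Snapshot> current =
      new AtomicReference<>(new Snapshot(0, new ArrayList<>()));

  /** Appends files optimistically; returns the number of attempts it took. */
  int appendFiles(List<String> newFiles) {
    int attempts = 0;
    while (true) {
      attempts++;
      Snapshot base = current.get();              // read the snapshot we build on
      List<String> merged = new ArrayList<>(base.dataFiles);
      merged.addAll(newFiles);
      Snapshot next = new Snapshot(base.id + 1, merged);
      // Publish only if no other writer replaced 'base' in the meantime.
      if (current.compareAndSet(base, next)) {
        return attempts;
      }
      // Lost the race: loop, re-read the winner's snapshot, and try again.
    }
  }

  Snapshot currentSnapshot() {
    return current.get();
  }

  public static void main(String[] args) throws InterruptedException {
    OptimisticAppendSketch table = new OptimisticAppendSketch();
    int writers = 8;
    int filesPerWriter = 100;
    Thread[] threads = new Thread[writers];
    for (int w = 0; w < writers; w++) {
      final int writer = w;
      threads[w] = new Thread(() -> {
        for (int i = 0; i < filesPerWriter; i++) {
          List<String> files = new ArrayList<>();
          files.add("writer" + writer + "-file" + i + ".parq");
          table.appendFiles(files);
        }
      });
      threads[w].start();
    }
    for (Thread t : threads) {
      t.join();
    }
    Snapshot s = table.currentSnapshot();
    System.out.println("snapshot id=" + s.id + ", files=" + s.dataFiles.size());
  }
}
```

In this sketch no append is lost under concurrency because a failed swap never publishes a partial result. It says nothing about when other impalads observe the new snapshot; that depends on how and when they reload table metadata.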

http://gerrit.cloudera.org:8080/#/c/16545/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16545/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4660
PS1, Line 4660: ((HdfsTable) table).getPartitionsForNames(partitionFilesMapBeforeInsert.keySet());
nit: +4 indentation

--
To view, visit http://gerrit.cloudera.org:8080/16545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5690fb6c2cc51f0033fa26caf8597c80a11bcd8e
Gerrit-Change-Number: 16545
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: wangsheng <sky...@163.com>
Gerrit-Comment-Date: Thu, 08 Oct 2020 20:16:31 +0000
Gerrit-HasComments: Yes