Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/15797 )
Change subject: IMPALA-9688: Support create iceberg table by impala ...................................................................... Patch Set 11: (12 comments) Thank you for working on this. It'll be a great addition to Impala. Please also add some tests where you create iceberg tables, where you SHOW CREATE Iceberg tables, etc. http://gerrit.cloudera.org:8080/#/c/15797/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15797/11//COMMIT_MSG@19 PS11, Line 19: stored as iceberg; How can you specify the underlying file format? I see that Iceberg currently supports Avro, ORC, and Parquet. Does the file format have to be the same for all partitions, or different partitions can use different file formats? http://gerrit.cloudera.org:8080/#/c/15797/11//COMMIT_MSG@11 PS11, Line 11: create table iceberg_test( : level string, : event_time string, : message string) : partition by spec( : level identity, : event_time identity : ) : stored as iceberg; I wonder how standard is this statement. E.g. can you create Iceberg tables the same way in Hive? http://gerrit.cloudera.org:8080/#/c/15797/11//COMMIT_MSG@20 PS11, Line 20: The 'identity' mean the partition type in iceberg Please elaborate a little. E.g.: 'identity' is one of Iceberg's Partition Transforms. 'identity' means that the source data values are used to create partitions. Other Partition Transforms hash/truncate/extract from the source data values. http://gerrit.cloudera.org:8080/#/c/15797/11/be/src/service/query-options-test.cc File be/src/service/query-options-test.cc: http://gerrit.cloudera.org:8080/#/c/15797/11/be/src/service/query-options-test.cc@225 PS11, Line 225: nit: indentation is a bit off http://gerrit.cloudera.org:8080/#/c/15797/11/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: http://gerrit.cloudera.org:8080/#/c/15797/11/common/thrift/CatalogObjects.thrift@118 PS11, Line 118: TIcebergPartitionType TIcebergPartitionTransform? or ...TransformType? http://gerrit.cloudera.org:8080/#/c/15797/11/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: http://gerrit.cloudera.org:8080/#/c/15797/11/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@228 PS11, Line 228: nit: extra space http://gerrit.cloudera.org:8080/#/c/15797/11/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@570 PS11, Line 570: Kudu Probably you don't want to set Kudu properties here http://gerrit.cloudera.org:8080/#/c/15797/11/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java File fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java: http://gerrit.cloudera.org:8080/#/c/15797/11/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java@35 PS11, Line 35: * identity dt, : * hour event_time, : * day event_time, : * month event_time based on the commit message and the CUP parser the order is the opposite, e.g. dt identity, event_time hour, ... http://gerrit.cloudera.org:8080/#/c/15797/11/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java@97 PS11, Line 97: ... Maybe it'd be better to leave it empty? http://gerrit.cloudera.org:8080/#/c/15797/11/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java: http://gerrit.cloudera.org:8080/#/c/15797/11/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@61 PS11, Line 61: Partition Type "Partition Transform" or "Partition Transform Type" http://gerrit.cloudera.org:8080/#/c/15797/11/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java File fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java: http://gerrit.cloudera.org:8080/#/c/15797/11/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java@236 PS11, Line 236: case ICEBERG: Based on the spec the underlying storage files must be Avro, ORC, or Parquet. All of them are splittable. http://gerrit.cloudera.org:8080/#/c/15797/11/fe/src/main/java/org/apache/impala/util/IcebergUtil.java File fe/src/main/java/org/apache/impala/util/IcebergUtil.java: http://gerrit.cloudera.org:8080/#/c/15797/11/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@96 PS11, Line 96: * Build TIcebergPartitionType by iceberg PartitionField : */ : public static TIcebergPartitionType getPartitionType(PartitionField field) : throws TableLoadingException { : String type = field.transform().toString().toUpperCase(); : if ("IDENTITY".equals(type)) { : return TIcebergPartitionType.IDENTITY; : } else if ("HOUR".equals(type)) { : return TIcebergPartitionType.HOUR; : } else if ("DAY".equals(type)) { : return TIcebergPartitionType.DAY; : } else if ("MONTH".equals(type)) { : return TIcebergPartitionType.MONTH; : } else if ("YEAR".equals(type)) { : return TIcebergPartitionType.YEAR; : } else if ("BUCKET".equals(type)) { : return TIcebergPartitionType.BUCKET; : } else if ("TRUNCATE".equals(type)) { : return TIcebergPartitionType.TRUNCATE; : } else { : throw new TableLoadingException("Unsupported iceberg partition type: " + : field.transform()); : } : } You have the same function in IcebergPartitionSpec.java. Please refactor to a common place -- To view, visit http://gerrit.cloudera.org:8080/15797 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d85db4c904a8c758c4cfb4f19cfbdab7e6ea284 Gerrit-Change-Number: 15797 Gerrit-PatchSet: 11 Gerrit-Owner: wangsheng <sky...@163.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: wangsheng <sky...@163.com> Gerrit-Comment-Date: Mon, 04 May 2020 16:59:59 +0000 Gerrit-HasComments: Yes