Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/19379 )
Change subject: IMPALA-11662: Improve 'refresh iceberg_tbl_on_oss' performance ...................................................................... Patch Set 5: (6 comments) Left a few comments, but the change looks great! http://gerrit.cloudera.org:8080/#/c/19379/5/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java File fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/19379/5/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java@212 PS5, Line 212: fd == null I think fd cannot be null here, see my comment in IcebergFileMetadataLoader. http://gerrit.cloudera.org:8080/#/c/19379/5/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java File fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/19379/5/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java@17 PS5, Line 17: package org.apache.impala.catalog; nit: please add empty line above this http://gerrit.cloudera.org:8080/#/c/19379/5/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java@57 PS5, Line 57: private static final int DEFAULT_MODIFICATION_TIME = 1; nit: please add comment why we need this http://gerrit.cloudera.org:8080/#/c/19379/5/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java@89 PS5, Line 89: LOG.warn("Unable to load Iceberg datafile '{}', " : + "because it's outside of the table location", fileStatus.getPath().toUri()); : ++loadStats_.skippedFiles; : return null; skippedFiles means the following based on its comment: "Number of files skipped from file metadata loading because the files have not changed since the last load." It is not for data files that we cannot load. Also, if a data file is listed in the Iceberg metadata then we must be able to create a file descriptor for it, otherwise we would have a partial view of the table. So I think in this case we should rather raise an exception and fail table loading. http://gerrit.cloudera.org:8080/#/c/19379/5/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java@124 PS5, Line 124: // Avoid the cost of directory listing on OSS service e.g. S3A, COS, OSS, etc. This comment mentions "Avoid the cost of directory listing", then there is a FileSystemUtil.listFiles() few lines under. Please add some details to the comment. http://gerrit.cloudera.org:8080/#/c/19379/5/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java File fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/19379/5/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@113 PS5, Line 113: if (format.equals(HdfsFileFormat.ICEBERG)) { > I want to choose whether to create the metadata ourself or use LocatedFileS I see, sorry, I should have taken a closer look at IcebergFileMetadataLoader. -- To view, visit http://gerrit.cloudera.org:8080/19379 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2ee8b6b7559e6590698b46ef1d574e55ed52f9a Gerrit-Change-Number: 19379 Gerrit-PatchSet: 5 Gerrit-Owner: Anonymous Coward <lipeng...@sensorsdata.cn> Gerrit-Reviewer: Anonymous Coward <lipeng...@sensorsdata.cn> Gerrit-Reviewer: Gergely Fürnstáhl <gfurnst...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com> Gerrit-Reviewer: Tamas Mate <tma...@apache.org> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Wed, 04 Jan 2023 15:12:48 +0000 Gerrit-HasComments: Yes