Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18935 )
Change subject: IMPALA-10610: Support multiple file formats in a single Iceberg Table
......................................................................


Patch Set 4: Code-Review+1

(1 comment)

The change looks great!

http://gerrit.cloudera.org:8080/#/c/18935/4/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18935/4/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@56
PS4, Line 56: for (FileDescriptor fileDesc : fileDescs_) {
            :   byte fileFormat = fileDesc.getFbFileMetadata().icebergMetadata().fileFormat();
            :   if (fileFormat == FbIcebergDataFileFormat.PARQUET) {
            :     fileFormats_.add(HdfsFileFormat.PARQUET);
            :   } else if (fileFormat == FbIcebergDataFileFormat.ORC) {
            :     fileFormats_.add(HdfsFileFormat.ORC);
            :   } else if (fileFormat == FbIcebergDataFileFormat.AVRO) {
            :     fileFormats_.add(HdfsFileFormat.AVRO);
            :   } else {
            :     throw new ImpalaRuntimeException(String.format(
            :         "Invalid Iceberg file format of file: %s", fileDesc.getAbsolutePath()));
            :   }
I don't know how expensive this is for hundreds of thousands of files, but there's a straightforward way to speed it up a bit:

  boolean hasParquet = false;
  boolean hasOrc = false;
  boolean hasAvro = false;
  for (FileDescriptor fileDesc : fileDescs_) {
    byte fileFormat = fileDesc.getFbFileMetadata().icebergMetadata().fileFormat();
    if (fileFormat == FbIcebergDataFileFormat.PARQUET) {
      hasParquet = true;
    } else if (fileFormat == FbIcebergDataFileFormat.ORC) {
      hasOrc = true;
    } else if (fileFormat == FbIcebergDataFileFormat.AVRO) {
      hasAvro = true;
    } else {
      throw new ImpalaRuntimeException(String.format(
          "Invalid Iceberg file format of file: %s", fileDesc.getAbsolutePath()));
    }
  }
  if (hasParquet) fileFormats_.add(HdfsFileFormat.PARQUET);
  if (hasOrc) fileFormats_.add(HdfsFileFormat.ORC);
  if (hasAvro) fileFormats_.add(HdfsFileFormat.AVRO);


-- 
To view, visit http://gerrit.cloudera.org:8080/18935
To unsubscribe, 
visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc816595724e8fd2c885c6664f790af61ddf5c07
Gerrit-Change-Number: 18935
Gerrit-PatchSet: 4
Gerrit-Owner: Gergely Fürnstáhl <gfurnst...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Sep 2022 12:25:16 +0000
Gerrit-HasComments: Yes
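
For anyone who wants to try the flag-based loop from the review comment outside the Impala tree, here is a minimal standalone sketch. Everything in it is a hypothetical stand-in rather than Impala code: the Format enum plays the role of HdfsFileFormat, the byte constants play the role of the FbIcebergDataFileFormat values (the numeric values are assumed), a plain byte[] replaces the file descriptor list, and IllegalArgumentException replaces ImpalaRuntimeException. The idea is the one in the comment: only set booleans inside the loop, then add each format to the collection at most once afterwards. Whether this is measurably faster depends on how fileFormats_ is implemented, which the thread does not say.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for the loop discussed in the review; not Impala code.
class FileFormatFlagsSketch {
  // Stand-in for HdfsFileFormat.
  enum Format { PARQUET, ORC, AVRO }

  // Stand-ins for the FbIcebergDataFileFormat constants; the values are assumed.
  static final byte PARQUET = 0;
  static final byte ORC = 1;
  static final byte AVRO = 2;

  // Scans the per-file format codes once, recording only which formats occur,
  // and builds the result set after the loop.
  static Set<Format> collectFormats(byte[] fileFormatCodes) {
    boolean hasParquet = false;
    boolean hasOrc = false;
    boolean hasAvro = false;
    for (int i = 0; i < fileFormatCodes.length; i++) {
      byte code = fileFormatCodes[i];
      if (code == PARQUET) {
        hasParquet = true;
      } else if (code == ORC) {
        hasOrc = true;
      } else if (code == AVRO) {
        hasAvro = true;
      } else {
        // The original reports the file's absolute path; the sketch has only an index.
        throw new IllegalArgumentException(
            String.format("Invalid file format code %d at index %d", code, i));
      }
    }
    Set<Format> formats = EnumSet.noneOf(Format.class);
    if (hasParquet) formats.add(Format.PARQUET);
    if (hasOrc) formats.add(Format.ORC);
    if (hasAvro) formats.add(Format.AVRO);
    return formats;
  }

  public static void main(String[] args) {
    byte[] codes = {PARQUET, PARQUET, ORC, PARQUET};
    System.out.println(collectFormats(codes)); // prints [PARQUET, ORC]
  }
}
```

As in the suggested loop, an unknown code still fails on the first offending file, so error behavior matches the per-file add version; only the number of insertions into the collection changes.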