Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18935 )

Change subject: IMPALA-10610: Support multiple file formats in a single Iceberg Table
......................................................................


Patch Set 4: Code-Review+1

(1 comment)

The change looks great!

http://gerrit.cloudera.org:8080/#/c/18935/4/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18935/4/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@56
PS4, Line 56:     for (FileDescriptor fileDesc : fileDescs_) {
            :       byte fileFormat = fileDesc.getFbFileMetadata().icebergMetadata().fileFormat();
            :       if (fileFormat == FbIcebergDataFileFormat.PARQUET) {
            :         fileFormats_.add(HdfsFileFormat.PARQUET);
            :       } else if (fileFormat == FbIcebergDataFileFormat.ORC) {
            :         fileFormats_.add(HdfsFileFormat.ORC);
            :       } else if (fileFormat == FbIcebergDataFileFormat.AVRO) {
            :         fileFormats_.add(HdfsFileFormat.AVRO);
            :       } else {
            :         throw new ImpalaRuntimeException(String.format(
            :             "Invalid Iceberg file format of file: %s", fileDesc.getAbsolutePath()));
            :       }
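I don't know how expensive calling fileFormats_.add() for every file is when a table has hundreds of thousands of files, but there's a straightforward way to speed it up a bit: record the formats seen in local flags, and add them to fileFormats_ once, after the loop: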

    boolean hasParquet = false;
    boolean hasOrc = false;
    boolean hasAvro = false;

    // Only record which formats occur; no fileFormats_.add() call per file.
    for (FileDescriptor fileDesc : fileDescs_) {
      byte fileFormat = fileDesc.getFbFileMetadata().icebergMetadata().fileFormat();
      if (fileFormat == FbIcebergDataFileFormat.PARQUET) {
        hasParquet = true;
      } else if (fileFormat == FbIcebergDataFileFormat.ORC) {
        hasOrc = true;
      } else if (fileFormat == FbIcebergDataFileFormat.AVRO) {
        hasAvro = true;
      } else {
        throw new ImpalaRuntimeException(String.format(
            "Invalid Iceberg file format of file: %s", fileDesc.getAbsolutePath()));
      }
    }

    // Populate fileFormats_ once, with at most three additions.
    if (hasParquet) fileFormats_.add(HdfsFileFormat.PARQUET);
    if (hasOrc) fileFormats_.add(HdfsFileFormat.ORC);
    if (hasAvro) fileFormats_.add(HdfsFileFormat.AVRO);
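
To make the idea concrete, here is a minimal standalone sketch of the same flag-based pattern outside Impala. FbFormat, TableFormat and FileFormatCollector are hypothetical stand-ins for FbIcebergDataFileFormat, HdfsFileFormat and the scan node. IllegalStateException replaces ImpalaRuntimeException, and the result is assumed to be an EnumSet, so the example compiles without Impala on the classpath.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, self-contained sketch of the suggestion; not Impala code.
class FileFormatCollector {
  // Hypothetical stand-in for the FlatBuffers constants in FbIcebergDataFileFormat.
  static final class FbFormat {
    static final byte PARQUET = 0;
    static final byte ORC = 1;
    static final byte AVRO = 2;
  }

  // Hypothetical stand-in for HdfsFileFormat.
  enum TableFormat { PARQUET, ORC, AVRO }

  // Scans every file once, recording only which formats occur, then builds
  // the result with at most three insertions. paths[i] names the file whose
  // format code is fileFormats[i].
  static Set<TableFormat> collect(byte[] fileFormats, String[] paths) {
    boolean hasParquet = false;
    boolean hasOrc = false;
    boolean hasAvro = false;

    for (int i = 0; i < fileFormats.length; i++) {
      byte fileFormat = fileFormats[i];
      if (fileFormat == FbFormat.PARQUET) {
        hasParquet = true;
      } else if (fileFormat == FbFormat.ORC) {
        hasOrc = true;
      } else if (fileFormat == FbFormat.AVRO) {
        hasAvro = true;
      } else {
        // Unknown code: fail on the offending file, mirroring the patch
        // (IllegalStateException stands in for ImpalaRuntimeException).
        throw new IllegalStateException(String.format(
            "Invalid Iceberg file format of file: %s", paths[i]));
      }
    }

    Set<TableFormat> result = EnumSet.noneOf(TableFormat.class);
    if (hasParquet) result.add(TableFormat.PARQUET);
    if (hasOrc) result.add(TableFormat.ORC);
    if (hasAvro) result.add(TableFormat.AVRO);
    return result;
  }

  public static void main(String[] args) {
    byte[] formats = {FbFormat.PARQUET, FbFormat.ORC, FbFormat.PARQUET, FbFormat.PARQUET};
    String[] paths = {"a.parquet", "b.orc", "c.parquet", "d.parquet"};
    System.out.println(collect(formats, paths)); // prints [PARQUET, ORC]
  }
}
```

Every file is still checked, so an unknown format is reported exactly as in the patch. Only the add() calls move out of the loop.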



--
To view, visit http://gerrit.cloudera.org:8080/18935

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc816595724e8fd2c885c6664f790af61ddf5c07
Gerrit-Change-Number: 18935
Gerrit-PatchSet: 4
Gerrit-Owner: Gergely Fürnstáhl <gfurnst...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Sep 2022 12:25:16 +0000
Gerrit-HasComments: Yes
