Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21871 )

Change subject: IMPALA-12861: Fix mixed file format listing for Iceberg tables
......................................................................


Patch Set 2:

(6 comments)

Thanks for working on this!

http://gerrit.cloudera.org:8080/#/c/21871/2/common/fbs/IcebergObjects.fbs
File common/fbs/IcebergObjects.fbs:

http://gerrit.cloudera.org:8080/#/c/21871/2/common/fbs/IcebergObjects.fbs@23
PS2, Line 23:   // We add AVRO here as a future possibility.
            :   // The Iceberg spec allows AVRO data files, but currently Impala
            :   // cannot read such Iceberg tables. See IMPALA-11158.
We can remove this comment.


http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java:

http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@195
PS2, Line 195: == FbIcebergDataFileFormat.AVRO
You could write "!= 0" which is shorter and less error-prone.


http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@252
PS2, Line 252:  if
Since this is not an 'else if', the 'if' should be placed to the following line.

Same goes for L254.


http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@524
PS2, Line 524: populateFileFormats();
Based on the method name it's weird that we populate file formats here. Maybe 
we could just move this call to init().

Though in this case populateFileFormats() might need a precondition check that 
sampledPartitions_ is filled if sampling is involved.


http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@278
PS2, Line 278: convertFileFormats()
There can be three SCAN nodes for a single Iceberg table (if the table has 
EQ-delete files, there can be even more):

     UNION ALL
    /         \
   /           \
  /             \
 SCAN all    ANTI JOIN
 datafiles  /        BUILD
 without   /           \
 deletes  SCAN         SCAN
          datafiles    position
          with deletes  deletes

This approach sets the same file formats for all which might not be true. If 
there are no perf problems with the old method (populating fileFormats_ in the 
c'tor of IcebergScanNode), then I think we should keep that.


http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@867
PS2, Line 867:  == FbIcebergDataFileFormat.AVRO
You could write "!= 0" which is shorter and less error-prone.



--
To view, visit http://gerrit.cloudera.org:8080/21871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifae900914a0d255f5a4d9b8539361247dfeaad7b
Gerrit-Change-Number: 21871
Gerrit-PatchSet: 2
Gerrit-Owner: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 08 Oct 2024 16:37:11 +0000
Gerrit-HasComments: Yes

Reply via email to