[Impala-ASF-CR] IMPALA-12861: Fix mixed file format listing for Iceberg tables

Peter Rozsa (Code Review) Tue, 15 Oct 2024 02:35:23 -0700

Peter Rozsa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21871 )


Change subject: IMPALA-12861: Fix mixed file format listing for Iceberg tables
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/21871/2/common/fbs/IcebergObjects.fbs
File common/fbs/IcebergObjects.fbs:

http://gerrit.cloudera.org:8080/#/c/21871/2/common/fbs/IcebergObjects.fbs@23
PS2, Line 23:   AVRO = 4
            : }
            :
> We can remove this comment.
Done


http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java:

http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@195
PS2, Line 195:  boolean hasAvro() { return has
> You could write "!= 0" which is shorter and less error-prone.
Done


http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@252
PS2, Line 252:
> Since this is not an 'else if', the 'if' should be placed to the following
Done


http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@524
PS2, Line 524:
> Based on the method name it's weird that we populate file formats here. May
Unfortunately, the count star optimization requires the fileFormats_ filled, 
and if I move up this method to the init(), then the sampling part is missing 
before gathering the file formats. It'd be better, if the 
checkSamplingAndCountStar method could be split in two, and this method should 
go between those, but I would not like to make this intrusive change.


http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@278
PS2, Line 278:
> There can be three SCAN nodes for a single Iceberg table (if the table has
Reverted back to the original approach


http://gerrit.cloudera.org:8080/#/c/21871/2/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@867
PS2, Line 867:
> You could write "!= 0" which is shorter and less error-prone.
Reverted back to the original approach, I also made a microbenchmark for 
bitflags, BitSet, and the current solution: the bitflag solution is 10x faster 
than the current solution, but overall, the file format population takes like 
100 ms for 2 million entries which is negligible in contrast to the Hadoop Path 
construction during file descriptor creation.



--
To view, visit http://gerrit.cloudera.org:8080/21871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifae900914a0d255f5a4d9b8539361247dfeaad7b
Gerrit-Change-Number: 21871
Gerrit-PatchSet: 3
Gerrit-Owner: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 15 Oct 2024 09:30:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12861: Fix mixed file format listing for Iceberg tables

Reply via email to