[Impala-ASF-CR] IMPALA-13364: Schema resolution doesn't work for migrated partitioned Iceberg tables that have complex types

Gabor Kaszab (Code Review) Fri, 13 Sep 2024 05:07:53 -0700

Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21761 )


Change subject: IMPALA-13364: Schema resolution doesn't work for migrated 
partitioned Iceberg tables that have complex types
......................................................................


Patch Set 4:

(7 comments)

Thanks for fixing this, Zoltan!

http://gerrit.cloudera.org:8080/#/c/21761/4/be/src/exec/file-metadata-utils.cc
File be/src/exec/file-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/21761/4/be/src/exec/file-metadata-utils.cc@95
PS4, Line 95: file_metadata
should we DCHECK if 'file_metadata' is not null


http://gerrit.cloudera.org:8080/#/c/21761/4/be/src/exec/file-metadata-utils.cc@239
PS4, Line 239:   int fileFieldID = *fieldID;
'fieldId' could be null. Add a DCHECK?


http://gerrit.cloudera.org:8080/#/c/21761/4/be/src/exec/file-metadata-utils.cc@241
PS4, Line 241:   using namespace org::apache::impala::fb;
I think that the code between L241-L269 is not related to the actual fieldId 
you gave to this function and the same code runs multiple times but would give 
the same answer. I think we could extract it somewhere else and we could pass 
this function only the result of that code block.
As I see the actual sourceId vector isn't relevant either after L271, just the 
first, last elements and the size. In fact once we verified that they are 
consecutive numbers, only the first element and the size matters.


http://gerrit.cloudera.org:8080/#/c/21761/4/be/src/exec/file-metadata-utils.cc@248
PS4, Line 248:   for (int i = 0; i < transforms->size(); ++i) {
We don't need 'i' within the loop. Can this be a foreach?


http://gerrit.cloudera.org:8080/#/c/21761/4/be/src/exec/file-metadata-utils.cc@251
PS4, Line 251:             
FbIcebergTransformType::FbIcebergTransformType_IDENTITY) {
nit: too much indentation


http://gerrit.cloudera.org:8080/#/c/21761/4/be/src/exec/file-metadata-utils.cc@280
PS4, Line 280:     return Status(errorMsg);
I think we can be more specific here with the error msg. We could let the user 
know that only the subset of the partitions are written into the data file.


http://gerrit.cloudera.org:8080/#/c/21761/4/be/src/exec/parquet/parquet-metadata-utils.cc
File be/src/exec/parquet/parquet-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/21761/4/be/src/exec/parquet/parquet-metadata-utils.cc@1022
PS4, Line 1022:       // Partition columns are not stored in file metadata, but 
they get field IDs
I'm just thinking about the feasibility of the comment I left in 
FileMetadatUtils. So we come to this part of the code for the children of 
complex types, and only if this is a migrated, partitioned Iceberg table, right?
What if we collect such fields first, do the validations for the Partition 
fields once, and then adjust the fildIds?



--
To view, visit http://gerrit.cloudera.org:8080/21761
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie32952021b63d6b55b8820489e434bfc2a91580b
Gerrit-Change-Number: 21761
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Mihaly Szjatinya <[email protected]>
Gerrit-Comment-Date: Fri, 13 Sep 2024 12:07:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-13364: Schema resolution doesn't work for migrated partitioned Iceberg tables that have complex types

Reply via email to