Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21061 )

Change subject: IMPALA-12610: Support reading ARRAY columns for Iceberg 
Metadata tables
......................................................................


Patch Set 3:

(30 comments)

Thanks Tamás! So far I have only reviewed the files until iceberg-row-reader.cc 
up until the function WriteStructSlot(). I'll come back for the rest.

http://gerrit.cloudera.org:8080/#/c/21061/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21061/3//COMMIT_MSG@13
PS3, Line 13: d
Colon or comma needed before the class name.


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/CMakeLists.txt
File be/src/exec/iceberg-metadata/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/CMakeLists.txt@27
PS3, Line 27:   iceberg-metadata-scanner.cc
Nit: shouldn't we follow alphabetical ordering with these files?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h
File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@81
PS3, Line 81: Java
It's no longer a Java object.


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@98
PS3, Line 98: scan_prepare_timer_
Now it's used in Open(), is the name still appropriate?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc
File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@55
PS3, Line 55:   SCOPED_TIMER(scan_prepare_timer_);
Some of the work has been moved from Prepare() to Open(), this could be 
mentioned in the commit message.


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h
File be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h:

http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@46
PS3, Line 46: &
In Impala we usually take mutable parameters by pointer, can we also do it here?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@50
PS3, Line 50: jobject
Can't we take it as a const ref? Applies to the other jobject input parameters 
of the other methods too.


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@51
PS3, Line 51: &
See L46.


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@55
PS3, Line 55: &
See L46.


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@58
PS3, Line 58: GetNextItem
Could be GetNextArrayItem() to make it clearer that it refers to arrays.


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@70
PS3, Line 70:   inline static jmethodID iceberg_accessor_get_ = nullptr;
Some of the jmethodID names are prefixed with the class, some are not. Is there 
a reason for that?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@82
PS3, Line 82: metadata
Shouldn't it be "metadata table"?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@96
PS3, Line 96: jaccessors_
Does it still exist?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@113
PS3, Line 113: &
See L46.


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@115
PS3, Line 115: &
See L46.


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc
File be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@75
PS3, Line 75:   iceberg_metadata_scanner_ctor_, jtable_, 
jstr_metadata_table_name);
Nit: add indentation.


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@89
PS3, Line 89: SlotDescriptor*
Can this be const?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@96
PS3, Line 96: does
Nit: do.


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@96
PS3, Line 96: not have their nested type's tuple available
Are there also slots that have their nested type's tuple, e.g. arrays? Where 
are they handled?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@100
PS3, Line 100: const_cast
Why do we need this const_cast? Cant 'current_type' be a const pointer?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@118
PS3, Line 118:   if (!struct_slot_desc->type().IsStructType()) return 
Status::OK();
When does this happen? Shouldn't we instead DCHECK that the type is a struct?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@124
PS3, Line 124: IsComplexType
What if 'child_slot_desc' is a collection? In InitSlotIdFieldIdMap() we only 
recurse if the type is struct.


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@147
PS3, Line 147: find
We could store the result of this call and pass the field_id to 
GetValueByFieldId() so that it doesn't have to retrieve it again.


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@151
PS3, Line 151: ARRAY
Isn't is inside a struct if isTupleOfStructSlot() is true?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@195
PS3, Line 195:   out << "Metadata table name: " << metadata_table_name_ << "; ";
Shouldn't we add some more info in the debug string, e.g. that it is an 
IcebergMetadataScanner, and also something before the string of 'tuple_desc_', 
that it is the tuple of the metadata table?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.h
File be/src/exec/iceberg-metadata/iceberg-row-reader.h:

http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.h@20
PS3, Line 20:
Nit: unneeded empty line.


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.h@38
PS3, Line 38: s
Nit: can it translate a single StructLike into multiple rows? Otherwise use 
singular "row".


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.h@99
PS3, Line 99:   jclass JavaClassForImpalaType(const ColumnType type);
Couldn't it be static?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@100
PS3, Line 100: Type not used
Can you clarify this?


http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@184
PS3, Line 184:   DCHECK(slot_desc != nullptr);
We could also DCHECK that 'slot_desc' has struct type.



--
To view, visit http://gerrit.cloudera.org:8080/21061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieb9bac1822a17bd3cd074b4b98e4d010703cecb1
Gerrit-Change-Number: 21061
Gerrit-PatchSet: 3
Gerrit-Owner: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Feb 2024 14:44:30 +0000
Gerrit-HasComments: Yes

Reply via email to