Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21061 )
Change subject: IMPALA-12610: Support reading ARRAY columns for Iceberg Metadata tables ...................................................................... Patch Set 3: (30 comments) Thanks Tamás! So far I have only reviewed the files until iceberg-row-reader.cc up until the function WriteStructSlot(). I'll come back for the rest. http://gerrit.cloudera.org:8080/#/c/21061/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21061/3//COMMIT_MSG@13 PS3, Line 13: d Colon or comma needed before the class name. http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/CMakeLists.txt File be/src/exec/iceberg-metadata/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/CMakeLists.txt@27 PS3, Line 27: iceberg-metadata-scanner.cc Nit: shouldn't we follow alphabetical ordering with these files? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h: http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@81 PS3, Line 81: Java It's no longer a Java object. http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@98 PS3, Line 98: scan_prepare_timer_ Now it's used in Open(), is the name still appropriate? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@55 PS3, Line 55: SCOPED_TIMER(scan_prepare_timer_); Some of the work has been moved from Prepare() to Open(), this could be mentioned in the commit message. http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h File be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h: http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@46 PS3, Line 46: & In Impala we usually take mutable parameters by pointer, can we also do it here? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@50 PS3, Line 50: jobject Can't we take it as a const ref? Applies to the other jobject input parameters of the other methods too. http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@51 PS3, Line 51: & See L46. http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@55 PS3, Line 55: & See L46. http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@58 PS3, Line 58: GetNextItem Could be GetNextArrayItem() to make it clearer that it refers to arrays. http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@70 PS3, Line 70: inline static jmethodID iceberg_accessor_get_ = nullptr; Some of the jmethodID names are prefixed with the class, some are not. Is there a reason for that? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@82 PS3, Line 82: metadata Shouldn't it be "metadata table"? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@96 PS3, Line 96: jaccessors_ Does it still exist? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@113 PS3, Line 113: & See L46. http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.h@115 PS3, Line 115: & See L46. http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc File be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc: http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@75 PS3, Line 75: iceberg_metadata_scanner_ctor_, jtable_, jstr_metadata_table_name); Nit: add indentation. http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@89 PS3, Line 89: SlotDescriptor* Can this be const? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@96 PS3, Line 96: does Nit: do. http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@96 PS3, Line 96: not have their nested type's tuple available Are there also slots that have their nested type's tuple, e.g. arrays? Where are they handled? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@100 PS3, Line 100: const_cast Why do we need this const_cast? Cant 'current_type' be a const pointer? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@118 PS3, Line 118: if (!struct_slot_desc->type().IsStructType()) return Status::OK(); When does this happen? Shouldn't we instead DCHECK that the type is a struct? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@124 PS3, Line 124: IsComplexType What if 'child_slot_desc' is a collection? In InitSlotIdFieldIdMap() we only recurse if the type is struct. http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@147 PS3, Line 147: find We could store the result of this call and pass the field_id to GetValueByFieldId() so that it doesn't have to retrieve it again. http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@151 PS3, Line 151: ARRAY Isn't is inside a struct if isTupleOfStructSlot() is true? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-metadata-scanner.cc@195 PS3, Line 195: out << "Metadata table name: " << metadata_table_name_ << "; "; Shouldn't we add some more info in the debug string, e.g. that it is an IcebergMetadataScanner, and also something before the string of 'tuple_desc_', that it is the tuple of the metadata table? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.h File be/src/exec/iceberg-metadata/iceberg-row-reader.h: http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.h@20 PS3, Line 20: Nit: unneeded empty line. http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.h@38 PS3, Line 38: s Nit: can it translate a single StructLike into multiple rows? Otherwise use singular "row". http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.h@99 PS3, Line 99: jclass JavaClassForImpalaType(const ColumnType type); Couldn't it be static? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@100 PS3, Line 100: Type not used Can you clarify this? http://gerrit.cloudera.org:8080/#/c/21061/3/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@184 PS3, Line 184: DCHECK(slot_desc != nullptr); We could also DCHECK that 'slot_desc' has struct type. -- To view, visit http://gerrit.cloudera.org:8080/21061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieb9bac1822a17bd3cd074b4b98e4d010703cecb1 Gerrit-Change-Number: 21061 Gerrit-PatchSet: 3 Gerrit-Owner: Tamas Mate <tma...@apache.org> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Mon, 26 Feb 2024 14:44:30 +0000 Gerrit-HasComments: Yes