Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/20010 )
Change subject: IMPALA-11996: Scanner change for Iceberg metadata querying ...................................................................... Patch Set 5: (51 comments) Thank you for the detailed review Gabor. As discussed the scanning as redesigned and now handled by Java code. LMK your thoughts. http://gerrit.cloudera.org:8080/#/c/20010/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20010/5//COMMIT_MSG@7 PS5, Line 7: metadtata > typo Done http://gerrit.cloudera.org:8080/#/c/20010/5//COMMIT_MSG@17 PS5, Line 17: struct column types > it's not just struct but nested types in general Done http://gerrit.cloudera.org:8080/#/c/20010/5//COMMIT_MSG@17 PS5, Line 17: se > typo Done http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h: http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@20 PS5, Line 20: #include "exec/iceberg-metadata/iceberg-metadata-table-scanner.h" > Would it help to remove this include if we had a forward declaration of Ice Revised the includes and forward declarations http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@22 PS5, Line 22: #include "runtime/runtime-state.h" : #include "util/jni-util.h" > same as above Revised the includes and forward declarations http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@41 PS5, Line 41: /// ScanNode ancestor -> ExecNode > I don't think this comment is neccessary Removed http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@42 PS5, Line 42: class IcebergMetadataScanNode : public ScanNode { > Don't you need a virtual destructor for this class? Changed this class multiple places, I don't think there is a need for a virtual destructor now. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@49 PS5, Line 49: Iceberg TableScan > What is an Iceberg 'TableScan'? I haven't found any reference in the cc fil Re-purposed the Open method and updated the comment http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@52 PS5, Line 52: /// Get next rowbatch from the table scanner > this comment doesn't add much Updated. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@55 PS5, Line 55: /// Close the Iceberg TableScan > This comment doesn't add much Updated. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@58 PS5, Line 58: Status GetCatalogTable(JNIEnv* env, jobject* jtable); > private? Updated visibility. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@60 PS5, Line 60: protected: > Are there any derived classes from this one? I haven't found any. What's th There is no need actually, updated visibility. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@61 PS5, Line 61: tuple_desc_ > nit: I think we use ' char around variable names in comments. like 'tuple_d Done http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@67 PS5, Line 67: metadtata > typo Done http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@69 PS5, Line 69: const string* metadata_table_name_; > does this have to be a pointer? isn't regular string enough? Changed both. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@73 PS5, Line 73: scoped_ptr > unique_ptr ? Done http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@36 PS5, Line 36: table_name_(new TTableName(pnode.tnode_->iceberg_scan_metadata_node.table_name)) { > Who deallocates this if you allocate this on the heap? Done, table_name_ is not a pointer anymore. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@37 PS5, Line 37: new string > Who deallocates this if you allocate this on the heap? Done, metadata_table_name_ is not a pointer anymore. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@43 PS5, Line 43: NULL > nullptr instead Done http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@44 PS5, Line 44: return Status("Failed to get/create JVM"); > merge with line above? Done http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@49 PS5, Line 49: NULL > nullptr instead Done http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@50 PS5, Line 50: return Status("Failed to get tuple descriptor."); > Merge with line above? Done http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@55 PS5, Line 55: jobject* jtable = new jobject(); > What I'm wondering here is if the only purpose of 'jtable' here is that we Done, jtable is not a pointer anymore. No need for pointers here. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@55 PS5, Line 55: jobject* jtable = new jobject(); > Who is responsible of releasing the memory allocated here? Done, jtable is not a pointer anymore. No need for pointers here. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@56 PS5, Line 56: // Move to Prepare or Init? > drop comment Done http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@63 PS5, Line 63: JNIEnv* env = JniUtil::GetJNIEnv(); > I'm not familiar with this JNIEnv but is this something that changes by tim JNIEnv is only valid for the current thread, so I would not move it to be a class member. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@64 PS5, Line 64: if (env == NULL) { > nullptr and merge with row below Done http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@67 PS5, Line 67: env > Do we have to keep pushing this JNIEnv into IcebergMetadataScanner through I restructured the JNIEnv calls a bit. Only passing it in the MaterializeRow, it is thread safe there and should save a few calls. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@79 PS5, Line 79: if (row_batch->AtCapacity() || *eos) { > Instead of 'while(true)' can't you make this the entry condition of the whi Moved the AtCapacity call, also with the Java redesign slightly changed. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h File be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h: http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@24 PS5, Line 24: #include <memory> > I think this is not used Done http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@39 PS5, Line 39: > nit: extra space This comment has been rewritten and moved to the scan node. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@72 PS5, Line 72: inline static jclass fe_iceberg_table_cl_ = nullptr; > I'm wondering if it would make sense to place these java references to some These have been split some are in the scan node some are in the reader. I moved them here because of Zoltan's earlier comment. I don't have a preference. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@92 PS5, Line 92: references references > nit: typo Done http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@113 PS5, Line 113: /// Getting enum objects are not straigthforward calls: enum -> field -> object > Frankly, I don't see how this comment is related to the line below :) With the new java solution this became outdated. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@134 PS5, Line 134: Status ReadTimeStampValue(JNIEnv* env, SlotDescriptor* slot_desc, > There are TimeStamp with TZ columns in metadata views that Impala doesn't s I couldn't find these columns, could you help which one? http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@142 PS5, Line 142: protected: > Aren't the below are 'private' too instead of protected? Updated visibility. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@143 PS5, Line 143: /// TupleDescriptor received from the ScanNode. > In general, comments like this don't add much extra value. Done http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@147 PS5, Line 147: veified > typo This field was moved to the Java scanner and Scan Node. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@151 PS5, Line 151: jobject* jmetadata_table_ = new jobject(); > Who deallocates this? This is not a pointer anymore. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@153 PS5, Line 153: slot > 'slots' ? Done http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@156 PS5, Line 156: thorugh > nit: typo These iterators were moved to the Java class. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.cc File be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.cc: http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.cc@133 PS5, Line 133: Status IcebergMetadataTableScanner::CreateJIcebergMetadataTable(JNIEnv* env, > Can't we replace this sequence of call with one JNI call to java that gets Moved this to a Java class. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.cc@158 PS5, Line 158: Status IcebergMetadataTableScanner::ScanMetadataTable(JNIEnv* env) { > I wonder if we can hide this call from the interface somehow. E.g. include I would not call it in GetNext to avoid mixing responsibilities of a call that Prepares the ScanNode. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.cc@158 PS5, Line 158: ScanMetadataTable > Another thing that makes me wondering is that do we actually perform the re I was trying to create a function name that resembles the Iceberg API call. These calls were moved to the Java class as well. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.cc@177 PS5, Line 177: Status IcebergMetadataTableScanner::CreateJAccessors(JNIEnv* env) { > Again, might be only my preference but here I'd prefer to like a single JNI This was moved to the java scanner as well. http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/service/frontend.cc File be/src/service/frontend.cc: http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/service/frontend.cc@255 PS5, Line 255: return Status("Failed to get/create JVM"); > could fit into one line with the condition Done http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/util/jni-util.h File be/src/util/jni-util.h: http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/util/jni-util.h@284 PS5, Line 284: static Status GetMethodID(JNIEnv* env, jclass class_ref, const char* method_str, > comment pls Done http://gerrit.cloudera.org:8080/#/c/20010/5/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/20010/5/common/thrift/PlanNodes.thrift@683 PS5, Line 683: IcebergMetadtatScanNode > typo Done http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java: http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java@49 PS5, Line 49: private final static Logger LOG = LoggerFactory.getLogger(IcebergMetadataTable.class); > I don't think this is used. Done http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/service/JniFrontend.java File fe/src/main/java/org/apache/impala/service/JniFrontend.java: http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/service/JniFrontend.java@35 PS5, Line 35: import org.apache.iceberg.Accessor; > I see this many new imports, however, the new code added in this file doesn Done http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/service/JniFrontend.java@61 PS5, Line 61: // import org.apache.iceberg.io.CloseableIterable; > drop comment Done -- To view, visit http://gerrit.cloudera.org:8080/20010 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0e943cecd77f5ef7af7cd07e2b596f2c5b4331e7 Gerrit-Change-Number: 20010 Gerrit-PatchSet: 5 Gerrit-Owner: Tamas Mate <tma...@apache.org> Gerrit-Reviewer: Anonymous Coward <lipeng...@apache.org> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Gergely Fürnstáhl <g.furnst...@gmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Tamas Mate <tma...@apache.org> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Thu, 21 Sep 2023 10:03:27 +0000 Gerrit-HasComments: Yes