Tamas Mate has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20010 )

Change subject: IMPALA-11996: Scanner change for Iceberg metadata querying
......................................................................


Patch Set 5:

(51 comments)

Thank you for the detailed review Gabor.
As discussed the scanning as redesigned and now handled by Java code.
LMK your thoughts.

http://gerrit.cloudera.org:8080/#/c/20010/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20010/5//COMMIT_MSG@7
PS5, Line 7: metadtata
> typo
Done


http://gerrit.cloudera.org:8080/#/c/20010/5//COMMIT_MSG@17
PS5, Line 17: struct column types
> it's not just struct but nested types in general
Done


http://gerrit.cloudera.org:8080/#/c/20010/5//COMMIT_MSG@17
PS5, Line 17: se
> typo
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h
File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@20
PS5, Line 20: #include "exec/iceberg-metadata/iceberg-metadata-table-scanner.h"
> Would it help to remove this include if we had a forward declaration of Ice
Revised the includes and forward declarations


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@22
PS5, Line 22: #include "runtime/runtime-state.h"
            : #include "util/jni-util.h"
> same as above
Revised the includes and forward declarations


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@41
PS5, Line 41: /// ScanNode ancestor -> ExecNode
> I don't think this comment is neccessary
Removed


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@42
PS5, Line 42: class IcebergMetadataScanNode : public ScanNode {
> Don't you need a virtual destructor for this class?
Changed this class multiple places, I don't think there is a need for a virtual 
destructor now.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@49
PS5, Line 49: Iceberg TableScan
> What is an Iceberg 'TableScan'? I haven't found any reference in the cc fil
Re-purposed the Open method and updated the comment


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@52
PS5, Line 52:   /// Get next rowbatch from the table scanner
> this comment doesn't add much
Updated.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@55
PS5, Line 55:   /// Close the Iceberg TableScan
> This comment doesn't add much
Updated.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@58
PS5, Line 58:   Status GetCatalogTable(JNIEnv* env, jobject* jtable);
> private?
Updated visibility.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@60
PS5, Line 60:  protected:
> Are there any derived classes from this one? I haven't found any. What's th
There is no need actually, updated visibility.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@61
PS5, Line 61: tuple_desc_
> nit: I think we use ' char around variable names in comments. like 'tuple_d
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@67
PS5, Line 67: metadtata
> typo
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@69
PS5, Line 69:   const string* metadata_table_name_;
> does this have to be a pointer? isn't regular string enough?
Changed both.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@73
PS5, Line 73: scoped_ptr
> unique_ptr ?
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc
File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@36
PS5, Line 36:     table_name_(new 
TTableName(pnode.tnode_->iceberg_scan_metadata_node.table_name)) {
> Who deallocates this if you allocate this on the heap?
Done, table_name_ is not a pointer anymore.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@37
PS5, Line 37: new string
> Who deallocates this if you allocate this on the heap?
Done, metadata_table_name_ is not a pointer anymore.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@43
PS5, Line 43: NULL
> nullptr instead
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@44
PS5, Line 44:     return Status("Failed to get/create JVM");
> merge with line above?
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@49
PS5, Line 49: NULL
> nullptr instead
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@50
PS5, Line 50:     return Status("Failed to get tuple descriptor.");
> Merge with line above?
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@55
PS5, Line 55: jobject* jtable = new jobject();
> What I'm wondering here is if the only purpose of 'jtable' here is that we
Done, jtable is not a pointer anymore. No need for pointers here.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@55
PS5, Line 55:   jobject* jtable = new jobject();
> Who is responsible of releasing the memory allocated here?
Done, jtable is not a pointer anymore. No need for pointers here.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@56
PS5, Line 56: // Move to Prepare or Init?
> drop comment
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@63
PS5, Line 63:   JNIEnv* env = JniUtil::GetJNIEnv();
> I'm not familiar with this JNIEnv but is this something that changes by tim
JNIEnv is only valid for the current thread, so I would not move it to be a 
class member.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@64
PS5, Line 64:   if (env == NULL) {
> nullptr and merge with row below
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@67
PS5, Line 67: env
> Do we have to keep pushing this JNIEnv into IcebergMetadataScanner through
I restructured the JNIEnv calls a bit. Only passing it in the MaterializeRow, 
it is thread safe there and should save a few calls.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.cc@79
PS5, Line 79:     if (row_batch->AtCapacity() || *eos) {
> Instead of 'while(true)' can't you make this the entry condition of the whi
Moved the AtCapacity call, also with the Java redesign slightly changed.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h
File be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h:

http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@24
PS5, Line 24: #include <memory>
> I think this is not used
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@39
PS5, Line 39:
> nit: extra space
This comment has been rewritten and moved to the scan node.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@72
PS5, Line 72:   inline static jclass fe_iceberg_table_cl_ = nullptr;
> I'm wondering if it would make sense to place these java references to some
These have been split some are in the scan node some are in the reader. I moved 
them here because of Zoltan's earlier comment. I don't have a preference.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@92
PS5, Line 92: references references
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@113
PS5, Line 113:   /// Getting enum objects are not straigthforward calls: enum 
-> field -> object
> Frankly, I don't see how this comment is related to the line below :)
With the new java solution this became outdated.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@134
PS5, Line 134:   Status ReadTimeStampValue(JNIEnv* env, SlotDescriptor* 
slot_desc,
> There are TimeStamp with TZ columns in metadata views that Impala doesn't s
I couldn't find these columns, could you help which one?


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@142
PS5, Line 142:  protected:
> Aren't the below are 'private' too instead of protected?
Updated visibility.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@143
PS5, Line 143:   /// TupleDescriptor received from the ScanNode.
> In general, comments like this don't add much extra value.
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@147
PS5, Line 147: veified
> typo
This field was moved to the Java scanner and Scan Node.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@151
PS5, Line 151:   jobject* jmetadata_table_ = new jobject();
> Who deallocates this?
This is not a pointer anymore.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@153
PS5, Line 153: slot
> 'slots' ?
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.h@156
PS5, Line 156: thorugh
> nit: typo
These iterators were moved to the Java class.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.cc
File be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.cc@133
PS5, Line 133: Status 
IcebergMetadataTableScanner::CreateJIcebergMetadataTable(JNIEnv* env,
> Can't we replace this sequence of call with one JNI call to java that gets
Moved this to a Java class.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.cc@158
PS5, Line 158: Status IcebergMetadataTableScanner::ScanMetadataTable(JNIEnv* 
env) {
> I wonder if we can hide this call from the interface somehow. E.g. include
I would not call it in GetNext to avoid mixing responsibilities of a call that 
Prepares the ScanNode.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.cc@158
PS5, Line 158: ScanMetadataTable
> Another thing that makes me wondering is that do we actually perform the re
I was trying to create a function name that resembles the Iceberg API call. 
These calls were moved to the Java class as well.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/exec/iceberg-metadata/iceberg-metadata-table-scanner.cc@177
PS5, Line 177: Status IcebergMetadataTableScanner::CreateJAccessors(JNIEnv* 
env) {
> Again, might be only my preference but here I'd prefer to like a single JNI
This was moved to the java scanner as well.


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/service/frontend.cc
File be/src/service/frontend.cc:

http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/service/frontend.cc@255
PS5, Line 255:     return Status("Failed to get/create JVM");
> could fit into one line with the condition
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/util/jni-util.h
File be/src/util/jni-util.h:

http://gerrit.cloudera.org:8080/#/c/20010/5/be/src/util/jni-util.h@284
PS5, Line 284:   static Status GetMethodID(JNIEnv* env, jclass class_ref, const 
char* method_str,
> comment pls
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/20010/5/common/thrift/PlanNodes.thrift@683
PS5, Line 683: IcebergMetadtatScanNode
> typo
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java
File 
fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java:

http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java@49
PS5, Line 49:   private final static Logger LOG = 
LoggerFactory.getLogger(IcebergMetadataTable.class);
> I don't think this is used.
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/service/JniFrontend.java
File fe/src/main/java/org/apache/impala/service/JniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/service/JniFrontend.java@35
PS5, Line 35: import org.apache.iceberg.Accessor;
> I see this many new imports, however, the new code added in this file doesn
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/service/JniFrontend.java@61
PS5, Line 61: // import org.apache.iceberg.io.CloseableIterable;
> drop comment
Done



--
To view, visit http://gerrit.cloudera.org:8080/20010
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e943cecd77f5ef7af7cd07e2b596f2c5b4331e7
Gerrit-Change-Number: 20010
Gerrit-PatchSet: 5
Gerrit-Owner: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Anonymous Coward <lipeng...@apache.org>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <g.furnst...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Sep 2023 10:03:27 +0000
Gerrit-HasComments: Yes

Reply via email to