Tamas Mate has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20010 )

Change subject: IMPALA-11996: Scanner change for Iceberg metadata querying
......................................................................


Patch Set 7:

(19 comments)

Thank you for the review Gabor.
As discussed with the team the C++ JNI scanning has been rewritten to Java.
The tests needed some additional changes as well. The metadata that were 
rewritten manually had some issues, because avrotools removed some extra 
schemata that were added by Iceberg. In this version the table is created by 
Impala and the tests are less specific.

http://gerrit.cloudera.org:8080/#/c/20010/6/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h
File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/20010/6/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@121
PS6, Line 121:   /// Initializes the metadata table and executes an Iceberg 
scan through JNI.
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/20010/6/be/src/exec/iceberg-metadata/iceberg-row-reader.h
File be/src/exec/iceberg-metadata/iceberg-row-reader.h:

http://gerrit.cloudera.org:8080/#/c/20010/6/be/src/exec/iceberg-metadata/iceberg-row-reader.h@33
PS6, Line 33: /// rows. It utilizes the provided {Accessor} objects to do this 
translation.
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/20010/6/be/src/exec/iceberg-metadata/iceberg-row-reader.h@37
PS6, Line 37:   IcebergRowReader(const TupleDescriptor* tuple_desc,
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/analysis/IcebergMetadataTableRef.java
File fe/src/main/java/org/apache/impala/analysis/IcebergMetadataTableRef.java:

http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/analysis/IcebergMetadataTableRef.java@59
PS5, Line 59:     analyzeTimeTravel(analyzer);
> Is it possible to provide time travel spec with metadata table queries with
I planned it in IMPALA-11991, added a test clause that throws an exception that 
it is not supported for metadata tables.


http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/analysis/IcebergMetadataTableRef.java@61
PS5, Line 61:     Analyzer.checkTableCapability(getTable(), 
Analyzer.OperationType.READ);
> a ranger test would be nice to check if users with insufficient permissions
Actually, I do not think we need this for virtual tables. This is not a ranger 
test, this is whether the table supports writing if needed.

For auth check registerAuthAndAuditEvent the iceberg table is passed, I think 
anyone who can read the table should be able to read the metadata as well.


http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/analysis/IcebergMetadataTableRef.java@63
PS5, Line 63:     analyzeHints(analyzer);
> What hints can you provide with these queries? I guess join hints could be
Added a planner test, this should work out of the box I believe.


http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java@39
PS5, Line 39:   // Metadata table name, it is stored here so it can be passed 
to the backend.
> this comment doesn't add much value on top of what the variable name says
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java@40
PS5, Line 40:   protected final String metadataTableName_;
> I see this is only used once (in toThrift()) so I wouldn't introduce a clas
But the table ref is not tblRef here, so either that or the metadata table name 
has to be saved.


http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java@43
PS5, Line 43: TableRef
> this can even be an IcebergMetadataTableRef, right?
Then a cast is needed in SingleNodePlanner.


http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java@53
PS5, Line 53:     assignConjuncts(analyzer);
> assignConjuncts() and computeStats() are already done by calling super.init
True, for better analysis results I removed the super call. I noticed that 
ScanNodes call these for themselves.


http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1900
PS5, Line 1900:     } else if (table instanceof IcebergMetadataTable) {
> I'm trying to understand the difference between this branch and the one whe
Done, I cannot recall why I IcebergMetadataTable instance check earlier.


http://gerrit.cloudera.org:8080/#/c/20010/6/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
File fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java:

http://gerrit.cloudera.org:8080/#/c/20010/6/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java@34
PS6, Line 34: /**
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/20010/6/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java@36
PS6, Line 36:  * API. This object is instantiated and governed by the backend
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/20010/6/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java@38
PS6, Line 38:  *
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/20010/5/testdata/datasets/functional/functional_schema_template.sql@3702
PS5, Line 3702: hadoop fs -put -f 
${IMPALA_HOME}/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes
 /test-warehouse/iceberg_test/hadoop_catalog/ice
> Do you have to create this table through Hive?
Done, I cannot remember why I used this.


http://gerrit.cloudera.org:8080/#/c/20010/5/testdata/datasets/functional/functional_schema_template.sql@3715
PS5, Line 3715: ---- DEPENDENT_LOAD
> nit: extra empty line?
Done


http://gerrit.cloudera.org:8080/#/c/20010/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/20010/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@1
PS5, Line 1: # The test table for these test is created during dataload by 
Impala. An existing table
> Can you also have some test coverage that some SQL that are valid for regul
Done.


http://gerrit.cloudera.org:8080/#/c/20010/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@2
PS5, Line 2: # could not have been rewritten manually, because avrotools 
removes additinal schemata
> All of the results in this test fits into one rowBatch. Could you have cove
Done.


http://gerrit.cloudera.org:8080/#/c/20010/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@311
PS5, Line 311: BIGINT
> as we apparently are returning these complex types as strings would it requ
x



--
To view, visit http://gerrit.cloudera.org:8080/20010
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e943cecd77f5ef7af7cd07e2b596f2c5b4331e7
Gerrit-Change-Number: 20010
Gerrit-PatchSet: 7
Gerrit-Owner: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Anonymous Coward <lipeng...@apache.org>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <g.furnst...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 27 Sep 2023 11:51:04 +0000
Gerrit-HasComments: Yes

Reply via email to