Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/20010 )
Change subject: IMPALA-11996: Scanner change for Iceberg metadata querying ...................................................................... Patch Set 7: (19 comments) Thank you for the review Gabor. As discussed with the team the C++ JNI scanning has been rewritten to Java. The tests needed some additional changes as well. The metadata that were rewritten manually had some issues, because avrotools removed some extra schemata that were added by Iceberg. In this version the table is created by Impala and the tests are less specific. http://gerrit.cloudera.org:8080/#/c/20010/6/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h File be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h: http://gerrit.cloudera.org:8080/#/c/20010/6/be/src/exec/iceberg-metadata/iceberg-metadata-scan-node.h@121 PS6, Line 121: /// Initializes the metadata table and executes an Iceberg scan through JNI. > line has trailing whitespace Done http://gerrit.cloudera.org:8080/#/c/20010/6/be/src/exec/iceberg-metadata/iceberg-row-reader.h File be/src/exec/iceberg-metadata/iceberg-row-reader.h: http://gerrit.cloudera.org:8080/#/c/20010/6/be/src/exec/iceberg-metadata/iceberg-row-reader.h@33 PS6, Line 33: /// rows. It utilizes the provided {Accessor} objects to do this translation. > line has trailing whitespace Done http://gerrit.cloudera.org:8080/#/c/20010/6/be/src/exec/iceberg-metadata/iceberg-row-reader.h@37 PS6, Line 37: IcebergRowReader(const TupleDescriptor* tuple_desc, > line has trailing whitespace Done http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/analysis/IcebergMetadataTableRef.java File fe/src/main/java/org/apache/impala/analysis/IcebergMetadataTableRef.java: http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/analysis/IcebergMetadataTableRef.java@59 PS5, Line 59: analyzeTimeTravel(analyzer); > Is it possible to provide time travel spec with metadata table queries with I planned it in IMPALA-11991, added a test clause that throws an exception that it is not supported for metadata tables. http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/analysis/IcebergMetadataTableRef.java@61 PS5, Line 61: Analyzer.checkTableCapability(getTable(), Analyzer.OperationType.READ); > a ranger test would be nice to check if users with insufficient permissions Actually, I do not think we need this for virtual tables. This is not a ranger test, this is whether the table supports writing if needed. For auth check registerAuthAndAuditEvent the iceberg table is passed, I think anyone who can read the table should be able to read the metadata as well. http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/analysis/IcebergMetadataTableRef.java@63 PS5, Line 63: analyzeHints(analyzer); > What hints can you provide with these queries? I guess join hints could be Added a planner test, this should work out of the box I believe. http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java File fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java: http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java@39 PS5, Line 39: // Metadata table name, it is stored here so it can be passed to the backend. > this comment doesn't add much value on top of what the variable name says Done http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java@40 PS5, Line 40: protected final String metadataTableName_; > I see this is only used once (in toThrift()) so I wouldn't introduce a clas But the table ref is not tblRef here, so either that or the metadata table name has to be saved. http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java@43 PS5, Line 43: TableRef > this can even be an IcebergMetadataTableRef, right? Then a cast is needed in SingleNodePlanner. http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/planner/IcebergMetadataScanNode.java@53 PS5, Line 53: assignConjuncts(analyzer); > assignConjuncts() and computeStats() are already done by calling super.init True, for better analysis results I removed the super call. I noticed that ScanNodes call these for themselves. http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/20010/5/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1900 PS5, Line 1900: } else if (table instanceof IcebergMetadataTable) { > I'm trying to understand the difference between this branch and the one whe Done, I cannot recall why I IcebergMetadataTable instance check earlier. http://gerrit.cloudera.org:8080/#/c/20010/6/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java File fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java: http://gerrit.cloudera.org:8080/#/c/20010/6/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java@34 PS6, Line 34: /** > line has trailing whitespace Done http://gerrit.cloudera.org:8080/#/c/20010/6/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java@36 PS6, Line 36: * API. This object is instantiated and governed by the backend > line has trailing whitespace Done http://gerrit.cloudera.org:8080/#/c/20010/6/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java@38 PS6, Line 38: * > line has trailing whitespace Done http://gerrit.cloudera.org:8080/#/c/20010/5/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/20010/5/testdata/datasets/functional/functional_schema_template.sql@3702 PS5, Line 3702: hadoop fs -put -f ${IMPALA_HOME}/testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes /test-warehouse/iceberg_test/hadoop_catalog/ice > Do you have to create this table through Hive? Done, I cannot remember why I used this. http://gerrit.cloudera.org:8080/#/c/20010/5/testdata/datasets/functional/functional_schema_template.sql@3715 PS5, Line 3715: ---- DEPENDENT_LOAD > nit: extra empty line? Done http://gerrit.cloudera.org:8080/#/c/20010/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/20010/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@1 PS5, Line 1: # The test table for these test is created during dataload by Impala. An existing table > Can you also have some test coverage that some SQL that are valid for regul Done. http://gerrit.cloudera.org:8080/#/c/20010/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@2 PS5, Line 2: # could not have been rewritten manually, because avrotools removes additinal schemata > All of the results in this test fits into one rowBatch. Could you have cove Done. http://gerrit.cloudera.org:8080/#/c/20010/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@311 PS5, Line 311: BIGINT > as we apparently are returning these complex types as strings would it requ x -- To view, visit http://gerrit.cloudera.org:8080/20010 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0e943cecd77f5ef7af7cd07e2b596f2c5b4331e7 Gerrit-Change-Number: 20010 Gerrit-PatchSet: 7 Gerrit-Owner: Tamas Mate <tma...@apache.org> Gerrit-Reviewer: Anonymous Coward <lipeng...@apache.org> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Gergely Fürnstáhl <g.furnst...@gmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Tamas Mate <tma...@apache.org> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Wed, 27 Sep 2023 11:51:04 +0000 Gerrit-HasComments: Yes