Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20010 )
Change subject: IMPALA-11996: Scanner change for Iceberg metadata querying ...................................................................... Patch Set 14: (10 comments) http://gerrit.cloudera.org:8080/#/c/20010/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20010/10//COMMIT_MSG@24 PS10, Line 24: Testing: > It was quite slow to build a table like that so I just skipped it for now. Would be nice to at least have some idea how this performs for a table that is not just few rows big. I think creating such a table shouldn't be that difficult: eg create an iceberg table partitioned by an int column and then insert into this the 'id' col from functional alltypes. You'd end up having 7300 partitions, but could create more of you do another insert with 'id + 1', and so on. Then querying the partitions metadata table could give us some clue about runtimes. http://gerrit.cloudera.org:8080/#/c/20010/14/be/src/exec/iceberg-metadata/iceberg-row-reader.h File be/src/exec/iceberg-metadata/iceberg-row-reader.h: http://gerrit.cloudera.org:8080/#/c/20010/14/be/src/exec/iceberg-metadata/iceberg-row-reader.h@37 PS14, Line 37: std::unordered_map<int, jobject>* j I think Peter meant const reference not pointer. We use pointer parameters for out-params usually. Giving it a secondthought, I guess similarly to the other param 'tuple_desc' pointer is fine, just make it a const ptr. http://gerrit.cloudera.org:8080/#/c/20010/14/be/src/exec/iceberg-metadata/iceberg-row-reader.cc File be/src/exec/iceberg-metadata/iceberg-row-reader.cc: http://gerrit.cloudera.org:8080/#/c/20010/14/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@103 PS14, Line 103: VLOG(0) Not 100% sure but isn't VLOG(3) is for debug logging? http://gerrit.cloudera.org:8080/#/c/20010/14/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@115 PS14, Line 115: uint8_t is this uint8_t? not bool? http://gerrit.cloudera.org:8080/#/c/20010/10/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java File fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java: http://gerrit.cloudera.org:8080/#/c/20010/10/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java@78 PS10, Line 78: initializes the iterators, so the {GetNext} call can start fetching the rows through : * the Iceberg Api. : */ : public void ScanMetadataTable() { : // Create and scan the metadata table : metadataTable_ = MetadataTableUtils.createMetadataTableInstance( : > ScanMetadataTable just inits the dataRowsIterator_ it does not gets the ele Even 2 occurrence is not okay. Thanks for the refactor! http://gerrit.cloudera.org:8080/#/c/20010/10/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/20010/10/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@313 PS10, Line 313: select count(b.parent_id) from functional_parquet.iceberg_query_metadata.history a > Added an extra test with this scenario. Have you added it after PS10? I don't see it in the diffs. http://gerrit.cloudera.org:8080/#/c/20010/14/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test: http://gerrit.cloudera.org:8080/#/c/20010/14/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@2 PS14, Line 2: schemata what is a schemata? http://gerrit.cloudera.org:8080/#/c/20010/14/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@199 PS14, Line 199: false I checked this with DESCRIBE HISTORY and for me all the values in this 'is_current_ancestor' are true. Please verify the test results here as I guess some of them are off. Is this because you used int8_t when you populated bool values? http://gerrit.cloudera.org:8080/#/c/20010/14/tests/authorization/test_ranger.py File tests/authorization/test_ranger.py: http://gerrit.cloudera.org:8080/#/c/20010/14/tests/authorization/test_ranger.py@2052 PS14, Line 2052: admin_client.execute("refresh authorization", user=ADMIN) Is this step needed here? http://gerrit.cloudera.org:8080/#/c/20010/14/tests/authorization/test_ranger.py@2067 PS14, Line 2067: admin_client.execute("revoke select on database {0} from user {1}" You haven;t given SELECT on the database level as I see, so no need to revoke it here. -- To view, visit http://gerrit.cloudera.org:8080/20010 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0e943cecd77f5ef7af7cd07e2b596f2c5b4331e7 Gerrit-Change-Number: 20010 Gerrit-PatchSet: 14 Gerrit-Owner: Tamas Mate <tma...@apache.org> Gerrit-Reviewer: Anonymous Coward <lipeng...@apache.org> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Gergely Fürnstáhl <g.furnst...@gmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Peter Rozsa <pro...@cloudera.com> Gerrit-Reviewer: Tamas Mate <tma...@apache.org> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Thu, 19 Oct 2023 14:03:18 +0000 Gerrit-HasComments: Yes