Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20010 )

Change subject: IMPALA-11996: Scanner change for Iceberg metadata querying
......................................................................


Patch Set 14:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/20010/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20010/10//COMMIT_MSG@24
PS10, Line 24: Testing:
> It was quite slow to build a table like that so I just skipped it for now.
Would be nice to at least have some idea how this performs for a table that is 
not just few rows big. I think creating such a table shouldn't be that 
difficult: eg create an iceberg table partitioned by an int column and then 
insert into this the 'id' col from functional alltypes. You'd end up having 
7300 partitions, but could create more of you do another insert with 'id + 1', 
and so on. Then querying the partitions metadata table could give us some clue 
about runtimes.


http://gerrit.cloudera.org:8080/#/c/20010/14/be/src/exec/iceberg-metadata/iceberg-row-reader.h
File be/src/exec/iceberg-metadata/iceberg-row-reader.h:

http://gerrit.cloudera.org:8080/#/c/20010/14/be/src/exec/iceberg-metadata/iceberg-row-reader.h@37
PS14, Line 37: std::unordered_map<int, jobject>* j
I think Peter meant const reference not pointer. We use pointer parameters for 
out-params usually.

Giving it a secondthought, I guess similarly to the other param 'tuple_desc' 
pointer is fine, just make it a const ptr.


http://gerrit.cloudera.org:8080/#/c/20010/14/be/src/exec/iceberg-metadata/iceberg-row-reader.cc
File be/src/exec/iceberg-metadata/iceberg-row-reader.cc:

http://gerrit.cloudera.org:8080/#/c/20010/14/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@103
PS14, Line 103: VLOG(0)
Not 100% sure but isn't VLOG(3) is for debug logging?


http://gerrit.cloudera.org:8080/#/c/20010/14/be/src/exec/iceberg-metadata/iceberg-row-reader.cc@115
PS14, Line 115: uint8_t
is this uint8_t? not bool?


http://gerrit.cloudera.org:8080/#/c/20010/10/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java
File fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java:

http://gerrit.cloudera.org:8080/#/c/20010/10/fe/src/main/java/org/apache/impala/util/IcebergMetadataScanner.java@78
PS10, Line 78:  initializes the iterators, so the {GetNext} call can start 
fetching the rows through
             :    * the Iceberg Api.
             :    */
             :   public void ScanMetadataTable() {
             :     // Create and scan the metadata table
             :     metadataTable_ = 
MetadataTableUtils.createMetadataTableInstance(
             :
> ScanMetadataTable just inits the dataRowsIterator_ it does not gets the ele
Even 2 occurrence is not okay. Thanks for the refactor!


http://gerrit.cloudera.org:8080/#/c/20010/10/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/20010/10/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@313
PS10, Line 313: select count(b.parent_id) from 
functional_parquet.iceberg_query_metadata.history a
> Added an extra test with this scenario.
Have you added it after PS10? I don't see it in the diffs.


http://gerrit.cloudera.org:8080/#/c/20010/14/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test:

http://gerrit.cloudera.org:8080/#/c/20010/14/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@2
PS14, Line 2: schemata
what is a schemata?


http://gerrit.cloudera.org:8080/#/c/20010/14/testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test@199
PS14, Line 199: false
I checked this with DESCRIBE HISTORY and for me all the values in this 
'is_current_ancestor' are true. Please verify the test results here as I guess 
some of them are off. Is this because you used int8_t when you populated bool 
values?


http://gerrit.cloudera.org:8080/#/c/20010/14/tests/authorization/test_ranger.py
File tests/authorization/test_ranger.py:

http://gerrit.cloudera.org:8080/#/c/20010/14/tests/authorization/test_ranger.py@2052
PS14, Line 2052:       admin_client.execute("refresh authorization", user=ADMIN)
Is this step needed here?


http://gerrit.cloudera.org:8080/#/c/20010/14/tests/authorization/test_ranger.py@2067
PS14, Line 2067:       admin_client.execute("revoke select on database {0} from 
user {1}"
You haven;t given SELECT on the database level as I see, so no need to revoke 
it here.



--
To view, visit http://gerrit.cloudera.org:8080/20010
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0e943cecd77f5ef7af7cd07e2b596f2c5b4331e7
Gerrit-Change-Number: 20010
Gerrit-PatchSet: 14
Gerrit-Owner: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Anonymous Coward <lipeng...@apache.org>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <g.furnst...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pro...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 Oct 2023 14:03:18 +0000
Gerrit-HasComments: Yes

Reply via email to