Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg 
tables
......................................................................


Patch Set 1:

(6 comments)

Some high level comments.
The code looks good to me so far, but I will go through it once more.

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@11
PS1, Line 11:  and the HDFS
            : block locations were not consistent across the reload
This is not clear to me: if the block locations have changed between queries, 
then it makes sense to schedule them differently, but if they didn't change 
(which seems the more common case to me), then how are they different?


http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@25
PS1, Line 25: Fixing the above revealed another bug. Before this patch we didn't
            : handle self-events of Iceberg tables.
Can you add more information about this? For example what events are generated 
in HMS for different Iceberg operations? I guess that schema changes will be 
treated similarly to the case when Impala calls HMS directly, but I can't 
imagine what happens in case of INSERTs at the moment.


http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@31
PS1, Line 31: service ID and new catalog version for the Iceberg table
Doesn't Iceberg have something similar to catalog version, e.g. something like 
"snapshot id", that could be used to check whether we need to reload data?

The current way we handle self-events is a horrible hack IMO which is needed 
because HMS doesn't provide a usable version number for table/partition 
metadata. Just hoping that Iceberg does have something like that :)


http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@34
PS1, Line 34:  * added e2e test for the scan range assignment
It is a bit tricky, but self-event handling can be also tested, see 
https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java#L508


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java@104
PS1, Line 104:     properties.setProperty("external.table.purge", "TRUE");
Is this change related to the ones mentioned in the commit message?


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java@151
PS1, Line 151:     properties.setProperty(IcebergTable.ICEBERG_CATALOG,
             :                            
tableProps.get(IcebergTable.ICEBERG_CATALOG));
Is this change related to the ones mentioned in the commit message?



--
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Comment-Date: Mon, 20 Sep 2021 15:40:32 +0000
Gerrit-HasComments: Yes

Reply via email to