Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
......................................................................


Patch Set 6:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java
File fe/src/main/java/org/apache/impala/catalog/Catalog.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@97
PS1, Line 97: protected final ConcurrentHashMap<Long, Set<TableWriteId>> 
txnToWriteIds_ =
            :       new ConcurrentHashMap<>();
> The difficulty here is we have some DDLs advancing write id without changin
Thanks for the clarification. "the new HMS API getAllWriteEventInfo only return 
info for WRITE events" I am not sure why this is any different for DDLs v/s 
insert. Can you please provide some more details why this API does not return 
DDL writeIds?


http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2891
PS6, Line 2891: case
do we need a default: clause which throws a exception?


http://gerrit.cloudera.org:8080/#/c/17858/5/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/17858/5/fe/src/main/java/org/apache/impala/catalog/Table.java@142
PS5, Line 142: volatile
is this needed?


http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/Table.java@142
PS6, Line 142: volatile
not sure why we need this?


http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1968
PS6, Line 1968: AllocWriteIdEvent
please add class comment for newly added classes.


http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2049
PS6, Line 2049: writeEventInfoList = 
client.getHiveClient().getAllWriteEventInfo(
              :             new GetAllWriteEventInfoRequest(txnId_));
a comment here giving details on what this API returns will be helpful for the 
reader.


http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2066
PS6, Line 2066: catalog_.removeWriteIds(txnId_);
This line must be in finally block otherwise we are leaking memory in case an 
exception is thrown in the lines above.


http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2079
PS6, Line 2079: addCommittedWriteIdsAndRefreshPartitions
It looks like this method is refresh one partition at a time. Can we send all 
the partitions so that they are reloaded in parallel?


http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4256
PS6, Line 4256: !(table instanceof HdfsTable)
do we need a check for MVs here?


http://gerrit.cloudera.org:8080/#/c/17858/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4271
PS6, Line 4271: // HMS doesn't set partition object's write id correctly, so we 
set
              :         // it here as workaround.
              :         long writeId = writeIds.get(it.nextIndex());
              :         Partition part = it.next();
              :         part.setWriteId(writeId);
This could lead to confusion, for instance, we may be sending the partition 
objects later to HMS and they may overwrite partition writeIds in HMS. I 
understand that it is not used today, but this could cause subtle bugs which 
would be very hard to diagnose in case we add partition level writeId in 
future. Since the goal here to just track the writeId in catalogd per 
partition, can we just add a new field in HdfsPartition for writeId_?



--
To view, visit http://gerrit.cloudera.org:8080/17858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 6
Gerrit-Owner: Yu-Wen Lai <yu-wen....@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <kis...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <chufu...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <soura...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu-wen....@cloudera.com>
Gerrit-Comment-Date: Tue, 19 Oct 2021 17:44:34 +0000
Gerrit-HasComments: Yes

Reply via email to