Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
......................................................................


Patch Set 1:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/17858/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17858/1//COMMIT_MSG@9
PS1, Line 9: To enable fine-grained table refreshing, there are three main 
changes in this commit.
> nit: each line should have 72 or fewer characters if possible.
+1


http://gerrit.cloudera.org:8080/#/c/17858/1//COMMIT_MSG@10
PS1, Line 10:  Maintain validWriteIdList in Catalogd for transactional tables
I have expressed some concerns about the handling of the map at catalog level. 
Specifically around the cleanup. Since we are already tracking 
MutableValidWriteIdList at table level, this map seems unnecessary to me. Can 
you please provide more details in case I am missing something?


http://gerrit.cloudera.org:8080/#/c/17858/1/common/thrift/BackendGflags.thrift
File common/thrift/BackendGflags.thrift:

http://gerrit.cloudera.org:8080/#/c/17858/1/common/thrift/BackendGflags.thrift@219
PS1, Line 219:   97: required bool incremental_refresh_acid
> nit: rename it to hms_event_incremental_refresh or hms_event_incremental_re
What is really the reason of having a config for this? Is there a case where a 
user would like to turn this off?


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java
File fe/src/main/java/org/apache/impala/catalog/Catalog.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@97
PS1, Line 97: protected final ConcurrentHashMap<Long, Set<TableWriteId>> 
txnToWriteIds_ =
            :       new ConcurrentHashMap<>();
Do we really need this? Based on my understanding when the ALLOC_WRITE_ID event 
comes in, we add directly add the writeIds to the table by marking them as OPEN.

In the commit event we directly call the HMS API to fetch the writeId info for 
that trasaction and refresh the table/partitions using those writeIds

In case of AbortTxn event it looks like we just mark it in txnToWriteIds_ which 
doesn't make a lot of sense since it is not used anywhere. Also, this maps 
seems to be a ever increasing map since I don't see any code which is cleaning 
up entries from this map.


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@777
PS1, Line 777:     if (txnToWriteIds_.containsKey(txnId)) {
             :       txnToWriteIds_.get(txnId).add(tableWriteId);
             :     } else {
             :       Set<TableWriteId> set = new HashSet<>();
             :       set.add(tableWriteId);
             :       txnToWriteIds_.put(txnId, set);
             :     }
this could be simplified as
txnToWriteIds_.computeIfAbsent((k)-> new HashSet<>()).add(tableWriteId);


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@788
PS1, Line 788: containsKey
do we need to check for existence of txnId?


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2582
PS1, Line 2582:       List<TPartitionKeyValue> tPartSpec, long writeId, boolean 
loadFileMetadata,
> I feel it would be cleaner if we don't add writeId and eventId as parameter
I think I would have a different opinion here. My thinking is that we should 
either reloadPartition or not atomically under the lock. If we do the checks 
before calling this method, we might be ending up with race conditions like 
createEventId changing after the passing the checks before calling the method.


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2947
PS1, Line 2947: only if the writeId ist still open.
I think a more useful comment would be to tell the intent of the code doing the 
way we are doing this. I would rephrase this to.. "If the writeIdList of this 
table already has the given writeId_ as aborted or commited, we don't need to 
reload the table since the table is more recent than the given writeId."


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3578
PS1, Line 3578: addWriteIdsToTable
pls add java doc


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3578
PS1, Line 3578: createEventId
The general pattern here is we send the eventId and compare it with the table's 
createEventId. I think it is cleaner and easier to reason.


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3584
PS1, Line 3584: tbl.getFullName(
this can throw a NPE since one of the conditins for this if is tbl==null.


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3597
PS1, Line 3597: info
change to debug?


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3599
PS1, Line 3599: 
Preconditions.checkState(!versionLock_.isWriteLockedByCurrentThread());
This preconditions check is unnecessary. Also, use use something like 
UnlockWriteLockIfErronouslyLocked() here. Because we don't want to bail out of 
this method while holding the versionLock under any circumstances.


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2802
PS1, Line 2802: addWriteIds
Please add java doc for this.


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2804
PS1, Line 2804: if
nit, we can change this to a simple switch-case statement to reduce the if 
conditions checks.


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2805
PS1, Line 2805: validWriteIds_
Looks like we either need a preconditions check that validWriteIds_ is not null 
or return if it is null early.


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
File fe/src/main/java/org/apache/impala/catalog/TableWriteId.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/TableWriteId.java@22
PS1, Line 22: TableWriteId
please add a class level java doc


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/TableWriteId.java@24
PS1, Line 24: tableName
pls append _ to the fields according to the code style. Also add a one line 
comment for each field.


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1711
PS1, Line 1711: allocWriteIdMessage
you probably are missing to set txnToWriteIdList_ from allocWriteIdMessage here.
txnToWriteIdList_ = allocWriteIdMessage.getTxnToWriteIdList();


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1718
PS1, Line 1718: msTbl_
setting this to null here is weird and unnecessary. Instead I think it would be 
cleaner if you send this event id into catalog_.addWriteIdsToTable() and return 
from it if the table's createEventId is greater this eventId.


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1735
PS1, Line 1735:       if (msTbl_ == null) {
> can we move the check tbl_.getCreateEventId() > eventId_ here ? As msTbl_ i
+1


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1758
PS1, Line 1758:       if (msTbl_ == null) {
              :         return false;
              :       }
this can be removed if you remove the null logic in the constructor and perhaps 
not need to @Override the method at all.


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1791
PS1, Line 1791: refreshPartitions
It is unclear to me why are we refreshing first and then adding the committed 
writeIds. I thought we would mark the writeIds as committed and then refresh 
the partitions. All this should happen atomically while holding the table lock.

Edit: It looks like we are refreshing the partitions using the information from 
the commitTxnMessage here. So perhaps you should rename this method to 
something more appropriate like refreshPartitionsFromCommitMessage()



--
To view, visit http://gerrit.cloudera.org:8080/17858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai <yu-wen....@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <kis...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <chufu...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <soura...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu-wen....@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Oct 2021 20:08:24 +0000
Gerrit-HasComments: Yes

Reply via email to