Yu-Wen Lai has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17858 )

Change subject: IMPALA-10923: Fine grained table refreshing at partition level 
events for transactional tables
......................................................................


Patch Set 4:

(33 comments)

http://gerrit.cloudera.org:8080/#/c/17858/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17858/1//COMMIT_MSG@9
PS1, Line 9:
> +1
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/common/thrift/BackendGflags.thrift
File common/thrift/BackendGflags.thrift:

http://gerrit.cloudera.org:8080/#/c/17858/1/common/thrift/BackendGflags.thrift@219
PS1, Line 219:   97: required bool 
hms_event_incremental_refresh_transactional_table
> What is really the reason of having a config for this? Is there a case wher
The initial thought was just to toggle on/off the feature for easily doing 
experiments. From the perspective of users, they would like to turn this off 
only when this feature has problems. Therefore, the goal is to make this 
feature robust enough and then we can get rid of this flag.


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java
File fe/src/main/java/org/apache/impala/catalog/Catalog.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@97
PS1, Line 97: protected final ConcurrentHashMap<Long, Set<TableWriteId>> 
txnToWriteIds_ =
            :       new ConcurrentHashMap<>();
> Do we really need this? Based on my understanding when the ALLOC_WRITE_ID e
The difficulty here is we have some DDLs advancing write id without changing 
data but the new HMS API getAllWriteEventInfo only return info for WRITE 
events. Let's say we have a DDL for table foo in txn 3 and this DDL allocates 
write id 3 for table foo. We can mark write id 3 as open for table foo when 
catalogd receives AllocWriteIdEvent. However, when it receives CommitTxnEvent 
for txn 3, we don't know write id 3 for table foo is associated with this 
transaction if we don't have a mapping table in catalog.

We cannot reload writeIdList alone either for commitTxnEvent because chances 
are that there are other committed txn after this event and simply reloading 
wrietIdList make the table become inconsistent. Any thoughts or alternative 
approaches?

Sorry that my previous patch was incomplete. The entry for a transaction should 
be deleted whenever the transaction is ended (committed or aborted).


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@777
PS1, Line 777:   }
             :
             :   public void removeWriteIds(Long txnId) {
             :     Preconditions.checkNotNull(txnId);
             :     txnToWriteIds_.remove(txnId);
             :   }
             : }
> this could be simplified as
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@788
PS1, Line 788:
> do we need to check for existence of txnId?
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2604
PS1, Line 2604:     return true;
> line too long (96 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2610
PS1, Line 2610:    */
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3578
PS1, Line 3578:
> pls add java doc
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3584
PS1, Line 3584:  {
> this can throw a NPE since one of the conditins for this if is tbl==null.
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3597
PS1, Line 3597:
> change to debug?
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3599
PS1, Line 3599: ibleForTesting
> This preconditions check is unnecessary. Also, use use something like Unloc
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2802
PS1, Line 2802: ddTimer(CAT
> Please add java doc for this.
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2804
PS1, Line 2804: me
> nit, we can change this to a simple switch-case statement to reduce the if
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2805
PS1, Line 2805:
> Looks like we either need a preconditions check that validWriteIds_ is not
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/TableWriteId.java
File fe/src/main/java/org/apache/impala/catalog/TableWriteId.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/TableWriteId.java@24
PS1, Line 24: ame_;
> pls append _ to the fields according to the code style. Also add a one line
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/TableWriteId.java@29
PS1, Line 29:   public TableWriteId(String dbName, String tblName, Long 
createEventId, Long writeId) {
> We are probably expecting tableName to be fullyQualified but not enforcing
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@20
PS1, Line 20: import com.google.common.annotations.VisibleForTesting;
> nit: remove it if not being used?
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@184
PS1, Line 184:           MetastoreEventType.from(event.getEventType());
> this seems incomplete. Are we missing anything here? We didn't create objs
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1709
PS1, Line 1709:         try {
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1711
PS1, Line 1711: TION event");
> you probably are missing to set txnToWriteIdList_ from allocWriteIdMessage
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1712
PS1, Line 1712:         } catch (CatalogException e) {
> line too long (103 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1735
PS1, Line 1735:     }
> +1
The most important usage of msTbl_ sit in isEventProcessingDisabled(). For 
MetastoreTableEvent, we check if DISABLE_EVENT_HMS_SYNC is set for this msTbl_. 
Conceptually, AllocWriteIdEvent should be a MetastoreTableEvent but table 
object is not included in the event message. That's why I retrieve it from 
catalog.


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1738
PS1, Line 1738:     public SelfEventContext getSelfEventContext() {
> txnToWriteIdList_ does not seem to be set anywhere in the event. Where are
Sorry that my previous patch was incomplete. It's fixed.


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1786
PS1, Line 1786:
> nit: Add txnId in the message?
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1789
PS1, Line 1789:      * from this batch which is important since it is used to 
determined the event
> Didn't understand the purpose of this.
It might looks a little counter-intuitive. The commitTxnMessage doesn't have 
write event info by default but it has this function that helps to parse the 
writeEventInfo. With this, I can get table objects and partition objects 
easily. (It was defined as String in writeEventInfo)


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1793
PS1, Line 1793:     public long getEventId() {
> nit: add more info like txnId etc.
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1800
PS1, Line 1800:      * @param event The event under consideration to be batched 
into this event. It can
> if the tableWriteIds don't exist in catalog cache (say because of cache evi
Since there is no HMS API we can get all write ids for a transaction, we can 
only do something like invalidate metadata.


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1081
PS1, Line 1081:           partitionInsertEventInfos, msTbl.getDbName(), 
msTbl.getTableName());
> line too long (124 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1083
PS1, Line 1083:   }
> line too long (104 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2283
PS1, Line 2283:     assertNotNull(alterPartEvent);
> line too long (103 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2297
PS1, Line 2297:     assertNotNull(msTbl);
> line too long (110 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2304
PS1, Line 2304:     List<NotificationEvent> insertEvents = 
eventsProcessor_.getNextMetastoreEvents();
> line too long (99 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3088
PS1, Line 3088:
> line too long (97 > 90)
Done



--
To view, visit http://gerrit.cloudera.org:8080/17858
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9
Gerrit-Change-Number: 17858
Gerrit-PatchSet: 4
Gerrit-Owner: Yu-Wen Lai <yu-wen....@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <kis...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <chufu...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <soura...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu-wen....@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Oct 2021 21:04:43 +0000
Gerrit-HasComments: Yes

Reply via email to