Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17858 )
Change subject: IMPALA-10923: Fine grained table refreshing at partition level events for transactional tables ...................................................................... Patch Set 4: (33 comments) http://gerrit.cloudera.org:8080/#/c/17858/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17858/1//COMMIT_MSG@9 PS1, Line 9: > +1 Done http://gerrit.cloudera.org:8080/#/c/17858/1/common/thrift/BackendGflags.thrift File common/thrift/BackendGflags.thrift: http://gerrit.cloudera.org:8080/#/c/17858/1/common/thrift/BackendGflags.thrift@219 PS1, Line 219: 97: required bool hms_event_incremental_refresh_transactional_table > What is really the reason of having a config for this? Is there a case wher The initial thought was just to toggle on/off the feature for easily doing experiments. From the perspective of users, they would like to turn this off only when this feature has problems. Therefore, the goal is to make this feature robust enough and then we can get rid of this flag. http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java File fe/src/main/java/org/apache/impala/catalog/Catalog.java: http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@97 PS1, Line 97: protected final ConcurrentHashMap<Long, Set<TableWriteId>> txnToWriteIds_ = : new ConcurrentHashMap<>(); > Do we really need this? Based on my understanding when the ALLOC_WRITE_ID e The difficulty here is we have some DDLs advancing write id without changing data but the new HMS API getAllWriteEventInfo only return info for WRITE events. Let's say we have a DDL for table foo in txn 3 and this DDL allocates write id 3 for table foo. We can mark write id 3 as open for table foo when catalogd receives AllocWriteIdEvent. However, when it receives CommitTxnEvent for txn 3, we don't know write id 3 for table foo is associated with this transaction if we don't have a mapping table in catalog. We cannot reload writeIdList alone either for commitTxnEvent because chances are that there are other committed txn after this event and simply reloading wrietIdList make the table become inconsistent. Any thoughts or alternative approaches? Sorry that my previous patch was incomplete. The entry for a transaction should be deleted whenever the transaction is ended (committed or aborted). http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@777 PS1, Line 777: } : : public void removeWriteIds(Long txnId) { : Preconditions.checkNotNull(txnId); : txnToWriteIds_.remove(txnId); : } : } > this could be simplified as Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@788 PS1, Line 788: > do we need to check for existence of txnId? Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2604 PS1, Line 2604: return true; > line too long (96 > 90) Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2610 PS1, Line 2610: */ > line too long (91 > 90) Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3578 PS1, Line 3578: > pls add java doc Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3584 PS1, Line 3584: { > this can throw a NPE since one of the conditins for this if is tbl==null. Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3597 PS1, Line 3597: > change to debug? Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3599 PS1, Line 3599: ibleForTesting > This preconditions check is unnecessary. Also, use use something like Unloc Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2802 PS1, Line 2802: ddTimer(CAT > Please add java doc for this. Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2804 PS1, Line 2804: me > nit, we can change this to a simple switch-case statement to reduce the if Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2805 PS1, Line 2805: > Looks like we either need a preconditions check that validWriteIds_ is not Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/TableWriteId.java File fe/src/main/java/org/apache/impala/catalog/TableWriteId.java: http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/TableWriteId.java@24 PS1, Line 24: ame_; > pls append _ to the fields according to the code style. Also add a one line Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/TableWriteId.java@29 PS1, Line 29: public TableWriteId(String dbName, String tblName, Long createEventId, Long writeId) { > We are probably expecting tableName to be fullyQualified but not enforcing Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@20 PS1, Line 20: import com.google.common.annotations.VisibleForTesting; > nit: remove it if not being used? Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@184 PS1, Line 184: MetastoreEventType.from(event.getEventType()); > this seems incomplete. Are we missing anything here? We didn't create objs Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1709 PS1, Line 1709: try { > line too long (91 > 90) Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1711 PS1, Line 1711: TION event"); > you probably are missing to set txnToWriteIdList_ from allocWriteIdMessage Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1712 PS1, Line 1712: } catch (CatalogException e) { > line too long (103 > 90) Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1735 PS1, Line 1735: } > +1 The most important usage of msTbl_ sit in isEventProcessingDisabled(). For MetastoreTableEvent, we check if DISABLE_EVENT_HMS_SYNC is set for this msTbl_. Conceptually, AllocWriteIdEvent should be a MetastoreTableEvent but table object is not included in the event message. That's why I retrieve it from catalog. http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1738 PS1, Line 1738: public SelfEventContext getSelfEventContext() { > txnToWriteIdList_ does not seem to be set anywhere in the event. Where are Sorry that my previous patch was incomplete. It's fixed. http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1786 PS1, Line 1786: > nit: Add txnId in the message? Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1789 PS1, Line 1789: * from this batch which is important since it is used to determined the event > Didn't understand the purpose of this. It might looks a little counter-intuitive. The commitTxnMessage doesn't have write event info by default but it has this function that helps to parse the writeEventInfo. With this, I can get table objects and partition objects easily. (It was defined as String in writeEventInfo) http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1793 PS1, Line 1793: public long getEventId() { > nit: add more info like txnId etc. Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1800 PS1, Line 1800: * @param event The event under consideration to be batched into this event. It can > if the tableWriteIds don't exist in catalog cache (say because of cache evi Since there is no HMS API we can get all write ids for a transaction, we can only do something like invalidate metadata. http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1081 PS1, Line 1081: partitionInsertEventInfos, msTbl.getDbName(), msTbl.getTableName()); > line too long (124 > 90) Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@1083 PS1, Line 1083: } > line too long (104 > 90) Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2283 PS1, Line 2283: assertNotNull(alterPartEvent); > line too long (103 > 90) Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2297 PS1, Line 2297: assertNotNull(msTbl); > line too long (110 > 90) Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2304 PS1, Line 2304: List<NotificationEvent> insertEvents = eventsProcessor_.getNextMetastoreEvents(); > line too long (99 > 90) Done http://gerrit.cloudera.org:8080/#/c/17858/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@3088 PS1, Line 3088: > line too long (97 > 90) Done -- To view, visit http://gerrit.cloudera.org:8080/17858 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ba07c9a338a25614690e314335ee4b801486da9 Gerrit-Change-Number: 17858 Gerrit-PatchSet: 4 Gerrit-Owner: Yu-Wen Lai <yu-wen....@cloudera.com> Gerrit-Reviewer: Anonymous Coward <kis...@cloudera.com> Gerrit-Reviewer: Fucun Chu <chufu...@hotmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Sourabh Goyal <soura...@cloudera.com> Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com> Gerrit-Reviewer: Yu-Wen Lai <yu-wen....@cloudera.com> Gerrit-Comment-Date: Wed, 13 Oct 2021 21:04:43 +0000 Gerrit-HasComments: Yes