Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17859 )

Change subject: IMPALA-10926: Sync db/table in catalog cache to latest HMS 
event id when performing DDL operations via catalog HMS endpoints
......................................................................


Patch Set 19:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/17859/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17859/12//COMMIT_MSG@7
PS12, Line 7: IMPALA-10926: Sync db/table in catalog cache to latest HMS event 
id when performing
            : DDL operations via catalog HMS endpoints
> Ack
ping


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2672
PS12, Line 2672: per.getT
> I think the intention for this change was - currently for removeTable() api
I am not sure I fully understand. If this change is not needed for the patch, 
please remove this. It makes sense to me serialize write operations on the 
table. It is not clear to me why 2 read operations on a table should be 
serialized on the read lock for version number. The code doesn't change the 
table version number. Any concurrent table modification operation atomically 
replaces the table which is modified.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@288
PS19, Line 288: syncToLatestEventFactory_
Based on my understanding this field here is unnecessary and is complicating 
the code unnecessarily. Can we remove this field altogether? From what I looked 
at, this is used in MetastoreEventsProcessor.syncToLatestEventId() which is 
called from tableLoader and CatalogMetastoreServiceHandler.

Within the method MetastoreEventsProcessor.syncToLatestEventId() it is only 
used to get the events so ultimately, the only purpose of having this is to 
disable isSelfEvent() method for certain event types. Why can't we just do it 
in the isSelfEvent() itself (look for enable_sync_to_latest_event_on_ddls and 
if it enabled return false.)

The advantage of doing this would be we would get rid of this unnecessarily 
complicated way to initialize this factory and then inject in the Catalog from 
JniCatalog.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@316
PS19, Line 316: numLoadingThreads_
this is unused and can be removed.


http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java
File fe/src/main/java/org/apache/impala/catalog/Db.java:

http://gerrit.cloudera.org:8080/#/c/17859/6/fe/src/main/java/org/apache/impala/catalog/Db.java@115
PS6, Line 115:
> The intention was - by making lastSyncedEventId_ volatile, the get and set
All the places where I see this variable getting accessed, I see that it is 
under the db lock. Am I missing something?


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Db.java
File fe/src/main/java/org/apache/impala/catalog/Db.java:

http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Db.java@134
PS19, Line 134:     Preconditions.checkState(eventId <= lastSyncedEventId_,
              :         String.format("create event id: %s to be set for db %s 
should "
              :             + "be <= lastSyncedEvent id: %s", eventId, 
getName(), lastSyncedEventId_));
              :      */
Pls remove if not needed anymore.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Db.java@153
PS19, Line 153:     /*
              :     Preconditions.checkState(eventId >= createEventId_,
              :         String.format("last synced event id: %s to be set for 
db %s should "
              :                 + "be >= createEvent id: %s", eventId, 
getName(), createEventId_));
              :      */
pls remove if not needed.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/Table.java@186
PS12, Line 186:
> nit, add a comment on what this field represents and why it is volatile.
Thanks for adding the comment. But similar to the Db.java's volatile keyword, I 
don't see any code which is accessing this variable without a lock on the table 
object. I would prefer to not keep it volatile unless it is absolutely 
necessary since AFAIK, introducing volatile keyword will disable certain 
compiler optimizations not just for this variable but for other variables as 
well to make sure that the "happens-before" relationship is maintained. See 
https://www.baeldung.com/java-volatile#happens-before for example.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java@60
PS19, Line 60: import org.apache.log4j.Logger;
please remove this


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java@77
PS19, Line 77: org.slf4j.
this is not needed if you remove line 60


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java@213
PS19, Line 213:     Preconditions.checkState(eventId <= lastSyncedEventId_,
              :         String.format("create event id: %s to be set for table 
%s should "
              :             + "be <= lastSyncedEvent id: %s", eventId, 
getFullName(),
              :             lastSyncedEventId_));
              :      */
remove if not needed.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java@220
PS19, Line 220:     // TODO: Should we reset lastSyncedEvent Id if it is less 
than event Id?
              :     // If we don't reset it - we may start syncing table from 
an event id which
              :     // is less than create event id
the createEventId is set when the table is created, I am not sure when would it 
happen that lastSyncedEventId will be different than -1 in such a case.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/Table.java@234
PS19, Line 234:     Preconditions.checkState(eventId >= createEventId_,
              :         String.format("last synced event id: %s to be set for 
table %s "
              :                 + "should be >= createEvent id: %s", eventId, 
getFullName(),
              :             createEventId_));
              :      */
pls remove if not needed.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoader.java
File fe/src/main/java/org/apache/impala/catalog/TableLoader.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@170
PS12, Line 170: initMetrics
> Are you suggesting to pass metrics as null in
I was suggesting that we should not pass the metrics in which case we would use 
the MetastoreEventsProcessor's metrics object. In 
CatalogMetastoreServiceHandler, we can explicitly pass a metrics object since 
the intent there is to have separate metrics eventually. In the long run these 
metrics should be moved to table objects instead of keeping them at a 
eventsprocessor or metastoreServiceHandler's level.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java:

http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java@34
PS19, Line 34: import
remove if not necessary? In fact all the changes to this file can be removed 
from the patch since I don't see any relevant code changes related to this 
patch.


http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17859/12/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@461
PS12, Line 461:
> Sorry I didn't understand your comment
Actually, I see why you have a new abstract method now. However, instead of 
calling it from processIfEnabled() it would be cleaner in my opinion to have 
all this logic in one single method for readability reasons. isSelfEvent() is 
already hooked with the metrics correctly so it would make sense to just modify 
that to something like below:

boolean isSelfEvent() {
  boolean canBeSkipped = false;
  if (BackendConfig.INSTANCE.enableSyncToLatestEventOnDdls()) {
  canBeSkipped = shouldSkipWhenSyncingToLatestEventId();
} else {
  canBeSkipped = catalog_.evaluateSelfEvent(getSelfEventContext()));
}

if (canBeSkipped) {
  metrics_.getCounter(MetastoreEventsProcessor.EVENTS_SKIPPED_METRIC)
    .inc(getNumberOfEvents());
  infoLog("Incremented events skipped counter to {}",
    metrics_.getCounter(MetastoreEventsProcessor.EVENTS_SKIPPED_METRIC)
      .getCount());

}
return canBeSkipped;
}


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@316
PS19, Line 316: EventFactoryForSyncToLatestEvent
Ideally would be great to get rid of this class since all it is really doing is 
to return false on isSelfEvent() for some events which can be done using the 
config check directly in MetastoreEventFactory


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@479
PS19, Line 479: BackendConfig
pls see my comment about moving this logic into isSelfEvent() so that there is 
only one place where all the self-event evaluation (both legacy and new way).


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@875
PS19, Line 875: tbl.getLastSyncedEventId() == -1
Why do we need to logically AND with this condition here?


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
File 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:

http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@371
PS19, Line 371:         if (currentEvent.isDropEvent()) {
if we are stopping at the drop event it would be weird I feel. For instance, 
after table load there would not be any table. I think it would make sense to 
keep things simple and process all the events.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@435
PS19, Line 435: if (currentEvent.isDropEvent()) {
same as previous comment.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1939
PS19, Line 1939: null
I am surprised we are not hitting NPE due to this and other similar calls below.


http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/service/JniCatalog.java
File fe/src/main/java/org/apache/impala/service/JniCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17859/19/fe/src/main/java/org/apache/impala/service/JniCatalog.java@40
PS19, Line 40: import org.apache.impala.catalog.TableLoadingMgr;
             : import org.apache.impala.catalog.events.EventFactory;
please remove the unused imports



--
To view, visit http://gerrit.cloudera.org:8080/17859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36364e401911352c4666674eb98c8d61bbaae9b9
Gerrit-Change-Number: 17859
Gerrit-PatchSet: 19
Gerrit-Owner: Sourabh Goyal <soura...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <kis...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <soura...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu-wen....@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Oct 2021 18:25:35 +0000
Gerrit-HasComments: Yes

Reply via email to