[ https://issues.apache.org/jira/browse/IMPALA-10925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417830#comment-17417830 ]
Vihang Karajgaonkar commented on IMPALA-10925: ---------------------------------------------- I think the problem of consecutive create and drop events is not present any more because we keep a createEventId. The redesign generalizes existing approach to keep a lastSyncedEventId instead of createEventId so that we can use a similar mechanism for ALTER events. > Improved self event detection for event processor in catalogd > -------------------------------------------------------------- > > Key: IMPALA-10925 > URL: https://issues.apache.org/jira/browse/IMPALA-10925 > Project: IMPALA > Issue Type: Epic > Components: Catalog > Reporter: Sourabh Goyal > Assignee: Sourabh Goyal > Priority: Major > > h3. Problem Statement > Impala catalogd has Events processor which polls metastore events at regular > intervals to automatically apply changes to the metadata in the catalogd. > However, the current design to detect the self-generated events (DDL/DMLs > coming from the same catalogd) have consistency problems which can cause > query failures under certain circumstances. > > h3. Current Design > The current design of self-event detection is based on adding markers to the > HMS objects which are detected when the event is received later to determine > if the event is self-generated or not. These markers constitute a serviceID > which is unique to the catalogd instance and a catalog version number which > is unique for each catalog object. When a DDL is executed, catalogd adds > these as object parameters. When the event is received, Events processor > checks the serviceID and if the catalog version of the current object with > the same name in the catalogd cache and makes a decision of whether to ignore > the event or not. > > h3. Problems with the current design > The approach is problematic under some circumstances where there are > conflicting DDLs repeated at a faster interval. For example, a sequence of > create/drop table DDLs will generate CREATE_TABLE and DROP_TABLE events. When > the events are received, it is possible that the CREATE_TABLE event is > processed because the catalogd doesn’t have the table in the catalogd cache. > h3. Proposed Solution > The main idea of the solution is to keep track of the last event id for a > given table as eventId which the catalogd has synced to in the Table object. > The events processor ignores any event whose EVENT_ID is less than or equal > to the eventId stored in the table. Once the events processor successfully > processes a given event, it updates the value of eventId in the table before > releasing the table lock. Also, any DDL or refresh operation on the catalogd > will follow the steps given below to update the event id for the table. The > solution relies on the existing locking mechanism in the catalogd to prevent > any other concurrent updates to the table (even via EventsProcessor). > > In case of database objects, we will also have a similar eventId which > represents the events on the database object (CREATE, DROP, ALTER database) > and to which the catalogd as synced to. Since there is no refresh database > command, catalogOpExecutor will only update the database eventId when there > are DDLs at the database level (e.g CREATE, DROP, ALTER database) > > cc - [~vihangk1] [~kishendas] -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org