[ 
https://issues.apache.org/jira/browse/IMPALA-10925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417830#comment-17417830
 ] 

Vihang Karajgaonkar commented on IMPALA-10925:
----------------------------------------------

I think the problem of consecutive create and drop events is no longer 
present because we keep a createEventId. The redesign generalizes the existing 
approach to keep a lastSyncedEventId instead of createEventId so that we can 
use a similar mechanism for ALTER events.

> Improved self event detection for event processor in catalogd 
> --------------------------------------------------------------
>
>                 Key: IMPALA-10925
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10925
>             Project: IMPALA
>          Issue Type: Epic
>          Components: Catalog
>            Reporter: Sourabh Goyal
>            Assignee: Sourabh Goyal
>            Priority: Major
>
> h3. Problem Statement
> Impala catalogd has an events processor which polls metastore events at 
> regular intervals to automatically apply changes to the metadata in the 
> catalogd. However, the current design for detecting self-generated events 
> (DDL/DMLs coming from the same catalogd) has consistency problems which can 
> cause query failures under certain circumstances.
>  
> h3. Current Design
> The current design of self-event detection is based on adding markers to the 
> HMS objects. These markers are detected when the event is received later to 
> determine whether the event is self-generated. The markers consist of a 
> serviceID, which is unique to the catalogd instance, and a catalog version 
> number, which is unique for each catalog object. When a DDL is executed, 
> catalogd adds these as object parameters. When the event is received, the 
> events processor checks the serviceID and the catalog version against the 
> current object with the same name in the catalogd cache and decides whether 
> to ignore the event.
>  
> h3. Problems with the current design
> The approach is problematic under some circumstances where conflicting 
> DDLs are repeated in quick succession. For example, a sequence of 
> create/drop table DDLs will generate CREATE_TABLE and DROP_TABLE events. When 
> the events are received, it is possible that the CREATE_TABLE event is 
> processed because the catalogd doesn’t have the table in the catalogd cache. 
>  
> h3. Proposed Solution
> The main idea of the solution is to keep track of the last event id for a 
> given table as eventId which the catalogd has synced to in the Table object. 
> The events processor ignores any event whose EVENT_ID is less than or equal 
> to the eventId stored in the table. Once the events processor successfully 
> processes a given event, it updates the value of eventId in the table before 
> releasing the table lock. Also, any DDL or refresh operation on the catalogd 
> will follow the steps given below to update the event id for the table. The 
> solution relies on the existing locking mechanism in the catalogd to prevent 
> any other concurrent updates to the table (even via EventsProcessor).
>  
> In the case of database objects, we will also have a similar eventId which 
> represents the events on the database object (CREATE, DROP, ALTER database) 
> and to which the catalogd has synced. Since there is no refresh database 
> command, catalogOpExecutor will only update the database eventId when there 
> are DDLs at the database level (e.g. CREATE, DROP, ALTER database).
>  
> cc - [~vihangk1] [~kishendas]
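
The event-id check described in the quoted Proposed Solution could be sketched roughly as below. This is only an illustration under stated assumptions, not Impala's actual code: TableSketch, processEvent and recordSelfDdl are hypothetical names, the lock is a plain ReentrantLock standing in for the catalogd table lock, and the way a DDL records its own event id (taking the maximum) is an assumption, since the steps for DDL and refresh operations are not spelled out in this message.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; none of these names come from the Impala code base.
public class SyncedEventIdSketch {

    /** Stand-in for a catalog Table that remembers the last event id it synced to. */
    public static final class TableSketch {
        private final ReentrantLock lock = new ReentrantLock(); // stand-in for the table lock
        private long lastSyncedEventId;
        private int appliedEvents = 0;

        public TableSketch(long createEventId) {
            this.lastSyncedEventId = createEventId;
        }

        /** Events-processor path: returns true if the event was applied, false if ignored. */
        public boolean processEvent(long eventId) {
            lock.lock();
            try {
                // Ignore any event at or before the id the table has already synced to.
                if (eventId <= lastSyncedEventId) {
                    return false;
                }
                appliedEvents++;              // placeholder for the real metadata update
                lastSyncedEventId = eventId;  // updated before the lock is released
                return true;
            } finally {
                lock.unlock();
            }
        }

        /** DDL path (assumption): record the id of the event generated by our own DDL. */
        public void recordSelfDdl(long eventIdOfDdl) {
            lock.lock();
            try {
                lastSyncedEventId = Math.max(lastSyncedEventId, eventIdOfDdl);
            } finally {
                lock.unlock();
            }
        }

        public long getLastSyncedEventId() {
            return lastSyncedEventId;
        }

        public int getAppliedEvents() {
            return appliedEvents;
        }
    }

    public static void main(String[] args) {
        TableSketch table = new TableSketch(100L);
        System.out.println(table.processEvent(100L)); // false: the create event itself
        table.recordSelfDdl(105L);                    // our own DDL produced event 105
        System.out.println(table.processEvent(105L)); // false: self-generated event
        System.out.println(table.processEvent(106L)); // true: a newer, external event
    }
}
```

In this sketch a self-generated event is skipped purely by comparing ids, so no marker needs to be written into the HMS object parameters.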


