[ 
https://issues.apache.org/jira/browse/IMPALA-10976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10976:
------------------------------------
    Labels: catalog-2024  (was: )

> Sync db/table in catalogd to latest HMS event id for all DDLs from Impala 
> shell
> -------------------------------------------------------------------------------
>
>                 Key: IMPALA-10976
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10976
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog, Frontend
>            Reporter: Sourabh Goyal
>            Assignee: Sai Hemanth Gantasala
>            Priority: Major
>              Labels: catalog-2024
>
> This is a follow-up to IMPALA-10926. The idea is that when any DDL 
> operation is performed from Impala shell, catalogd also syncs the db/table to 
> its latest event ID as per HMS. This way, updates to a db/table are applied in 
> the same order as they appear in the HMS notification log, which ensures 
> consistency. Currently catalogd applies any updates received from Impala 
> shell in place. Instead, it should perform the HMS operation first and then 
> replay all the HMS events since the last synced event.
>  However, there are subtle differences between how Impala processes DDLs via 
> the shell and how it processes HMS events. These are:
>  * When processing an alter table event, catalogd currently does a full table 
> reload. This has a performance impact because a table reload is time consuming. 
> In contrast, the in-place alter table DDL operation in catalogOpExecutor (via 
> Impala shell) is faster because it detects whether to reload the table schema, 
> the file metadata, or both. The alter table event processing logic needs 
> improvements to detect whether to reload the file metadata or not. --> This is 
> addressed by IMPALA-11534
>  * A similar improvement is required in processing alter partition events. As 
> of now, when processing an AlterPartition HMS event, catalogd always reloads 
> file metadata, but when doing the same from the shell, it reloads metadata only 
> when it is required.
>  * Impala shell already caches Hive functions in the catalog db object, but 
> catalogd does *not* process CREATE/DROP function HMS events.
>  * When creating a db/table from Impala shell, if the operation fails because 
> the db/table already exists, there is no reliable way in catalogd to 
> determine the create event ID for that db/table. The create event ID is required 
> so that for any subsequent DDL operations, catalogd can process HMS events 
> starting from the create event ID.
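
The ordering proposed in the description (perform the HMS operation first, then replay every event since the last synced one) can be illustrated with a small, self-contained sketch. This is not Impala or HMS code: `FakeMetastore`, `CachedTable`, `sync_to_latest_event_id`, and `run_ddl` are hypothetical names, and the event log is a plain in-memory list standing in for the HMS notification log.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One hypothetical notification-log entry (not the real HMS event type)."""
    event_id: int
    table: str
    kind: str       # e.g. "CREATE_TABLE" or "ALTER_TABLE"
    payload: dict   # property changes carried by the event


@dataclass
class FakeMetastore:
    """Stand-in for HMS: every operation appends an event with an increasing id."""
    log: list = field(default_factory=list)

    def apply(self, table, kind, payload):
        event = Event(len(self.log) + 1, table, kind, payload)
        self.log.append(event)
        return event.event_id

    def events_since(self, table, last_synced_id):
        # Events for this table that are newer than the given id, in log order.
        return [e for e in self.log
                if e.table == table and e.event_id > last_synced_id]


@dataclass
class CachedTable:
    """Catalog-side copy of a table plus the id of the last event applied to it."""
    name: str
    props: dict = field(default_factory=dict)
    last_synced_id: int = 0


def sync_to_latest_event_id(table, hms):
    """Replay, in log order, every event newer than the table's last synced id."""
    for event in hms.events_since(table.name, table.last_synced_id):
        table.props.update(event.payload)
        table.last_synced_id = event.event_id


def run_ddl(table, hms, kind, payload):
    """Perform the metastore operation first, then catch up by replaying events."""
    hms.apply(table.name, kind, payload)
    sync_to_latest_event_id(table, hms)


# Usage: an external change lands between two shell DDLs and is replayed in order.
hms = FakeMetastore()
t1 = CachedTable("db1.t1")
run_ddl(t1, hms, "CREATE_TABLE", {"owner": "alice"})
hms.apply("db1.t1", "ALTER_TABLE", {"comment": "set externally"})
run_ddl(t1, hms, "ALTER_TABLE", {"owner": "bob"})
```

Because the cached table is only ever modified by replaying the log, an external change made between two shell DDLs is applied in the same position it occupies in the log, rather than being overwritten by an in-place update.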


