[ https://issues.apache.org/jira/browse/IMPALA-10976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang updated IMPALA-10976:
------------------------------------
    Labels: catalog-2024  (was: )

> Sync db/table in catalogd to latest HMS event id for all DDLs from Impala shell
> -------------------------------------------------------------------------------
>
>                 Key: IMPALA-10976
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10976
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog, Frontend
>            Reporter: Sourabh Goyal
>            Assignee: Sai Hemanth Gantasala
>            Priority: Major
>              Labels: catalog-2024
>
> This is a follow-up to IMPALA-10926. The idea is that when any DDL
> operation is performed from Impala shell, catalogd also syncs the db/table
> to its latest event ID as per HMS. This way, updates to a db/table are
> applied in the same order as they appear in the HMS notification log, which
> ensures consistency. Currently catalogd applies any updates received from
> Impala shell in place. Instead, it should perform the HMS operation first
> and then replay all the HMS events since the last synced event.
>
> However, there are subtle differences between how Impala processes DDLs via
> the shell and how it processes HMS events. These are:
> * When processing an alter table event, catalogd currently does a full table
> reload. This has a performance impact, as a table reload is time consuming.
> The in-place alter table DDL operation in catalogOpExecutor (via Impala
> shell) is faster, since it detects whether to reload the table schema, the
> file metadata, or both. The alter table event processing logic needs
> improvement to detect whether to reload the file metadata or not. --> This
> is addressed by IMPALA-11534
> * A similar improvement is required in processing alter partition events. As
> of now, when processing an AlterPartition HMS event, catalogd always reloads
> file metadata, but when doing the same from the shell, it reloads metadata
> only when required.
> * Impala shell already caches Hive functions in the catalog db object, but
> catalogd does *not* process CREATE/DROP function HMS events.
> * When creating a db/table from Impala shell, if the operation fails because
> the db/table already exists, there is no reliable way in catalogd to
> determine the create event ID for that db/table. The create event ID is
> required so that for any subsequent DDL operations, catalogd can process HMS
> events starting from the create event ID.
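
The proposed flow (perform the HMS operation first, then replay events since the last synced event ID) can be sketched roughly as follows. This is a toy model, not Impala code: `ToyHms`, `Event`, `CachedTable` and `alter_table_from_shell` are hypothetical names invented for illustration, and an "alter" is reduced to a dictionary of property changes.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    event_id: int
    table: str
    props: dict  # property changes carried by this (hypothetical) alter event


@dataclass
class ToyHms:
    """Hypothetical stand-in for HMS: keeps an ordered notification log."""
    log: list = field(default_factory=list)

    def alter_table(self, table, props):
        # Every operation appends one event with the next sequential id.
        event = Event(len(self.log) + 1, table, dict(props))
        self.log.append(event)
        return event.event_id

    def events_after(self, event_id, table):
        # Events for one table that are newer than event_id, in log order.
        return [e for e in self.log
                if e.event_id > event_id and e.table == table]


@dataclass
class CachedTable:
    """Hypothetical cached table as catalogd might track it."""
    name: str
    props: dict = field(default_factory=dict)
    last_synced_event_id: int = 0


def alter_table_from_shell(hms, cached, props):
    # 1. Perform the HMS operation first instead of patching the cache in place.
    hms.alter_table(cached.name, props)
    # 2. Replay every event since the last synced one, in log order, so that
    #    changes made outside Impala are applied before the shell's own change.
    for event in hms.events_after(cached.last_synced_event_id, cached.name):
        cached.props.update(event.props)
        cached.last_synced_event_id = event.event_id
```

In this sketch, if another client alters the table before the shell DDL runs, the cached table picks up both changes in notification-log order and ends with `last_synced_event_id` equal to the id of the shell's own event.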