[ 
https://issues.apache.org/jira/browse/IMPALA-12375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755727#comment-17755727
 ] 

Wenzhe Zhou edited comment on IMPALA-12375 at 8/17/23 11:14 PM:
----------------------------------------------------------------

Data source objects are saved in an in-memory 
[cache|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/Catalog.java#L93-L94]
 in Catalog.  They are [NOT persisted to the 
metastore|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/Catalog.java#L261-L267]
 by original design.  All data source properties are stored as [table 
properties|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java#L41-L48]
 (persisted in the metastore) so that the DataSource catalog objects are not 
needed in order to scan data source tables.

Hive does not have data source objects. All properties are specified as table 
properties when creating a table with [JDBC 
storage|https://cwiki.apache.org/confluence/display/Hive/JDBC+Storage+Handler].

Since data source objects are not persistent, they are missing after the catalog 
server is restarted. If CatalogD HA is enabled, data source objects are created 
only on the active catalogd, not on the standby catalogd.  The missing data source 
objects don't affect existing data source tables, but a data source object must 
be recreated before a new data source table that uses it can be created.
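
The behavior described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Impala's implementation: the class names, method names, and property keys below are hypothetical and chosen for illustration. It shows data source objects living only in the catalog server's memory, while the properties copied into table properties survive a restart.

```python
# Toy model only: all names and property keys here are hypothetical,
# not Impala's actual API.


class Metastore:
    """Stands in for the persistent metastore; survives catalogd restarts."""

    def __init__(self):
        self.table_properties = {}  # table name -> dict of properties


class CatalogServer:
    """Stands in for catalogd; the data source cache lives only in memory."""

    def __init__(self, metastore):
        self.metastore = metastore
        self.data_sources = {}  # empty again for every new CatalogServer

    def create_data_source(self, name, location, class_name, api_version):
        self.data_sources[name] = {
            "location": location,
            "class_name": class_name,
            "api_version": api_version,
        }

    def create_data_source_table(self, table, data_source, init_string):
        if data_source not in self.data_sources:
            raise LookupError(f"data source {data_source!r} does not exist")
        # Copy every data source property into the persisted table properties.
        props = dict(self.data_sources[data_source])
        props["data_source_name"] = data_source
        props["init_string"] = init_string
        self.metastore.table_properties[table] = props

    def scan_properties(self, table):
        # Scanning consults only the persisted table properties.
        return self.metastore.table_properties[table]


hms = Metastore()
catalogd = CatalogServer(hms)
catalogd.create_data_source("ds1", "/tmp/ds.jar", "com.example.Ds", "V1")
catalogd.create_data_source_table("t1", "ds1", "init")

restarted = CatalogServer(hms)  # simulate a catalogd restart

try:
    restarted.create_data_source_table("t2", "ds1", "init")
    create_failed = False
except LookupError:
    create_failed = True  # the data source must be recreated first
```

In this model, the existing table `t1` remains scannable after the restart because its properties were persisted, while creating `t2` fails until `ds1` is recreated.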

To make data source objects persistent, we need to add new APIs in HMS to 
support these Impala-specific objects. 


> DataSource objects are not persistent
> -------------------------------------
>
>                 Key: IMPALA-12375
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12375
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend, Catalog, Frontend
>            Reporter: Wenzhe Zhou
>            Assignee: Wenzhe Zhou
>            Priority: Major
>
> DataSource objects which are created with "CREATE DATA SOURCE" statements are 
> not persistent.  The objects are not shown in "show data sources" after the 
> mini-cluster is restarted.  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
