zhangbutao commented on PR #6267:
URL: https://github.com/apache/hive/pull/6267#issuecomment-4110308633

   > Some ideas: 1. We have two catalog types, one is Native, the other is 
Non-Native; 2. For a Native catalog, besides getting tables, the client can 
create/drop/alter tables/partitions, so the catalog location URI is a must, and 
we can check it upon creating the catalog; 3. A Non-Native catalog is just a 
"mount" of external metadata sources, no changes allowed; 4. A Native catalog 
can be changed to a Non-Native catalog, and a Non-Native catalog can be changed 
to a Native catalog as well; 5. For the default native hive catalog, the 
catalog location is `hive.metastore.warehouse.dir`; 6. Everything a catalog 
needs is stored in CATALOG_PARAMS, including the driver class, file system 
configurations, or extra jar files. The PARAM_VALUE should be a clob instead of 
varchar in CATALOG_PARAMS.
   > 
   > Given these above, we don't need extra `metastore.warehouse.catalog.dir` 
or others to determine the default catalog location, the location could be 
anywhere, just make sure it's accessible by the Metastore.
   
   
   
   1. Yes. Non-native catalogs may include JDBC catalog, another external HMS 
catalog, etc.
   
   2. The catalog location can be set with a default value, so that users do 
not need to specify the catalog location every time they create a native 
catalog, which improves the user experience.
   
   3. Non-native catalogs store data and metadata in external data sources, so 
they do not require a catalog location.
   
   4. Changing a native catalog to a non-native one, or vice versa, seems both 
unnecessary and infeasible. For example, a JDBC catalog keeps its data and 
metadata in an external source, so it cannot be converted into a native catalog.
   
   5. Yes. The current default native catalog is named "hive", and its catalog 
location is determined by `hive.metastore.warehouse.dir`.
   
   6. Good catch! `CATALOG_PARAMS` stores the properties of a catalog. However, 
not everything needs to be persisted in the HMS. Bulky configuration, such as 
the Hadoop `hdfs-site.xml` and `core-site.xml` files required by a non-native 
catalog, can live in a local or cloud directory that is specified for the 
catalog and loaded dynamically. Of course, if the `varchar` column's storage 
capacity proves insufficient in the future, we can optimize it then.
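
   As a rough illustration of the "load from a directory" idea in point 6, 
here is a minimal sketch that reads Hadoop-style `*-site.xml` files from a local 
directory into a map. The class name `CatalogConfDirLoader` and its method are 
hypothetical and not part of this PR; it uses only the JDK XML parser, handles a 
local directory only (a cloud directory would need a file system client), and 
assumes later files override earlier ones.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper for illustration; not actual Hive Metastore code. */
final class CatalogConfDirLoader {
    private CatalogConfDirLoader() {}

    /**
     * Reads <property><name>/<value> pairs from the given files under confDir.
     * Missing files are skipped; a later file overrides keys from an earlier one
     * (an assumption made for this sketch).
     */
    static Map<String, String> load(File confDir, String... fileNames) {
        Map<String, String> props = new LinkedHashMap<>();
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        for (String fileName : fileNames) {
            File file = new File(confDir, fileName);
            if (!file.isFile()) {
                continue; // e.g. a catalog that ships core-site.xml only
            }
            try {
                Document doc = factory.newDocumentBuilder().parse(file);
                NodeList properties = doc.getElementsByTagName("property");
                for (int i = 0; i < properties.getLength(); i++) {
                    Element property = (Element) properties.item(i);
                    String name = childText(property, "name");
                    String value = childText(property, "value");
                    if (name != null && value != null) {
                        props.put(name.trim(), value.trim());
                    }
                }
            } catch (Exception e) {
                throw new IllegalStateException("Cannot read " + file, e);
            }
        }
        return props;
    }

    /** Returns the text of the first child element with the given tag, or null. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }
}
```

   With something like this, `CATALOG_PARAMS` would only need to hold the 
directory path for the non-native catalog, which keeps the stored values short.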
   
   
   > Given these above, we don't need extra `metastore.warehouse.catalog.dir` 
or others to determine the default catalog location. The location could be 
anywhere, just make sure it's accessible by the Metastore.
   
   I’d like to emphasize the purpose of catalog location:  
   Catalog location is designed for native catalogs. Currently, after a new 
native catalog is created, its Hive databases still reside under 
`hive.metastore.warehouse.dir`, even if the catalog's location differs from the 
default native catalog's location (`hive.metastore.warehouse.dir`). That is, 
tables and databases under the newly created native catalog do not respect its 
own catalog location. Therefore, this PR aims to address two issues:
   
   1) Tables and databases under a newly created native catalog must respect 
its own catalog location and should not be mixed into the default native 
catalog's location (i.e., `hive.metastore.warehouse.dir`).
   
   2) When creating a native catalog, the location attribute should be 
optional. When it is omitted, the default value can be controlled by the 
parameter `metastore.warehouse.catalog.dir`, which makes creating native 
catalogs more convenient, similar to how omitting the location when creating a 
Hive database results in a default database location.
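
   To make the two goals concrete, here is a minimal sketch of the intended 
resolution order. It is not the code in this PR: the class 
`CatalogLocationResolver`, its method names, and the `<base>/<catalogName>` 
layout under `metastore.warehouse.catalog.dir` are assumptions for 
illustration, and a plain map stands in for the Metastore configuration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper for illustration; not actual Hive Metastore code. */
final class CatalogLocationResolver {
    static final String DEFAULT_CATALOG = "hive";
    static final String WAREHOUSE_DIR = "hive.metastore.warehouse.dir";
    static final String CATALOG_DIR = "metastore.warehouse.catalog.dir";

    private CatalogLocationResolver() {}

    /**
     * Resolves a catalog's location:
     *  - non-native catalogs have no location (their data lives externally);
     *  - an explicit location always wins;
     *  - the default "hive" catalog falls back to hive.metastore.warehouse.dir;
     *  - other native catalogs fall back to <catalog dir>/<name> (ASSUMED layout).
     */
    static Optional<String> resolve(String name, boolean isNative,
                                    String explicitLocation, Map<String, String> conf) {
        if (!isNative) {
            return Optional.empty();
        }
        if (explicitLocation != null && !explicitLocation.isEmpty()) {
            return Optional.of(explicitLocation);
        }
        String key = DEFAULT_CATALOG.equals(name) ? WAREHOUSE_DIR : CATALOG_DIR;
        String base = conf.get(key);
        if (base == null || base.isEmpty()) {
            throw new IllegalArgumentException(
                "No location given for catalog '" + name + "' and " + key + " is unset");
        }
        return Optional.of(DEFAULT_CATALOG.equals(name)
            ? base
            : stripTrailingSlash(base) + "/" + name);
    }

    /**
     * Default database path under a catalog location. The "<db>.db" suffix mirrors
     * the existing warehouse convention; applying it per catalog is the goal of
     * issue 1, sketched here as an assumption.
     */
    static String defaultDatabasePath(String catalogLocation, String dbName) {
        return stripTrailingSlash(catalogLocation) + "/"
            + dbName.toLowerCase(Locale.ROOT) + ".db";
    }

    private static String stripTrailingSlash(String path) {
        return path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
    }
}
```

   Under these assumptions, a database created in a native catalog named 
`sales` would land under that catalog's own location rather than under 
`hive.metastore.warehouse.dir`, and a non-native catalog would resolve to no 
location at all.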
   
   
   @dengzhhu653 Please also refer to 
https://docs.google.com/document/d/1SX8OPd_KdBuynMr-D8AZ7V5tKkoUDKFterUSjeRpOW0/edit?tab=t.0
 for more info.
   
   Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

