zhangbutao commented on PR #6267: URL: https://github.com/apache/hive/pull/6267#issuecomment-4110308633
> Some ideas:
> 1. We have two catalog types, one is Native, the other is Non-Native;
> 2. For Native catalog, besides get table, the client can create/drop/alter tables/partitions, so the catalog location uri is a must, we can check it upon creating the catalog;
> 3. For Non-Native catalog, it's just a "mount" of external metadata sources, no changes allowed;
> 4. Native can be changed to a Non-Native catalog, the Non-Native can be to a Native catalog as well;
> 5. For default native hive catalog, the catalog location is `hive.metastore.warehouse.dir`
> 6. Every thing a catalog needs is stored in CATALOG_PARAMS, including the driver class, file system configurations, or extra jar files. The PARAM_VALUE should be a clob instead of varchar in CATALOG_PARAMS.
>
> Given these above, we don't need extra `metastore.warehouse.catalog.dir` or others to determine the default catalog location, the location could be anywhere, just make sure it's accessible by the Metastore.

1. Yes. Non-native catalogs may include JDBC catalog, another external HMS catalog, etc.
2. The catalog location can be set with a default value, so that users do not need to specify the catalog location every time they create a native catalog, which improves the user experience.
3. Non-native catalogs store data and metadata in external data sources, so they do not require a catalog location.
4. Changing a native catalog to a non-native one, or vice versa, seems unnecessary and infeasible. For example, a JDBC catalog cannot be converted into a native catalog.
5. Yes. The current default native catalog is named "hive", and its catalog location is determined by `hive.metastore.warehouse.dir`.
6. Good catch! `CATALOG_PARAMS` is used to store various properties of catalogs. However, not all information needs to be persisted in the HMS. For a large amount of configuration information, such as Hadoop-related configs like `hdfs-site.xml` and `core-site.xml` required by a non-native catalog, a local or cloud directory can be specified for the non-native catalog to dynamically load these configurations. Of course, if the `varchar` column's storage capacity becomes insufficient in the future, we will continue to optimize it.

> Given these above, we don't need extra `metastore.warehouse.catalog.dir` or others to determine the default catalog location. The location could be anywhere, just make sure it's accessible by the Metastore.

I'd like to emphasize the purpose of catalog location:

Catalog location is designed for native catalogs. Currently, after creating a new native catalog, even if the catalog's location is set differently from the default native catalog's location (`hive.metastore.warehouse.dir`), the created Hive databases still reside under `hive.metastore.warehouse.dir`. That is, tables and databases under the newly created native catalog do not respect its own catalog location.

Therefore, this PR aims to address two issues:

1) Tables and databases under a newly created native catalog must respect its own catalog location and should not be mixed with the default native catalog (i.e., `hive.metastore.warehouse.dir`).
2) When creating a native catalog, the location attribute should be allowed to be omitted. The default value can be controlled by the parameter `metastore.warehouse.catalog.dir`, making it more convenient for users to create native catalogs, similar to how omitting the location when creating a Hive database results in a default database location.

@dengzhhu653 Please also refer to https://docs.google.com/document/d/1SX8OPd_KdBuynMr-D8AZ7V5tKkoUDKFterUSjeRpOW0/edit?tab=t.0 for more info.

Thanks.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
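The location rules discussed in the comment above can be illustrated with a minimal sketch. This is not Hive's implementation: the `Catalog` class, both helper functions, the example paths, and the `<catalog dir>/<catalog name>` layout for an omitted location are all assumptions made for illustration. Only the two property names and the default catalog name "hive" come from the discussion.

```python
import posixpath
from dataclasses import dataclass
from typing import Optional

# Example values only; the two property names are the ones discussed above.
CONF = {
    "hive.metastore.warehouse.dir": "/user/hive/warehouse",
    "metastore.warehouse.catalog.dir": "/user/hive/catalogs",
}

DEFAULT_CATALOG = "hive"  # name of the default native catalog


@dataclass
class Catalog:
    """Hypothetical stand-in for a metastore catalog object."""
    name: str
    native: bool = True
    location: Optional[str] = None


def resolve_catalog_location(cat: Catalog, conf: dict = CONF) -> Optional[str]:
    """Return the storage root of a catalog, or None for non-native ones."""
    if not cat.native:
        # Non-native catalogs keep data and metadata in an external source.
        return None
    if cat.name == DEFAULT_CATALOG:
        return conf["hive.metastore.warehouse.dir"]
    if cat.location:
        # An explicitly given location always wins.
        return cat.location
    # Assumed layout when the location is omitted: <catalog dir>/<catalog name>.
    return posixpath.join(conf["metastore.warehouse.catalog.dir"], cat.name)


def default_database_location(cat: Catalog, db_name: str, conf: dict = CONF) -> str:
    """Place a database under its own catalog's location, not the default one."""
    base = resolve_catalog_location(cat, conf)
    if base is None:
        raise ValueError(f"catalog {cat.name!r} is non-native and has no location")
    # The '<name>.db' suffix mirrors Hive's usual database directory naming.
    return posixpath.join(base, db_name.lower() + ".db")
```

For example, `default_database_location(Catalog("sales"), "orders")` resolves under the catalog's own root rather than under `hive.metastore.warehouse.dir`, which is the first of the two issues the comment describes.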
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
