[
https://issues.apache.org/jira/browse/HIVE-26227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906583#comment-17906583
]
Butao Zhang edited comment on HIVE-26227 at 12/18/24 2:26 AM:
--------------------------------------------------------------
BTW, based on above comments by [~wechar] :
{quote}
Currently we want to manage the metadata from Hive, Hbase, Kafka, Jdbc, etc,
and computing engines like Hive, Spark, Presto, Flink can join data from
different systems based on the metadata in Hive metastore.
{quote}
This type of federated query involves storing or mapping metadata from an
external data source into an HMS catalog. The compute engine (Spark, Flink,
etc.) then federates queries across multiple HMS catalogs, each of which holds
the metadata of a different data source.
One advantage of this approach is that all types of metadata (JDBC, Hive,
HBase, etc.) can be authenticated and audited on the HMS server side, so HMS
can act as a {color:#de350b}unified metadata management center{color}. (Note:
other engines may need some custom development to query the metadata stored in
HMS's multiple catalogs.)
However, this approach still depends on HMS to retrieve the metadata of
multiple data sources (JDBC, Kafka, etc.). What we want is to decouple the
data sources from HMS. That is to say, to query a JDBC data source we should
not need HMS at all (or an HMS catalog that maps the JDBC metadata), just like
Trino.
> Add support of catalog related statements for Hive ql
> -----------------------------------------------------
>
> Key: HIVE-26227
> URL: https://issues.apache.org/jira/browse/HIVE-26227
> Project: Hive
> Issue Type: Task
> Components: Hive
> Reporter: Wechar
> Assignee: Wechar
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> The catalog concept was introduced in Hive 3.0 to allow different systems to
> connect to different catalogs in the metastore. So far, however, catalogs
> cannot be queried through Hive QL. This task aims to implement the DDL
> statements related to catalogs.
> *Create Catalog*
> {code:sql}
> CREATE CATALOG [IF NOT EXISTS] catalog_name
> LOCATION hdfs_path
> [COMMENT catalog_comment];
> {code}
> LOCATION is currently required when creating a new catalog.
> *Alter Catalog*
> {code:sql}
> ALTER CATALOG catalog_name SET LOCATION hdfs_path;
> {code}
> Only the location metadata of a catalog can be altered.
> *Drop Catalog*
> {code:sql}
> DROP CATALOG [IF EXISTS] catalog_name;
> {code}
> DROP CATALOG is always RESTRICT, which means DROP CATALOG will fail if there
> are non-default databases in the catalog.
> *Show Catalogs*
> {code:sql}
> SHOW CATALOGS [LIKE 'identifier_with_wildcards'];
> {code}
> SHOW CATALOGS lists all of the catalogs defined in the metastore.
> The optional LIKE clause allows the list of catalogs to be filtered using a
> regular expression.
> *Describe Catalog*
> {code:sql}
> DESC[RIBE] CATALOG [EXTENDED] cat_name;
> {code}
> DESCRIBE CATALOG shows the name of the catalog, its comment (if one has been
> set), and its root location on the filesystem.
> EXTENDED also shows the create time.
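> A minimal sketch of how the statements above might be used together, assuming
> the proposed syntax is accepted as written. The catalog name {{sales_cat}} and
> the HDFS paths are hypothetical, and the '*' wildcard in the LIKE pattern is
> assumed to behave as it does for SHOW DATABASES:
> {code:sql}
> -- Create a catalog; LOCATION is required by the proposal (path is hypothetical).
> CREATE CATALOG IF NOT EXISTS sales_cat
> LOCATION 'hdfs://namenode:8020/warehouse/sales_cat'
> COMMENT 'Catalog for sales data';
>
> -- List catalogs whose names start with "sales" (wildcard form is an assumption).
> SHOW CATALOGS LIKE 'sales*';
>
> -- Show name, comment, and location; EXTENDED also shows the create time.
> DESC CATALOG EXTENDED sales_cat;
>
> -- Only the location can be altered (path is hypothetical).
> ALTER CATALOG sales_cat SET LOCATION 'hdfs://namenode:8020/warehouse/sales_cat_v2';
>
> -- Fails if the catalog still contains non-default databases (always RESTRICT).
> DROP CATALOG IF EXISTS sales_cat;
> {code}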
--
This message was sent by Atlassian Jira
(v8.20.10#820010)