vinothchandar commented on a change in pull request #4718:
URL: https://github.com/apache/hudi/pull/4718#discussion_r824084864



##########
File path: rfc/rfc-36/rfc-36.md
##########
@@ -0,0 +1,605 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+# RFC-36: Hudi Metastore Server
+
+## Proposers
+
+- @minihippo
+
+## Approvers
+
+
+## Status
+
+JIRA: [HUDI-3345](https://issues.apache.org/jira/browse/HUDI-3345)
+
+> Please keep the status updated in `rfc/README.md`.
+
+# Hudi Metastore Server
+
+## Abstract
+
+Currently, Hudi is widely used as a table format in the data warehouse, but there is a lack of a central metastore server to manage the metadata of data lake tables. Hive metastore, the commonly used catalog service in the data warehouse on Hadoop, cannot store Hudi-specific metadata such as the timeline.
+
+The proposal is to implement a unified metadata management system, the Hudi metastore server, to store the metadata of Hudi tables, and to be compatible with Hive metastore so that other engines can access it without any changes.
+
+## Background
+
+**How Hudi metadata is stored**
+
+The metadata of Hudi comprises the table location, configuration and schema, the timeline generated by instants, and the metadata of each commit / instant, which records the files created or updated, the number of new records, and so on for that commit. Besides, the information about the files in a Hudi table is also part of its metadata.
+
+Different from instants or schemas, each recorded by a separate file stored under `${tablelocation}/.hoodie` on HDFS or object storage, file info is managed by HDFS directly: Hudi gets all files of a table by file listing. File listing is a costly operation whose performance is limited by the NameNode. In addition, there can be a few invalid files on the file system, created (for example) by Spark speculative tasks and not deleted successfully. Getting files by listing would therefore be inconsistent, so Hudi has to record the valid files in each commit's metadata; this file-level metadata is usually referred to as the snapshot.
+
+The RFC-15 metadata table is a proposal that can solve these problems. However, it only manages the metadata of a single table; there is a lack of a unified view across tables.
+
+**The integration of Hive metastore and Hudi metadata lacks a single source of 
truth.**
+
+Hive metastore server (HMS) is widely used as a metadata center in the data warehouse on Hadoop. It stores the metadata of Hive tables, such as their schema, location and partitions. Currently, almost all storage and computing engines support registering table information to it and discovering and retrieving metadata from it. Meanwhile, cloud service providers like AWS Glue, HUAWEI Cloud, Google Cloud Dataproc, Alibaba Cloud and ByteDance Volcano Engine all provide an Apache Hive metastore compatible catalog. Hive metastore has effectively become a standard in the data warehouse.
+
+Different from a traditional table format like the Hive table, a data lake table not only has schema, partitions and other Hive metadata, but also has a timeline and snapshots, which are unconventional. Hence, the metadata of a data lake table cannot be managed by HMS directly.
+
+By now, Hudi only syncs the schema and partitions to HMS, while the other metadata is still stored on HDFS or object storage. Synchronizing metadata between different metadata management systems results in inconsistency.
+
+## Overview
+
+![architecture](architecture.png)
+
+The Hudi metastore server is for metadata management of data lake tables, supporting metadata persistence, efficient metadata access and other extensions for the data lake. The metadata managed by the server includes the information of databases and tables, partitions, schemas, instants, instant metadata and file metadata.
+
+The metastore server has two main components: service and storage. The storage is for metadata persistence, and the service receives get / put requests from clients and returns or stores the result after performing some logical operations on the metadata.
+
+The Hudi metastore server is / has:
+
+- **A metastore server for data lake**
+    -  Different from the traditional table format, the metadata of the data lake has timeline and snapshot concepts in addition to schema and partitions.
+
+    -  The metastore server is a unified metadata management system for data lake tables.
+
+- **Pluggable storage**
+    -  The storage is only responsible for metadata persistence. Therefore, it doesn't matter which storage engine is used to store the data; it can be an RDBMS, a KV system or a file system.
+
+- **Easy to expand**
+    -  The service is stateless, so it can be scaled horizontally to support higher QPS. The storage can be split vertically to store more data.
+
+- **Compatible with multiple computing engines**
+    -  The server has an adapter to be compatible with hive metastore server.
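The pluggable-storage and stateless-service points above can be made concrete with a minimal sketch. All names here are illustrative, not actual Hudi metastore classes: the service depends only on a narrow storage interface, so an RDBMS-, KV- or file-system-backed implementation can be swapped in without changing service logic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative sketch only; these are not actual Hudi metastore classes.
// The service layer talks to storage through a narrow interface, so any
// backend (RDBMS, KV store, file system) that implements it can be plugged in.
interface MetadataStorage {
    void putInstantMeta(String table, String instant, byte[] meta);
    Optional<byte[]> getInstantMeta(String table, String instant);
}

// In-memory stand-in for a real backend such as MySQL or RocksDB.
class InMemoryMetadataStorage implements MetadataStorage {
    private final Map<String, byte[]> rows = new HashMap<>();

    @Override
    public void putInstantMeta(String table, String instant, byte[] meta) {
        rows.put(table + "/" + instant, meta);
    }

    @Override
    public Optional<byte[]> getInstantMeta(String table, String instant) {
        return Optional.ofNullable(rows.get(table + "/" + instant));
    }
}

// A stateless service: all state lives in storage, so any number of
// service instances can serve requests behind a load balancer.
class TimelineService {
    private final MetadataStorage storage;

    TimelineService(MetadataStorage storage) {
        this.storage = storage;
    }

    boolean instantExists(String table, String instant) {
        return storage.getInstantMeta(table, instant).isPresent();
    }
}
```

Because `TimelineService` holds no state of its own, horizontal scaling is just a matter of running more instances against the same storage.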
+
+## Design
+
+This part has four sections: what the service does, what metadata is stored and how, how the service interacts with the storage when reading and writing a Hudi table (with an example), and the extensions of the metastore server that will be implemented in the next version.
+
+### Service
+
+The service receives requests from clients and returns results based on the metadata in the storage, combined with some processing logic. According to the functional division, the service consists of four parts:
+
+- **table service**
+    -  handles table-related requests. To clients, it exposes APIs for database and table CRUD.
+
+- **partition service**
+    -  handles partition-related requests. To clients, it exposes CRUD APIs:
+
+    - supports multiple ways of reading, such as checking a partition's existence, getting partition info, and getting the partitions that satisfy a specific condition (partition pruning).
+    - the create and update APIs cannot be invoked directly; they can only be triggered by the completion of a new commit.
+    -  dropping a partition not only deletes the partition and its files at the metadata level, but also triggers a clean action that physically deletes the data on the file system.
+- **timeline service**

Review comment:
       Agree with Option 2. It's more scalable. Let's discuss details during implementation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

