[DISCUSS] Separating out the metastore as its own TLP

Alan Gates Fri, 30 Jun 2017 07:57:07 -0700

A few of us have been talking and have come to the conclusion that it would be
a good thing to split out the Hive metastore into its own Apache project.
Below and in the linked wiki page we explain what we see as the advantages
to this and how we would go about it.


Hive’s metastore has long been used by other projects in the Hadoop
ecosystem to store and access metadata.  Apache Impala, Apache Spark,
Apache Drill, Presto, and other systems all use Hive’s metastore.  Some,
like Impala and Presto, can use it as their own metadata system with the
rest of Hive not present.

This sharing is excellent for the ecosystem.  Together with HDFS it allows
users to use the tool of their choice while still accessing the same shared
data.  But having this shared metadata inside the Hive project limits the
ability of other projects to contribute to the metastore.  It also makes it
harder for new systems that have similar but not identical metadata
requirements (for example, stream processing systems on top of Apache
Kafka) to use Hive’s metastore.  This difficulty for other systems comes
out in two ways.  First, it is hard for non-Hive community members to
participate in the project.  Second, it adds operational cost since users
are forced to deploy all of the Hive jars just to get the metastore to work.

Therefore we propose to split Hive’s metastore out into a separate Apache
project.  This new project will continue to support the same Thrift API as
the current metastore.  It will continue to focus on being a high-performance,
fault-tolerant, large-scale, operational metastore for SQL
engines and other systems that want to store schema information about their
data.

By making it a separate project we will enable other projects to join us in
innovating on the metastore.  It will simplify operations for non-Hive
users that want to use the metastore as they will no longer need to install
Hive just to get the metastore.  And it will attract new projects that
might otherwise feel the need to solve their metadata problems on their own.

Any Hive PMC member or committer will be welcome to join the new project at
the same level.  We propose this project go straight to a top-level
project.  Given that the initial PMC will be formed from experienced Hive
PMC members, we do not believe incubation will be necessary.  (Note that the
Apache board will need to approve this.)

Obviously there are many details involved in a proposal like this.  Rather
than make this a ten-page email we have filled out many of the details in a
wiki page:
https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal

Yongzhi Chen
Vihang Karajgaonkar
Sergio Pena
Sahil Takiar
Aihua Xu
Gunther Hagleitner
Thejas Nair
Alan Gates
