Re: [DISCUSS] Separating out the metastore as its own TLP

Dimitris Tsirogiannis Fri, 30 Jun 2017 10:00:32 -0700


On 2017-06-30 07:56 (-0700), Alan Gates <[email protected]> wrote: 
> A few of us have been talking and come to the conclusion that it would be
> a good thing to split out the Hive metastore into its own Apache project.
> Below and in the linked wiki page we explain what we see as the advantages
> to this and how we would go about it.
> 
> Hive's metastore has long been used by other projects in the Hadoop
> ecosystem to store and access metadata.  Apache Impala, Apache Spark,
> Apache Drill, Presto, and other systems all use Hive's metastore.  Some,
> like Impala and Presto, can use it as their own metadata system with the
> rest of Hive not present.
> 
> This sharing is excellent for the ecosystem.  Together with HDFS it allows
> users to use the tool of their choice while still accessing the same shared
> data.  But having this shared metadata inside the Hive project limits the
> ability of other projects to contribute to the metastore.  It also makes it
> harder for new systems that have similar but not identical metadata
> requirements (for example, stream processing systems on top of Apache
> Kafka) to use Hive's metastore.  This difficulty for other systems comes
> out in two ways.  First, it is hard for non-Hive community members to
> participate in the project.  Second, it adds operational cost since users
> are forced to deploy all of the Hive jars just to get the metastore to work.
> 
> Therefore we propose to split Hive's metastore out into a separate Apache
> project.  This new project will continue to support the same Thrift API as
> the current metastore.  It will continue to focus on being a high
> performance, fault tolerant, large scale, operational metastore for SQL
> engines and other systems that want to store schema information about their
> data.
> 
> By making it a separate project we will enable other projects to join us in
> innovating on the metastore.  It will simplify operations for non-Hive
> users that want to use the metastore as they will no longer need to install
> Hive just to get the metastore.  And it will attract new projects that
> might otherwise feel the need to solve their metadata problems on their own.
> 
> Any Hive PMC member or committer will be welcome to join the new project at
> the same level.  We propose this project go straight to a top level
> project.  Given that the initial PMC will be formed from experienced Hive
> PMC members we do not believe incubation will be necessary.  (Note that the
> Apache board will need to approve this.)
> 
> Obviously there are many details involved in a proposal like this.  Rather
> than make this a ten page email we have filled out many of the details in a
> wiki page:
> https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal
> 
> Yongzhi Chen
> Vihang Karajgaonkar
> Sergio Pena
> Sahil Takiar
> Aihua Xu
> Gunther Hagleitner
> Thejas Nair
> Alan Gates
>


+1 (from Apache Impala's (incubating) perspective)

Dimitris
