A few of us have been talking and have come to the conclusion that it would be a good thing to split out the Hive metastore into its own Apache project. Below, and in the linked wiki page, we explain what we see as the advantages of this and how we would go about it.
Hive’s metastore has long been used by other projects in the Hadoop ecosystem to store and access metadata. Apache Impala, Apache Spark, Apache Drill, Presto, and other systems all use Hive’s metastore. Some, like Impala and Presto, can use it as their own metadata system without the rest of Hive present. This sharing is excellent for the ecosystem. Together with HDFS, it allows users to use the tool of their choice while still accessing the same shared data.

But having this shared metadata inside the Hive project limits the ability of other projects to contribute to the metastore. It also makes it harder for new systems that have similar but not identical metadata requirements (for example, stream processing systems on top of Apache Kafka) to use Hive’s metastore. This difficulty for other systems shows up in two ways. First, it is hard for non-Hive community members to participate in the project. Second, it adds operational cost, since users are forced to deploy all of the Hive jars just to get the metastore to work.

Therefore, we propose to split Hive’s metastore out into a separate Apache project. This new project will continue to support the same Thrift API as the current metastore. It will continue to focus on being a high-performance, fault-tolerant, large-scale, operational metastore for SQL engines and other systems that want to store schema information about their data. By making it a separate project, we will enable other projects to join us in innovating on the metastore. It will simplify operations for non-Hive users who want to use the metastore, as they will no longer need to install Hive just to get the metastore. And it will attract new projects that might otherwise feel the need to solve their metadata problems on their own.

Any Hive PMC member or committer will be welcome to join the new project at the same level. We propose that this project go straight to top-level project status.
Given that the initial PMC will be formed from experienced Hive PMC members, we do not believe incubation will be necessary. (Note that the Apache board will need to approve this.)

Obviously there are many details involved in a proposal like this. Rather than make this a ten-page email, we have filled out many of the details in a wiki page: https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal

Yongzhi Chen
Vihang Karajgaonkar
Sergio Pena
Sahil Takiar
Aihua Xu
Gunther Hagleitner
Thejas Nair
Alan Gates