In HIVE-17983 I have been working on packaging and start/stop scripts for the standalone metastore. One question this brings up is how Hive will be released from now on: with or without the metastore. I can see two options:
1) We continue to ship the metastore with Hive. This means not only that the metastore code is in the Hive source code release and the metastore jars are in the Hive binary distribution, but also that scripts like metastore.sh are still included in Hive's bin directory, so that Hive admins can still do 'hive --service metastore' to start the metastore. I see the following advantages of this: a) it is completely backwards compatible; b) it is what users would expect (I have installed many databases and never been asked to first install a separate package for its data catalog or any other essential piece); c) this will still be the metastore's most frequent use case for at least the near future. The disadvantage is that it is error-prone when Hive is set up to connect to a separate metastore. An operator could easily start the metastore in the Hive package, not realizing Hive is configured to connect to a different one.

2) We remove the metastore from the packaging completely, as we do with Hadoop, and require the user to install it separately. The advantages and disadvantages of this exactly mirror those of option 1.

Based on both the 80/20 rule (most metastore users will still be single-system Hive users) and the law of least astonishment (people expect a database to have a data catalog), I vote for option 1. Does anyone feel strongly that we should do 2 instead? Are there any other options I haven't considered?

Alan.