In HIVE-17983 I have been working on packaging and start/stop scripts for
the standalone metastore.  One question this raises is how Hive will be
released from now on: with or without the metastore.  I can see two options:

1) We continue to ship the metastore with Hive.  This means not only that
the metastore code is in the Hive source code release and the metastore
jars are in the Hive binary distribution, but also that scripts like
metastore.sh are still included in Hive's bin directory, so that Hive admins
can still run 'hive --service metastore' to start the metastore.  I see the
following advantages to this:
a) it is completely backwards compatible;
b) it is what users would expect (I have installed many databases and have
never been asked to first install a separate package for the data catalog or
any other essential piece);
c) this will still be the metastore's most frequent use case for at least
the near future.

The disadvantage is that it is error-prone when Hive is set up to connect to
a separate metastore.  An operator could easily start the metastore in the
Hive package, not realizing that Hive is configured to connect to a
different one.

2) We remove the metastore from the packaging completely, as we do with
Hadoop, and require the user to install it separately.  The advantages and
disadvantages of this exactly mirror those of option 1.

Based on both the 80/20 rule (most metastore users will still be
single-system Hive users) and the principle of least astonishment (people
expect a database to have a data catalog), I vote for option 1.

Does anyone feel strongly that we should do 2 instead?

Any other options I haven't considered?

Alan.
