Re: [DISCUSS] Separating out the metastore as its own TLP

Alan Gates Wed, 05 Jul 2017 11:07:24 -0700

On Mon, Jul 3, 2017 at 10:17 AM, Dain Sundstrom <d...@iq80.com> wrote:


> +1
>
> I work on Presto and I think this is the right direction for our users.  We
> have several users running Presto without Hive and anything we can do to
> help simplify the Metastore experience would be a big help.
>
> When I read proposals like this, one thing I like to see is a vision
> (scope) for the project.  In this case, I’d like to understand if the plan
> is to limit the scope of the system to what Hive can support.  For example,
> the system will clearly support schemas (databases) with tables and views
> as defined by Hive, but will there be support for additional types like a
> Presto view which is incompatible with Hive views due to the language
> differences?  Currently, in Presto we create a Hive view to reserve a spot
> in the “tables namespace”, and then we put our view data in the table
> properties.  I would like to formalize this kind of system, so if a Hive
> user queries a Presto view, they get a proper error message. I have similar
> concerns about data types, compression, and data organization (e.g.,
> different bucketing strategies).
>

We tried to lay out the scope in the wiki page [1].  Details will need to be
worked out by the new project.  But I’ll give you my view on it.  I don’t
see the value of breaking this out of Hive if it isn’t willing to take
non-Hive features.  If it’s still Hive-only in its focus, why pay the cost
of having separate projects?  So, as long as Presto-style views don’t break
Hive-style views or make the system horribly complicated, and someone is
willing to add them, +1.

A related area that we will need to work out is the metastore connection to
the Hive physical layout.  Today, when a user says “create table”, the
metastore creates a directory in HDFS.  This ties the metastore to a
Hive-style data layout.  How should that be handled going forward?  We could
assert that having a standard data layout is good, and all users of this
metadata system should use this layout.  We could make the physical
operations pluggable, providing the Hive-style operations as an option, but
allowing users to bring others.  We could completely remove the physical
operations, leave them all in Hive, and say that any system using this
should do its own physical operations.  I don't like the last option
because it makes it hard to share data across tools, but I can think of
arguments for and against the first two.
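
To make the pluggable option a bit more concrete, here is a rough Python
sketch.  It is an illustration only: StorageHandler, HiveStyleLayout,
NoPhysicalOps, and this toy Metastore class are hypothetical names, not
existing Hive or metastore APIs, and the <root>/<db>.db/<table> path is an
assumed stand-in for a Hive-style layout.

```python
from abc import ABC, abstractmethod


class StorageHandler(ABC):
    """Hypothetical plug-in point for the physical side of "create table"."""

    @abstractmethod
    def create_table_location(self, database: str, table: str) -> str:
        """Prepare storage for a new table and return its location."""


class HiveStyleLayout(StorageHandler):
    """Assumed Hive-like convention: one directory per table under a root."""

    def __init__(self, warehouse_root: str):
        self.warehouse_root = warehouse_root.rstrip("/")
        self.created = []  # stands in for real directory creation in HDFS

    def create_table_location(self, database: str, table: str) -> str:
        path = f"{self.warehouse_root}/{database}.db/{table}"
        self.created.append(path)
        return path


class NoPhysicalOps(StorageHandler):
    """Metadata only: the calling engine does its own physical operations."""

    def create_table_location(self, database: str, table: str) -> str:
        return ""


class Metastore:
    """Toy metastore that delegates physical work to the configured handler."""

    def __init__(self, storage: StorageHandler):
        self.storage = storage
        self.tables = {}

    def create_table(self, database: str, table: str) -> str:
        location = self.storage.create_table_location(database, table)
        self.tables[(database, table)] = {"location": location}
        return location
```

With something like this, the first option amounts to shipping only
HiveStyleLayout, the second to letting users supply their own handler, and
the third to wiring in NoPhysicalOps everywhere.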


> Another aspect of this is the vision for the specification of the
> Metastore.  Is the vision to have a very open end-user extensible design
> (e.g., just a name and a bag of properties), or is the vision to have a
> project-specified common set of properties with “rules” for proper extension?
>

Again, just my opinion, but I would say the latter.  The utility of a name
and a bag of properties turns out to be pretty limited and pretty easy to
implement if that’s all you want.  The current metastore can do a lot more
than that.
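
As a rough illustration of what specified properties with rules could buy
over a plain bag of properties, consider the view case above.  This Python
sketch is hypothetical throughout: ViewDefinition, KNOWN_DIALECTS,
UnsupportedViewError, and expand_view are invented for the example and do
not correspond to anything in the current metastore.

```python
from dataclasses import dataclass, field

# Assumed set of view dialects the project has agreed on (illustrative only).
KNOWN_DIALECTS = {"hive", "presto"}


class UnsupportedViewError(Exception):
    """Raised when an engine is asked to read a view written for another engine."""


@dataclass
class ViewDefinition:
    name: str
    dialect: str  # a specified field, validated against KNOWN_DIALECTS
    sql: str
    properties: dict = field(default_factory=dict)  # free-form extension bag

    def __post_init__(self):
        # The "rule": unknown dialects are rejected when the view is created.
        if self.dialect not in KNOWN_DIALECTS:
            raise ValueError(f"unknown view dialect: {self.dialect!r}")


def expand_view(view: ViewDefinition, engine_dialect: str) -> str:
    """Return the view SQL, or fail clearly if the dialect does not match."""
    if view.dialect != engine_dialect:
        raise UnsupportedViewError(
            f"view {view.name!r} is a {view.dialect} view "
            f"and cannot be read by {engine_dialect}"
        )
    return view.sql
```

Because the dialect is a specified field rather than an arbitrary property,
every engine can check it the same way and report a proper error.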


>
> I would also be very interested in documentation for the Metastore APIs
> (and can help). We currently reverse-engineer proper metastore interaction
> by reading the Hive code and writing a lot of experimental programs, and I
> would really just like to know the “right way”.  Also, we end up missing
> out on new features in the Metastore due to the work required to understand
> how they work.
>

+1 to better documentation regardless of where the metastore code lives.

Alan.


1. https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal
