On Sun, Jul 2, 2017 at 10:15 PM, Alan Gates <alanfga...@gmail.com> wrote:

> Comments inlined.
>
> On Sun, Jul 2, 2017 at 3:22 PM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
> > I am not sure I am on the fence with this.
> >
> > I am -1, and I offer this -1 with the hope of being convinced otherwise
> >
> Thank you for being open to reconsider.
>
> >
> >
> > "By making it a separate project we will enable other projects to join us
> > in innovating on the metastore."
> >
> > The relevant questions I have are,
> >
> > "What is stopping others from joining us now?"
> > "What does being a TLP do for us that we do not have now?"
> >
>
> Walking through a use case will help answer these.  This is a real world
> situation, not a hypothetical.  I’ve been talking with a team building a
> schema registry for Kafka[1].  I’d like them to use the Hive metastore
> rather than reinvent the wheel.  I believe this would be good for users
> (all their tools can work together on a shared understanding of the data)
> and admins (just one metadata store to administer) and for the ecosystem
> (tools can work across stored data and streaming data).
>
> This system has some requirements on metadata that Hive does not.  To take
> one example, it would like a schema to be a top level concept instead of a
> concept tied to tables or partitions.  This is not a problem for Hive, but
> neither is it interesting.  So if they come with patches for this, would we
> accept them?  As the Hive PMC our answer will be no, because it doesn’t
> help Hive’s metadata.  Even if we accept their patches, will we make them
> committers when we know they don’t care about Hive as Hive, but only the
> metastore?  Again, the right answer for the Hive PMC is no.
>
> And we cannot say that Hive should support a generic metadata system within
> itself.  That turns Hive into an umbrella project, which Apache has
> repeatedly worked to avoid.  So Hive will either need to reject non-Hive
> centric features and contributors or end up in a place Apache has worked to
> avoid.
>
> And finally, why would other teams want to mess with all of Hive when they
> only want the metastore?  Hive is a large and complex system.  If we break
> the metastore out it is much more approachable by non-Hive contributors.
>
> Obviously the Hive team doesn’t want to see their metastore turn into
> something unusable by Hive, which is why we were specific in saying we
> wanted it to continue to support high performance SQL systems.
>
> My experience in watching ORC move out of Hive is that the adoption has
> increased significantly.  It is reasonable to assume that moving the
> metastore out will also increase adoption and make it easier for others to
> get involved.
>
>
> > I see a lot of downsides:
> > 1) We have to maintain two sites
> > 2) we have to maintain two committer lists
> >
> > A large problem I see is this: Hive is already being pulled in too many
> > different directions. There is some grumbling about the state of
> > hive-on-spark.
> >
>
> I believe this argues in favor of the split, not against.  By pulling out
> the metastore we are relieving pressure on Hive itself.  Let Hive focus on
> being a SQL engine.  Let another team focus on runtime metadata.
>
> On your committer questions in later emails, the point of going to a TLP
> has nothing to do with adding new committers.   Traditionally new projects
> start in the incubator.  But given that all of the PMC of this new project
> are already experienced Hive PMC members I see no reason to go through
> incubator.  I agree with you that we would not throw any new people into
> the mix.  People join the project in the same way as always, by
> contributing.
>
> Alan.
>
> 1. https://github.com/hortonworks/registry
>
>
> > Most importantly, our release process seems 'injured' by too many branches
> > going off in different ways. If the metastore lives outside of Hive we are
> > going to compound this issue. I would strongly suggest we do not undertake
> > this until we can at least turn out 2 usable releases in a 6 month period.
> >
>

> So if they come with patches for this, would we
> accept them?  As the Hive PMC our answer will be no, because it doesn’t
> help Hive’s metadata.

We already have things in the metastore not directly tied to language
features. For example, the Hive metastore has a "retention" property which is
not actively in use by anything. In reality, we rarely say 'no' or -1 to
much, which in part is why I believe our release process is grinding
slower: we have so many things in flight that I do not feel any one person
can keep track. You are working on porting the metastore to HBase.
https://issues.apache.org/jira/browse/HIVE-9452 Did you get a -1 or 'No'
along the way? When I first noticed this I pointed out that someone had
already ported the metastore to Cassandra
https://github.com/riptano/brisk/blob/master/src/java/src/org/apache/cassandra/hadoop/hive/metastore/SchemaManagerService.java,
but there was more excitement/rationale for this multi-year approach using
HBase, so I let everyone 'have at it'.

I am going to give a hypothetical but real-world situation. Suppose I want
to add the statement "CREATE permanent macro xyz". This feature, I believe,
would cross-cut Calcite, Hive, and the Hive metastore. To build it I
would need to orchestrate the change across 3 separate groups of Hive
'subcommittees', for lack of a better word: 3 git repos, 3 Jiras, 3
releases. That is not counting any bug or misfeature we run into (maybe
with Tez or something else), which brings in 4-5 upstream releases to
add one feature to Hive. Nor does it take into account normal process
mess-ups. For example, say you get the metastore work done, but now the
people doing the Calcite/ANTLR work suggest the feature have different
syntax because they did not read the 3-4 linked tickets when the process
started. Now you have to loop back around the process. Finding 1 person in
1 project to usher along the feature you want is difficult; having to find
and clear time with 3 people across three projects is going to be more
difficult still, along with then 'pushing' them all to kick out a release
so you can finally use said feature.
