I think Edward's concern is valid. While I voiced my support for this
proposal, that was more for the benefit of the whole Hadoop ecosystem; I
don't see equal benefits for Hive. Instead, it may even create more
overhead for Hive. I'd really like to take time to see what the roadblocks
are for other projects to use HMS as it is. The issue of Spark including
a Hive fork, which was brought up some time back, is certainly not one of
them.

Thanks,
Xuefu

On Wed, Jul 5, 2017 at 12:33 PM, Edward Capriolo <edlinuxg...@gmail.com>
wrote:

> On Wed, Jul 5, 2017 at 1:51 PM, Alan Gates <alanfga...@gmail.com> wrote:
>
> > On Mon, Jul 3, 2017 at 6:20 AM, Edward Capriolo <edlinuxg...@gmail.com>
> > wrote:
> >
> > >
> > > We already have things in the meta-store not directly tied to language
> > > features. For example hive metastore has a "retention" property which
> is
> > > not actively in use by anything. In reality, we rarely say 'no' or -1
> to
> > > much. Which in part is why I believe our release process is grinding
> > > slower: we have so many things in flight I do not feel that any one
> > person
> > > can keep track. You are working on porting the metastore to hbase.
> > > https://issues.apache.org/jira/browse/HIVE-9452 did you get a -1 or
> 'No'
> > > along the way? When I first noticed this I pointed out that someone has
> > > already ported the metastore to Cassandra
> > > https://github.com/riptano/brisk/blob/master/src/java/src/org/apache/cassandra/hadoop/hive/metastore/SchemaManagerService.java,
> > > but there was more excitement/rationale for this multi-year approach
> > > using hbase so I let everyone 'have at it'.
> > >
> > Your example and mine are not equivalent.  The HBase metastore is still a
> > Hive feature, even if some thought it not worthwhile.  That is different
> > than people bringing features that will never interest Hive or that Hive
> > could never use (e.g. Dain’s desire for the metastore to support Presto
> > style views).
> >
> > I forgot to mention the issue these would-be non-Hive contributors have
> > with releases if they contribute their features to the metastore while
> it’s
> > inside Hive.  Is Hive going to do a release just to push out features in
> > the metastore that it doesn’t care about?
> >
> > You seem to be asserting that doing this doesn’t really help non-Hive
> based
> > systems that are using or would like to use the metastore.  But it is
> > interesting that people from three of those systems have commented in the
> > thread so far, and all are positive (Dmitrias from Impala, Dain from
> > Presto, and Sriharsha from the schema registry project).
> >
> >
> > > I am going to give a hypothetical but real world situation. Suppose I
> > want
> > > to add the statement "CREATE permanent macro xyz", this feature I
> believe
> > > would cross cut calcite, hive, and hive metastore. To build this
> feature
> > I
> > > would need to orchestrate the change across 3 separate groups of hive
> > > 'subcommittees' for lack of a better word. 3 git repos, 3 JIRAs, 3
> > > releases. That is not counting if we run into some bug or misfeature
> > (maybe
> > > with Tez or something else) so that brings in 4-5 releases of upstream
> to
> > > add a feature to hive. This does not take into account normal process
> > > mess-ups. For example, say you get the metastore done, but now the
> people
> > > doing the calcite/antlr suggest the feature have different syntax
> because
> > > they did not read the 3-4 linked tickets when the process started? Now,
> > you
> > > have to loop back around the process. Finding 1 person in 1 project to
> > > usher along the feature you want is difficult, having to find and clear
> > > time with 3 people across three projects is going to be difficult, along
> > > with then 'pushing' them all to kick out a release so you can finally
> use
> > > said feature.
> > >
> >
> > I partially agree with you.  On the reviews, JIRAs, etc. I don’t think it
> > adds much, if any, overhead.  Hive is a big project and no one person
> knows
> > all the code anymore.  If you wanted to add a permanent macros feature
> you
> > would need reviews from someone who knows the parser (probably
> Pengcheng),
> > people who know the optimizer (Jesus, Ashutosh, …), and someone who knows
> > the metastore (me, Thejas, …).  And any large feature is going to be
> > implemented over multiple JIRAs, all of which are linkable regardless of
> > whether the JIRAs start with METASTORE- or HIVE-.   I also don’t think it
> > makes the feature disagreement any worse.  If the optimizer team
> absolutely
> > insists it has to have some feature and the metastore team insists that
> it
> > can’t have that feature you’re going to have to work through the issue
> > whether they all are in Hive or in two separate projects.
> >
> > Where I agree the split adds cost is releases.  Before your macro feature
> > could go live you need releases from each of the components.  And while
> in
> > development the components need to use snapshot versions of the other
> > components.  My assertion is that the benefits outweigh this cost.
> >
> > Alan.
> >
>
>
> "You seem to be asserting that doing this doesn’t really help non-Hive
> based
> systems that are using or would like to use the metastore.  But it is
> interesting that people from three of those systems have commented in the
> thread so far, and all are positive (Dmitrias from Impala, Dain from
> Presto, and Sriharsha from the schema registry project)."
>
> I notice that impala has a syntax for caching.
>
> https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_perf_hdfs_caching.html
>
> Notice how the cache syntax did not make its way into Hive? It would make
> sense if this feature trickled its way into hive and used HDFS caching, for
> example. I have heard many people claim that using hive metastore is such a
> problem because it is packaged weird (like with ORC), but again besides
> claiming/complaining no one has stepped up to deal with that.
>
> What I would suggest is going forward for maybe a trial period of 6 months,
> labeling JIRA tickets with a tag that would be
> "SeeThisProvesWeNeedATLPMetastore". Because right now I do not see enough
> active use cases of people giving anything back to justify hurting our
> workflow so much.
>
>
> "I partially agree with you.  On the reviews, JIRAs, etc. I don’t think it
> adds much, if any, overhead.  Hive is a big project and no one person knows
> all the code anymore.  If you wanted to add a permanent macros feature you
> would need reviews from someone who knows the parser (probably Pengcheng),
> people who know the optimizer (Jesus, Ashutosh, …), and someone who knows
> the metastore (me, Thejas, …).  And any large feature is going to be
> implemented over multiple JIRAs, all of which are linkable regardless of
> whether the JIRAs start with METASTORE- or HIVE-.   I also don’t think it
> makes the feature disagreement any worse.  If the optimizer team absolutely
> insists it has to have some feature and the metastore team insists that it
> can’t have that feature you’re going to have to work through the issue
> whether they all are in Hive or in two separate projects"
>
> Macro was done in 1 patch and reviewed by 2 people. With 2-3 follow on
> bugs.
>
> https://issues.apache.org/jira/browse/HIVE-2655
>
> I think your perception is different than mine because of circumstances. I
> have waited weeks/months for reviews/merges (in Hive and other apache
> projects) from mundane udfs to cassandra-storage-handlers. You obviously
> work in a large company and you can more easily align objectives, go to the
> water cooler and say "hey bob you know it would be cool if you can release
> x so I can do y". When you are not in that situation it's like, "hey mailing
> list, my patch was done for three months now and like I have had to rebase
> it three times and like I notice like other stuff is getting committed."
>
> If you look at it tactically, "create permanent macro xyz". I go over to
> calcite and suggest some changes there, if this concept is not "game
> changer" it is probably going to sit unreviewed. If it is "game changer"
> exciting that is 72 hours for release voting. Next go to hive-metastore
> repeat the process, but remember now I have to "wow" the metastore people
> with the "game changer" and if that crew is super focused on something
> about kafka, well, now Hive features are second fiddle. Now let's say a hive
> release is coming up, and I really want my feature in it.
> hive-metastore-tlp might currently have a broken trunk because mongo's
> "add spaceships to wombats" feature has a bug, and frankly that should not
> affect us.
>
> I hate to draw in something else but I feel it is related:
>
> 8 December 2016 : release 2.1.1 available
> 07 April 2017 : release 1.2.2 available
> hive-dev [DISCUSS] Supporting Hadoop-1 and experimental features
> hive-dev Re: release chaos?
>
> I have been vocal about not liking certain branching strategies and
> proposals that take us away from releasable trunk. We have steadily headed
> in a direction where we are pulling things out of hive, and we are not able
> to turn out releases. We even had a thread "release chaos" talking about
> our 5 active branches (with friends I say "jumped the shark"). Pulling out
> the metastore is only going to make this worse. I do not even see the model
> as successful. You may say it is great that calcite lets people share our
> sql dialect or the ORC TLP has 5 committers, but if Hive can not get a
> release out the door I do not see us optimizing for the proper thing.
>
