Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-28 Thread Gopal Vijayaraghavan
On 7/25/17, 4:45 PM, "cwsteinb...@gmail.com on behalf of Carl Steinbach" 
 wrote:

>"IceWeasel" and "MetaStore" are both examples of English compound words.
>What exactly makes the former any safer than the latter?

Descriptive words usually make weaker trademarks - if the name merely 
describes what the product does, it is harder to protect.

"PainKiller" is a weak one, while "Aspirin" isn't. 

Uniqueness is useful, because an active defense is necessary to retain 
possession of a trademark - as a tautology, the more unique the phrase, the 
fewer occurrences there are to tackle.
 
But, in the case of Aspirin, Bayer did not defend the use of lowercase "aspirin" 
and now only has a TM on the upper-case one, "Aspirin".

IceWeasel is an infamous precedent for a trademark dispute in the open source 
community.

"The end of the Iceweasel Age" - https://lwn.net/Articles/676799/
+
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=354622

Cheers,
Gopal




Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-27 Thread Lefty Leverenz
Johndee (and everyone else), wiki edit privileges are easy to get:  see "About
This Wiki -- How to get permission to edit".

-- Lefty


On Thu, Jul 27, 2017 at 2:32 PM, Johndee Cloudera 
wrote:

> Well if I cannot get Metastore, how about Hadoop Metastore it is simple and
> self explanatory to a degree.
>
> @Alan,
>
> Sorry to make the name suggestion here but I could not comment or edit the
> page you created.
>
> On Mon, Jul 24, 2017 at 7:17 PM, Gopal Vijayaraghavan 
> wrote:
>
> > Hi,
> >
> >
> > Changing the name isn't really optional or "being google-able" [2].
> >
> > The naming is a crucial part of trademark protection [1], which is the
> > only protection ASF has against hostile Embrace & Extends.
> >
> > Fragmented forks with the same name is particularly bad, especially if
> the
> > feature in question can be only used by a proprietary tool (like Dain's
> > suggestion about Presto view metadata, except it only works with a
> per-cpu
> > license).
> >
> > The safe path isn't pretty, it still ends up with IcedTea and IceWeasel …
> > but at least, those are clearly weird.
> >
> > Cheers,
> > Gopal
> >
> > [1] - https://en.wikipedia.org/wiki/A_moron_in_a_hurry#United_States
> > [2] - https://packages.debian.org/jessie/misc/metastore
> >
> > On 7/24/17, 3:04 PM, "Carl Steinbach"  wrote:
> >
> > +1 to Vihang's suggestion. Changing the name will only cause
> confusion.
> >
> > On Mon, Jul 24, 2017 at 2:28 PM, Johndee Cloudera <
> > john...@cloudera.com>
> > wrote:
> >
> > > +1 Vihang, I do not really like Catalog as it could create
> confusion
> > with
> > > the Catalog daemon from impala.
> > >
> > > On Mon, Jul 24, 2017 at 5:20 PM, Vihang Karajgaonkar <
> > vih...@cloudera.com>
> > > wrote:
> > >
> > > > Before we see a flood of name suggestions :) Why not just keep it
> > > > Metastore? Its already well-known in the community and easy to
> > relate to.
> > > >
> > > > On Mon, Jul 24, 2017 at 2:13 PM, Alan Gates <
> alanfga...@gmail.com>
> > > wrote:
> > > >
> > > > > In the same vein Carter and Gunther suggested Omegastore.  Pick
> > your
> > > > > alphabet and whether it’s a catalog or a store I guess.
> > > > >
> > > > > Alan.
> > > > >
> > > > > On Mon, Jul 24, 2017 at 1:35 PM, Sergey Shelukhin <
> > > > ser...@hortonworks.com>
> > > > > wrote:
> > > > >
> > > > > > I’d like to suggest ZCatalog.
> > > > > >
> > > > > > On 17/7/11, 15:41, "Lefty Leverenz"  >
> > wrote:
> > > > > >
> > > > > > >>> I'd like to suggest Riven.  (Owen O'Malley)
> > > > > > >
> > > > > > >> How about "Flora"?  (Andrew Sherman)
> > > > > > >
> > > > > > >Nice idea and thanks for introducing me to that book,
> Andrew.
> > > > > > >
> > > > > > >Along the same lines, how about "Honeycomb"?
> > > > > > >
> > > > > > >But since the idea is to make the metastore useful for many
> > > projects,
> > > > a
> > > > > > >generic name that starts with "Meta" would be less confusing
> > ...
> > > even
> > > > > > >though it breaks the tradition of Apache projects having
> > quirky
> > > names.
> > > > > > >Unfortunately "Metalog" is already in use.  "Metamorph" has
> > other
> > > > > > >connotations, but it's cool.
> > > > > > >
> > > > > > >Naming enthusiasm notwithstanding, I'm +/-0 on the idea of
> > splitting
> > > > off
> > > > > > >the metastore into a new project:  -0.5 for the sake of Hive
> > and
> > > +0.5
> > > > > for
> > > > > > >the greater good.  Wishy-washy, that's me.
> > > > > > >
> > > > > > >-- Lefty
> > > > > > >
> > > > > > >
> > > > > > >On Tue, Jul 11, 2017 at 1:04 PM, Andrew Sherman <
> > > > asher...@cloudera.com>
> > > > > > >wrote:
> > > > > > >
> > > > > > >> On Fri, Jun 30, 2017 at 5:05 PM, Owen O'Malley <
> > > > > owen.omal...@gmail.com>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun <
> > sunc...@apache.org>
> > > > > wrote:
> > > > > > >> >
> > > > > > >> > > and maybe a different project name?
> > > > > > >> > >
> > > > > > >> >
> > > > > > >> > Yes, it certainly needs a new name. I'd like to suggest
> > Riven.
> > > > > > >> >
> > > > > > >> > .. Owen
> > > > > > >> >
> > > > > > >>
> > > > > > >> How about "Flora"?
> > > > > > >>
> > > > > > >> (Flora is the protagonist of The Bees by Laline Paull)
> > > > > > >>
> > > > > > >> -Andrew
> > > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > - JRB
> > >
> >
> >
> >
> >
>
>
> --
> - JRB
>


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-27 Thread Johndee Cloudera
Well, if I cannot get Metastore, how about Hadoop Metastore? It is simple and
self-explanatory to a degree.

@Alan,

Sorry to make the name suggestion here but I could not comment or edit the
page you created.

On Mon, Jul 24, 2017 at 7:17 PM, Gopal Vijayaraghavan 
wrote:

> Hi,
>
>
> Changing the name isn't really optional or "being google-able" [2].
>
> The naming is a crucial part of trademark protection [1], which is the
> only protection ASF has against hostile Embrace & Extends.
>
> Fragmented forks with the same name is particularly bad, especially if the
> feature in question can be only used by a proprietary tool (like Dain's
> suggestion about Presto view metadata, except it only works with a per-cpu
> license).
>
> The safe path isn't pretty, it still ends up with IcedTea and IceWeasel …
> but at least, those are clearly weird.
>
> Cheers,
> Gopal
>
> [1] - https://en.wikipedia.org/wiki/A_moron_in_a_hurry#United_States
> [2] - https://packages.debian.org/jessie/misc/metastore
>
> On 7/24/17, 3:04 PM, "Carl Steinbach"  wrote:
>
> +1 to Vihang's suggestion. Changing the name will only cause confusion.
>
> On Mon, Jul 24, 2017 at 2:28 PM, Johndee Cloudera <
> john...@cloudera.com>
> wrote:
>
> > +1 Vihang, I do not really like Catalog as it could create confusion
> with
> > the Catalog daemon from impala.
> >
> > On Mon, Jul 24, 2017 at 5:20 PM, Vihang Karajgaonkar <
> vih...@cloudera.com>
> > wrote:
> >
> > > Before we see a flood of name suggestions :) Why not just keep it
> > > Metastore? Its already well-known in the community and easy to
> relate to.
> > >
> > > On Mon, Jul 24, 2017 at 2:13 PM, Alan Gates 
> > wrote:
> > >
> > > > In the same vein Carter and Gunther suggested Omegastore.  Pick
> your
> > > > alphabet and whether it’s a catalog or a store I guess.
> > > >
> > > > Alan.
> > > >
> > > > On Mon, Jul 24, 2017 at 1:35 PM, Sergey Shelukhin <
> > > ser...@hortonworks.com>
> > > > wrote:
> > > >
> > > > > I’d like to suggest ZCatalog.
> > > > >
> > > > > On 17/7/11, 15:41, "Lefty Leverenz" 
> wrote:
> > > > >
> > > > > >>> I'd like to suggest Riven.  (Owen O'Malley)
> > > > > >
> > > > > >> How about "Flora"?  (Andrew Sherman)
> > > > > >
> > > > > >Nice idea and thanks for introducing me to that book, Andrew.
> > > > > >
> > > > > >Along the same lines, how about "Honeycomb"?
> > > > > >
> > > > > >But since the idea is to make the metastore useful for many
> > projects,
> > > a
> > > > > >generic name that starts with "Meta" would be less confusing
> ...
> > even
> > > > > >though it breaks the tradition of Apache projects having
> quirky
> > names.
> > > > > >Unfortunately "Metalog" is already in use.  "Metamorph" has
> other
> > > > > >connotations, but it's cool.
> > > > > >
> > > > > >Naming enthusiasm notwithstanding, I'm +/-0 on the idea of
> splitting
> > > off
> > > > > >the metastore into a new project:  -0.5 for the sake of Hive
> and
> > +0.5
> > > > for
> > > > > >the greater good.  Wishy-washy, that's me.
> > > > > >
> > > > > >-- Lefty
> > > > > >
> > > > > >
> > > > > >On Tue, Jul 11, 2017 at 1:04 PM, Andrew Sherman <
> > > asher...@cloudera.com>
> > > > > >wrote:
> > > > > >
> > > > > >> On Fri, Jun 30, 2017 at 5:05 PM, Owen O'Malley <
> > > > owen.omal...@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun <
> sunc...@apache.org>
> > > > wrote:
> > > > > >> >
> > > > > >> > > and maybe a different project name?
> > > > > >> > >
> > > > > >> >
> > > > > >> > Yes, it certainly needs a new name. I'd like to suggest
> Riven.
> > > > > >> >
> > > > > >> > .. Owen
> > > > > >> >
> > > > > >>
> > > > > >> How about "Flora"?
> > > > > >>
> > > > > >> (Flora is the protagonist of The Bees by Laline Paull)
> > > > > >>
> > > > > >> -Andrew
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > - JRB
> >
>
>
>
>


-- 
- JRB


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-27 Thread Alan Gates
I think the concerns with Metastore are twofold.  One, it’s a common term
in software, and thus would be hard to defend as a trademark.  Hence
Gopal’s link to the Debian package already named metastore.  IANAL, but as I
understand trademark law, the test is whether a name could cause a
reasonable person to be confused as to who is offering a good or service.
So a McDonald’s tire store is OK, since a reasonable person knows McDonald’s
sells food, not tires, but a restaurant named McDonnies that serves hamburgers
isn’t.  Whether IceWeasel (or any of the suggested names besides metastore)
passes that test I don’t know.  The board will require us to go through a
name search as part of the TLP process.

If I understand Gopal’s second point, it is that keeping the name will cause
confusion for users as to whether this is still part of Hive or something
separate.  I think stressing the continuity is exactly what Vihang and Carl
like about keeping the name.

My suggestion would be that the project needs a name other than metastore,
but we can call the module metastore.  Having a more unique name is good
for trademarks and for helping users find your stuff via Google, etc.  (Go to
Google and search on “Hive” to see what I mean here.)  If we pick X as the
project name, we can then refer to it as the X metastore, the maven modules
can be x-metastore, etc.  I think this strikes the balance between the
benefits of a unique-ish name and telling users what it does and where it
came from.

Plus, IceWeasel is a _way_ cooler name for a project than metastore. :)

Alan.

On Tue, Jul 25, 2017 at 4:45 PM, Carl Steinbach  wrote:

> "IceWeasel" and "MetaStore" are both examples of English compound words.
> What exactly makes the former any safer than the latter?
>
> On Mon, Jul 24, 2017 at 4:17 PM, Gopal Vijayaraghavan 
> wrote:
>
> > Hi,
> >
> >
> > Changing the name isn't really optional or "being google-able" [2].
> >
> > The naming is a crucial part of trademark protection [1], which is the
> > only protection ASF has against hostile Embrace & Extends.
> >
> > Fragmented forks with the same name is particularly bad, especially if
> the
> > feature in question can be only used by a proprietary tool (like Dain's
> > suggestion about Presto view metadata, except it only works with a
> per-cpu
> > license).
> >
> > The safe path isn't pretty, it still ends up with IcedTea and IceWeasel …
> > but at least, those are clearly weird.
> >
> > Cheers,
> > Gopal
> >
> > [1] - https://en.wikipedia.org/wiki/A_moron_in_a_hurry#United_States
> > [2] - https://packages.debian.org/jessie/misc/metastore
> >
> > On 7/24/17, 3:04 PM, "Carl Steinbach"  wrote:
> >
> > +1 to Vihang's suggestion. Changing the name will only cause
> confusion.
> >
> > On Mon, Jul 24, 2017 at 2:28 PM, Johndee Cloudera <
> > john...@cloudera.com>
> > wrote:
> >
> > > +1 Vihang, I do not really like Catalog as it could create
> confusion
> > with
> > > the Catalog daemon from impala.
> > >
> > > On Mon, Jul 24, 2017 at 5:20 PM, Vihang Karajgaonkar <
> > vih...@cloudera.com>
> > > wrote:
> > >
> > > > Before we see a flood of name suggestions :) Why not just keep it
> > > > Metastore? Its already well-known in the community and easy to
> > relate to.
> > > >
> > > > On Mon, Jul 24, 2017 at 2:13 PM, Alan Gates <
> alanfga...@gmail.com>
> > > wrote:
> > > >
> > > > > In the same vein Carter and Gunther suggested Omegastore.  Pick
> > your
> > > > > alphabet and whether it’s a catalog or a store I guess.
> > > > >
> > > > > Alan.
> > > > >
> > > > > On Mon, Jul 24, 2017 at 1:35 PM, Sergey Shelukhin <
> > > > ser...@hortonworks.com>
> > > > > wrote:
> > > > >
> > > > > > I’d like to suggest ZCatalog.
> > > > > >
> > > > > > On 17/7/11, 15:41, "Lefty Leverenz"  >
> > wrote:
> > > > > >
> > > > > > >>> I'd like to suggest Riven.  (Owen O'Malley)
> > > > > > >
> > > > > > >> How about "Flora"?  (Andrew Sherman)
> > > > > > >
> > > > > > >Nice idea and thanks for introducing me to that book,
> Andrew.
> > > > > > >
> > > > > > >Along the same lines, how about "Honeycomb"?
> > > > > > >
> > > > > > >But since the idea is to make the metastore useful for many
> > > projects,
> > > > a
> > > > > > >generic name that starts with "Meta" would be less confusing
> > ...
> > > even
> > > > > > >though it breaks the tradition of Apache projects having
> > quirky
> > > names.
> > > > > > >Unfortunately "Metalog" is already in use.  "Metamorph" has
> > other
> > > > > > >connotations, but it's cool.
> > > > > > >
> > > > > > >Naming enthusiasm notwithstanding, I'm +/-0 on the idea of
> > splitting
> > > > off
> > > > > > >the metastore into a new project:  -0.5 for the sake of Hive
> > and
> > > +0.5
> > > > > for
> > > > > > 

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-25 Thread Carl Steinbach
"IceWeasel" and "MetaStore" are both examples of English compound words.
What exactly makes the former any safer than the latter?

On Mon, Jul 24, 2017 at 4:17 PM, Gopal Vijayaraghavan 
wrote:

> Hi,
>
>
> Changing the name isn't really optional or "being google-able" [2].
>
> The naming is a crucial part of trademark protection [1], which is the
> only protection ASF has against hostile Embrace & Extends.
>
> Fragmented forks with the same name is particularly bad, especially if the
> feature in question can be only used by a proprietary tool (like Dain's
> suggestion about Presto view metadata, except it only works with a per-cpu
> license).
>
> The safe path isn't pretty, it still ends up with IcedTea and IceWeasel …
> but at least, those are clearly weird.
>
> Cheers,
> Gopal
>
> [1] - https://en.wikipedia.org/wiki/A_moron_in_a_hurry#United_States
> [2] - https://packages.debian.org/jessie/misc/metastore
>
> On 7/24/17, 3:04 PM, "Carl Steinbach"  wrote:
>
> +1 to Vihang's suggestion. Changing the name will only cause confusion.
>
> On Mon, Jul 24, 2017 at 2:28 PM, Johndee Cloudera <
> john...@cloudera.com>
> wrote:
>
> > +1 Vihang, I do not really like Catalog as it could create confusion
> with
> > the Catalog daemon from impala.
> >
> > On Mon, Jul 24, 2017 at 5:20 PM, Vihang Karajgaonkar <
> vih...@cloudera.com>
> > wrote:
> >
> > > Before we see a flood of name suggestions :) Why not just keep it
> > > Metastore? Its already well-known in the community and easy to
> relate to.
> > >
> > > On Mon, Jul 24, 2017 at 2:13 PM, Alan Gates 
> > wrote:
> > >
> > > > In the same vein Carter and Gunther suggested Omegastore.  Pick
> your
> > > > alphabet and whether it’s a catalog or a store I guess.
> > > >
> > > > Alan.
> > > >
> > > > On Mon, Jul 24, 2017 at 1:35 PM, Sergey Shelukhin <
> > > ser...@hortonworks.com>
> > > > wrote:
> > > >
> > > > > I’d like to suggest ZCatalog.
> > > > >
> > > > > On 17/7/11, 15:41, "Lefty Leverenz" 
> wrote:
> > > > >
> > > > > >>> I'd like to suggest Riven.  (Owen O'Malley)
> > > > > >
> > > > > >> How about "Flora"?  (Andrew Sherman)
> > > > > >
> > > > > >Nice idea and thanks for introducing me to that book, Andrew.
> > > > > >
> > > > > >Along the same lines, how about "Honeycomb"?
> > > > > >
> > > > > >But since the idea is to make the metastore useful for many
> > projects,
> > > a
> > > > > >generic name that starts with "Meta" would be less confusing
> ...
> > even
> > > > > >though it breaks the tradition of Apache projects having
> quirky
> > names.
> > > > > >Unfortunately "Metalog" is already in use.  "Metamorph" has
> other
> > > > > >connotations, but it's cool.
> > > > > >
> > > > > >Naming enthusiasm notwithstanding, I'm +/-0 on the idea of
> splitting
> > > off
> > > > > >the metastore into a new project:  -0.5 for the sake of Hive
> and
> > +0.5
> > > > for
> > > > > >the greater good.  Wishy-washy, that's me.
> > > > > >
> > > > > >-- Lefty
> > > > > >
> > > > > >
> > > > > >On Tue, Jul 11, 2017 at 1:04 PM, Andrew Sherman <
> > > asher...@cloudera.com>
> > > > > >wrote:
> > > > > >
> > > > > >> On Fri, Jun 30, 2017 at 5:05 PM, Owen O'Malley <
> > > > owen.omal...@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun <
> sunc...@apache.org>
> > > > wrote:
> > > > > >> >
> > > > > >> > > and maybe a different project name?
> > > > > >> > >
> > > > > >> >
> > > > > >> > Yes, it certainly needs a new name. I'd like to suggest
> Riven.
> > > > > >> >
> > > > > >> > .. Owen
> > > > > >> >
> > > > > >>
> > > > > >> How about "Flora"?
> > > > > >>
> > > > > >> (Flora is the protagonist of The Bees by Laline Paull)
> > > > > >>
> > > > > >> -Andrew
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > - JRB
> >
>
>
>
>


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-25 Thread Alan Gates
I’ve collected the naming suggestions on the wiki:
https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal
Please feel free to add additional ones there.

Alan.

On Mon, Jul 24, 2017 at 4:17 PM, Gopal Vijayaraghavan 
wrote:

> Hi,
>
>
> Changing the name isn't really optional or "being google-able" [2].
>
> The naming is a crucial part of trademark protection [1], which is the
> only protection ASF has against hostile Embrace & Extends.
>
> Fragmented forks with the same name is particularly bad, especially if the
> feature in question can be only used by a proprietary tool (like Dain's
> suggestion about Presto view metadata, except it only works with a per-cpu
> license).
>
> The safe path isn't pretty, it still ends up with IcedTea and IceWeasel …
> but at least, those are clearly weird.
>
> Cheers,
> Gopal
>
> [1] - https://en.wikipedia.org/wiki/A_moron_in_a_hurry#United_States
> [2] - https://packages.debian.org/jessie/misc/metastore
>
> On 7/24/17, 3:04 PM, "Carl Steinbach"  wrote:
>
> +1 to Vihang's suggestion. Changing the name will only cause confusion.
>
> On Mon, Jul 24, 2017 at 2:28 PM, Johndee Cloudera <
> john...@cloudera.com>
> wrote:
>
> > +1 Vihang, I do not really like Catalog as it could create confusion
> with
> > the Catalog daemon from impala.
> >
> > On Mon, Jul 24, 2017 at 5:20 PM, Vihang Karajgaonkar <
> vih...@cloudera.com>
> > wrote:
> >
> > > Before we see a flood of name suggestions :) Why not just keep it
> > > Metastore? Its already well-known in the community and easy to
> relate to.
> > >
> > > On Mon, Jul 24, 2017 at 2:13 PM, Alan Gates 
> > wrote:
> > >
> > > > In the same vein Carter and Gunther suggested Omegastore.  Pick
> your
> > > > alphabet and whether it’s a catalog or a store I guess.
> > > >
> > > > Alan.
> > > >
> > > > On Mon, Jul 24, 2017 at 1:35 PM, Sergey Shelukhin <
> > > ser...@hortonworks.com>
> > > > wrote:
> > > >
> > > > > I’d like to suggest ZCatalog.
> > > > >
> > > > > On 17/7/11, 15:41, "Lefty Leverenz" 
> wrote:
> > > > >
> > > > > >>> I'd like to suggest Riven.  (Owen O'Malley)
> > > > > >
> > > > > >> How about "Flora"?  (Andrew Sherman)
> > > > > >
> > > > > >Nice idea and thanks for introducing me to that book, Andrew.
> > > > > >
> > > > > >Along the same lines, how about "Honeycomb"?
> > > > > >
> > > > > >But since the idea is to make the metastore useful for many
> > projects,
> > > a
> > > > > >generic name that starts with "Meta" would be less confusing
> ...
> > even
> > > > > >though it breaks the tradition of Apache projects having
> quirky
> > names.
> > > > > >Unfortunately "Metalog" is already in use.  "Metamorph" has
> other
> > > > > >connotations, but it's cool.
> > > > > >
> > > > > >Naming enthusiasm notwithstanding, I'm +/-0 on the idea of
> splitting
> > > off
> > > > > >the metastore into a new project:  -0.5 for the sake of Hive
> and
> > +0.5
> > > > for
> > > > > >the greater good.  Wishy-washy, that's me.
> > > > > >
> > > > > >-- Lefty
> > > > > >
> > > > > >
> > > > > >On Tue, Jul 11, 2017 at 1:04 PM, Andrew Sherman <
> > > asher...@cloudera.com>
> > > > > >wrote:
> > > > > >
> > > > > >> On Fri, Jun 30, 2017 at 5:05 PM, Owen O'Malley <
> > > > owen.omal...@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun <
> sunc...@apache.org>
> > > > wrote:
> > > > > >> >
> > > > > >> > > and maybe a different project name?
> > > > > >> > >
> > > > > >> >
> > > > > >> > Yes, it certainly needs a new name. I'd like to suggest
> Riven.
> > > > > >> >
> > > > > >> > .. Owen
> > > > > >> >
> > > > > >>
> > > > > >> How about "Flora"?
> > > > > >>
> > > > > >> (Flora is the protagonist of The Bees by Laline Paull)
> > > > > >>
> > > > > >> -Andrew
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > - JRB
> >
>
>
>
>


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-24 Thread Gopal Vijayaraghavan
Hi,


Changing the name isn't really optional, nor is it just about "being google-able" [2].

The naming is a crucial part of trademark protection [1], which is the only 
protection ASF has against hostile Embrace & Extends.

Fragmented forks with the same name are particularly bad, especially if the 
feature in question can only be used by a proprietary tool (like Dain's 
suggestion about Presto view metadata, except it only works with a per-cpu 
license).

The safe path isn't pretty; it still ends up with IcedTea and IceWeasel … but 
at least those are clearly weird.

Cheers,
Gopal

[1] - https://en.wikipedia.org/wiki/A_moron_in_a_hurry#United_States
[2] - https://packages.debian.org/jessie/misc/metastore

On 7/24/17, 3:04 PM, "Carl Steinbach"  wrote:

+1 to Vihang's suggestion. Changing the name will only cause confusion.

On Mon, Jul 24, 2017 at 2:28 PM, Johndee Cloudera 
wrote:

> +1 Vihang, I do not really like Catalog as it could create confusion with
> the Catalog daemon from impala.
>
> On Mon, Jul 24, 2017 at 5:20 PM, Vihang Karajgaonkar 
> wrote:
>
> > Before we see a flood of name suggestions :) Why not just keep it
> > Metastore? Its already well-known in the community and easy to relate to.
> >
> > On Mon, Jul 24, 2017 at 2:13 PM, Alan Gates 
> wrote:
> >
> > > In the same vein Carter and Gunther suggested Omegastore.  Pick your
> > > alphabet and whether it’s a catalog or a store I guess.
> > >
> > > Alan.
> > >
> > > On Mon, Jul 24, 2017 at 1:35 PM, Sergey Shelukhin <
> > ser...@hortonworks.com>
> > > wrote:
> > >
> > > > I’d like to suggest ZCatalog.
> > > >
> > > > On 17/7/11, 15:41, "Lefty Leverenz"  wrote:
> > > >
> > > > >>> I'd like to suggest Riven.  (Owen O'Malley)
> > > > >
> > > > >> How about "Flora"?  (Andrew Sherman)
> > > > >
> > > > >Nice idea and thanks for introducing me to that book, Andrew.
> > > > >
> > > > >Along the same lines, how about "Honeycomb"?
> > > > >
> > > > >But since the idea is to make the metastore useful for many
> projects,
> > a
> > > > >generic name that starts with "Meta" would be less confusing ...
> even
> > > > >though it breaks the tradition of Apache projects having quirky
> names.
> > > > >Unfortunately "Metalog" is already in use.  "Metamorph" has other
> > > > >connotations, but it's cool.
> > > > >
> > > > >Naming enthusiasm notwithstanding, I'm +/-0 on the idea of splitting off
> > > > >the metastore into a new project:  -0.5 for the sake of Hive and
> +0.5
> > > for
> > > > >the greater good.  Wishy-washy, that's me.
> > > > >
> > > > >-- Lefty
> > > > >
> > > > >
> > > > >On Tue, Jul 11, 2017 at 1:04 PM, Andrew Sherman <
> > asher...@cloudera.com>
> > > > >wrote:
> > > > >
> > > > >> On Fri, Jun 30, 2017 at 5:05 PM, Owen O'Malley <
> > > owen.omal...@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun 
> > > wrote:
> > > > >> >
> > > > >> > > and maybe a different project name?
> > > > >> > >
> > > > >> >
> > > > >> > Yes, it certainly needs a new name. I'd like to suggest Riven.
> > > > >> >
> > > > >> > .. Owen
> > > > >> >
> > > > >>
> > > > >> How about "Flora"?
> > > > >>
> > > > >> (Flora is the protagonist of The Bees by Laline Paull)
> > > > >>
> > > > >> -Andrew
> > > > >>
> > > >
> > > >
> > >
> >
>
>
>
> --
> - JRB
>





Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-24 Thread Carl Steinbach
+1 to Vihang's suggestion. Changing the name will only cause confusion.

On Mon, Jul 24, 2017 at 2:28 PM, Johndee Cloudera 
wrote:

> +1 Vihang, I do not really like Catalog as it could create confusion with
> the Catalog daemon from impala.
>
> On Mon, Jul 24, 2017 at 5:20 PM, Vihang Karajgaonkar 
> wrote:
>
> > Before we see a flood of name suggestions :) Why not just keep it
> > Metastore? Its already well-known in the community and easy to relate to.
> >
> > On Mon, Jul 24, 2017 at 2:13 PM, Alan Gates 
> wrote:
> >
> > > In the same vein Carter and Gunther suggested Omegastore.  Pick your
> > > alphabet and whether it’s a catalog or a store I guess.
> > >
> > > Alan.
> > >
> > > On Mon, Jul 24, 2017 at 1:35 PM, Sergey Shelukhin <
> > ser...@hortonworks.com>
> > > wrote:
> > >
> > > > I’d like to suggest ZCatalog.
> > > >
> > > > On 17/7/11, 15:41, "Lefty Leverenz"  wrote:
> > > >
> > > > >>> I'd like to suggest Riven.  (Owen O'Malley)
> > > > >
> > > > >> How about "Flora"?  (Andrew Sherman)
> > > > >
> > > > >Nice idea and thanks for introducing me to that book, Andrew.
> > > > >
> > > > >Along the same lines, how about "Honeycomb"?
> > > > >
> > > > >But since the idea is to make the metastore useful for many
> projects,
> > a
> > > > >generic name that starts with "Meta" would be less confusing ...
> even
> > > > >though it breaks the tradition of Apache projects having quirky
> names.
> > > > >Unfortunately "Metalog" is already in use.  "Metamorph" has other
> > > > >connotations, but it's cool.
> > > > >
> > > > >Naming enthusiasm notwithstanding, I'm +/-0 on the idea of splitting
> > off
> > > > >the metastore into a new project:  -0.5 for the sake of Hive and
> +0.5
> > > for
> > > > >the greater good.  Wishy-washy, that's me.
> > > > >
> > > > >-- Lefty
> > > > >
> > > > >
> > > > >On Tue, Jul 11, 2017 at 1:04 PM, Andrew Sherman <
> > asher...@cloudera.com>
> > > > >wrote:
> > > > >
> > > > >> On Fri, Jun 30, 2017 at 5:05 PM, Owen O'Malley <
> > > owen.omal...@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun 
> > > wrote:
> > > > >> >
> > > > >> > > and maybe a different project name?
> > > > >> > >
> > > > >> >
> > > > >> > Yes, it certainly needs a new name. I'd like to suggest Riven.
> > > > >> >
> > > > >> > .. Owen
> > > > >> >
> > > > >>
> > > > >> How about "Flora"?
> > > > >>
> > > > >> (Flora is the protagonist of The Bees by Laline Paull)
> > > > >>
> > > > >> -Andrew
> > > > >>
> > > >
> > > >
> > >
> >
>
>
>
> --
> - JRB
>


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-24 Thread Johndee Cloudera
+1 Vihang, I do not really like Catalog, as it could create confusion with
the Catalog daemon from Impala.

On Mon, Jul 24, 2017 at 5:20 PM, Vihang Karajgaonkar 
wrote:

> Before we see a flood of name suggestions :) Why not just keep it
> Metastore? Its already well-known in the community and easy to relate to.
>
> On Mon, Jul 24, 2017 at 2:13 PM, Alan Gates  wrote:
>
> > In the same vein Carter and Gunther suggested Omegastore.  Pick your
> > alphabet and whether it’s a catalog or a store I guess.
> >
> > Alan.
> >
> > On Mon, Jul 24, 2017 at 1:35 PM, Sergey Shelukhin <
> ser...@hortonworks.com>
> > wrote:
> >
> > > I’d like to suggest ZCatalog.
> > >
> > > On 17/7/11, 15:41, "Lefty Leverenz"  wrote:
> > >
> > > >>> I'd like to suggest Riven.  (Owen O'Malley)
> > > >
> > > >> How about "Flora"?  (Andrew Sherman)
> > > >
> > > >Nice idea and thanks for introducing me to that book, Andrew.
> > > >
> > > >Along the same lines, how about "Honeycomb"?
> > > >
> > > >But since the idea is to make the metastore useful for many projects,
> a
> > > >generic name that starts with "Meta" would be less confusing ... even
> > > >though it breaks the tradition of Apache projects having quirky names.
> > > >Unfortunately "Metalog" is already in use.  "Metamorph" has other
> > > >connotations, but it's cool.
> > > >
> > > >Naming enthusiasm notwithstanding, I'm +/-0 on the idea of splitting
> off
> > > >the metastore into a new project:  -0.5 for the sake of Hive and +0.5
> > for
> > > >the greater good.  Wishy-washy, that's me.
> > > >
> > > >-- Lefty
> > > >
> > > >
> > > >On Tue, Jul 11, 2017 at 1:04 PM, Andrew Sherman <
> asher...@cloudera.com>
> > > >wrote:
> > > >
> > > >> On Fri, Jun 30, 2017 at 5:05 PM, Owen O'Malley <
> > owen.omal...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun 
> > wrote:
> > > >> >
> > > >> > > and maybe a different project name?
> > > >> > >
> > > >> >
> > > >> > Yes, it certainly needs a new name. I'd like to suggest Riven.
> > > >> >
> > > >> > .. Owen
> > > >> >
> > > >>
> > > >> How about "Flora"?
> > > >>
> > > >> (Flora is the protagonist of The Bees by Laline Paull)
> > > >>
> > > >> -Andrew
> > > >>
> > >
> > >
> >
>



-- 
- JRB


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-24 Thread Vihang Karajgaonkar
Before we see a flood of name suggestions :) Why not just keep it
Metastore? It's already well known in the community and easy to relate to.

On Mon, Jul 24, 2017 at 2:13 PM, Alan Gates  wrote:

> In the same vein Carter and Gunther suggested Omegastore.  Pick your
> alphabet and whether it’s a catalog or a store I guess.
>
> Alan.
>
> On Mon, Jul 24, 2017 at 1:35 PM, Sergey Shelukhin 
> wrote:
>
> > I’d like to suggest ZCatalog.
> >
> > On 17/7/11, 15:41, "Lefty Leverenz"  wrote:
> >
> > >>> I'd like to suggest Riven.  (Owen O'Malley)
> > >
> > >> How about "Flora"?  (Andrew Sherman)
> > >
> > >Nice idea and thanks for introducing me to that book, Andrew.
> > >
> > >Along the same lines, how about "Honeycomb"?
> > >
> > >But since the idea is to make the metastore useful for many projects, a
> > >generic name that starts with "Meta" would be less confusing ... even
> > >though it breaks the tradition of Apache projects having quirky names.
> > >Unfortunately "Metalog" is already in use.  "Metamorph" has other
> > >connotations, but it's cool.
> > >
> > >Naming enthusiasm notwithstanding, I'm +/-0 on the idea of splitting off
> > >the metastore into a new project:  -0.5 for the sake of Hive and +0.5
> for
> > >the greater good.  Wishy-washy, that's me.
> > >
> > >-- Lefty
> > >
> > >
> > >On Tue, Jul 11, 2017 at 1:04 PM, Andrew Sherman 
> > >wrote:
> > >
> > >> On Fri, Jun 30, 2017 at 5:05 PM, Owen O'Malley <
> owen.omal...@gmail.com>
> > >> wrote:
> > >>
> > >> > On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun 
> wrote:
> > >> >
> > >> > > and maybe a different project name?
> > >> > >
> > >> >
> > >> > Yes, it certainly needs a new name. I'd like to suggest Riven.
> > >> >
> > >> > .. Owen
> > >> >
> > >>
> > >> How about "Flora"?
> > >>
> > >> (Flora is the protagonist of The Bees by Laline Paull)
> > >>
> > >> -Andrew
> > >>
> >
> >
>


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-24 Thread Alan Gates
In the same vein Carter and Gunther suggested Omegastore.  Pick your
alphabet and whether it’s a catalog or a store I guess.

Alan.

On Mon, Jul 24, 2017 at 1:35 PM, Sergey Shelukhin 
wrote:

> I’d like to suggest ZCatalog.
>
> On 17/7/11, 15:41, "Lefty Leverenz"  wrote:
>
> >>> I'd like to suggest Riven.  (Owen O'Malley)
> >
> >> How about "Flora"?  (Andrew Sherman)
> >
> >Nice idea and thanks for introducing me to that book, Andrew.
> >
> >Along the same lines, how about "Honeycomb"?
> >
> >But since the idea is to make the metastore useful for many projects, a
> >generic name that starts with "Meta" would be less confusing ... even
> >though it breaks the tradition of Apache projects having quirky names.
> >Unfortunately "Metalog" is already in use.  "Metamorph" has other
> >connotations, but it's cool.
> >
> >Naming enthusiasm notwithstanding, I'm +/-0 on the idea of splitting off
> >the metastore into a new project:  -0.5 for the sake of Hive and +0.5 for
> >the greater good.  Wishy-washy, that's me.
> >
> >-- Lefty
> >
> >
> >On Tue, Jul 11, 2017 at 1:04 PM, Andrew Sherman 
> >wrote:
> >
> >> On Fri, Jun 30, 2017 at 5:05 PM, Owen O'Malley 
> >> wrote:
> >>
> >> > On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun  wrote:
> >> >
> >> > > and maybe a different project name?
> >> > >
> >> >
> >> > Yes, it certainly needs a new name. I'd like to suggest Riven.
> >> >
> >> > .. Owen
> >> >
> >>
> >> How about "Flora"?
> >>
> >> (Flora is the protagonist of The Bees by Laline Paull)
> >>
> >> -Andrew
> >>
>
>


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-24 Thread Sergey Shelukhin
I’d like to suggest ZCatalog.

On 17/7/11, 15:41, "Lefty Leverenz"  wrote:

>>> I'd like to suggest Riven.  (Owen O'Malley)
>
>> How about "Flora"?  (Andrew Sherman)
>
>Nice idea and thanks for introducing me to that book, Andrew.
>
>Along the same lines, how about "Honeycomb"?
>
>But since the idea is to make the metastore useful for many projects, a
>generic name that starts with "Meta" would be less confusing ... even
>though it breaks the tradition of Apache projects having quirky names.
>Unfortunately "Metalog" is already in use.  "Metamorph" has other
>connotations, but it's cool.
>
>Naming enthusiasm notwithstanding, I'm +/-0 on the idea of splitting off
>the metastore into a new project:  -0.5 for the sake of Hive and +0.5 for
>the greater good.  Wishy-washy, that's me.
>
>-- Lefty
>
>
>On Tue, Jul 11, 2017 at 1:04 PM, Andrew Sherman 
>wrote:
>
>> On Fri, Jun 30, 2017 at 5:05 PM, Owen O'Malley 
>> wrote:
>>
>> > On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun  wrote:
>> >
>> > > and maybe a different project name?
>> > >
>> >
>> > Yes, it certainly needs a new name. I'd like to suggest Riven.
>> >
>> > .. Owen
>> >
>>
>> How about "Flora"?
>>
>> (Flora is the protagonist of The Bees by Laline Paull)
>>
>> -Andrew
>>



Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-21 Thread Alan Gates
It seems we have settled into a consensus that this will be good for the
ecosystem, but there are concerns that this will be a burden on Hive.  The
original proposal included a time to work on the separation inside the Hive
project to help address any such issues.  This would culminate in a source
only release of the metastore inside Hive.  I propose we start working on
that internal separation now.  I’ll file an umbrella JIRA for it soon.

Alan.

On Tue, Jul 11, 2017 at 3:41 PM, Lefty Leverenz 
wrote:

> >> I'd like to suggest Riven.  (Owen O'Malley)
>
> > How about "Flora"?  (Andrew Sherman)
>
> Nice idea and thanks for introducing me to that book, Andrew.
>
> Along the same lines, how about "Honeycomb"?
>
> But since the idea is to make the metastore useful for many projects, a
> generic name that starts with "Meta" would be less confusing ... even
> though it breaks the tradition of Apache projects having quirky names.
> Unfortunately "Metalog" is already in use.  "Metamorph" has other
> connotations, but it's cool.
>
> Naming enthusiasm notwithstanding, I'm +/-0 on the idea of splitting off
> the metastore into a new project:  -0.5 for the sake of Hive and +0.5 for
> the greater good.  Wishy-washy, that's me.
>
> -- Lefty
>
>
> On Tue, Jul 11, 2017 at 1:04 PM, Andrew Sherman 
> wrote:
>
> > On Fri, Jun 30, 2017 at 5:05 PM, Owen O'Malley 
> > wrote:
> >
> > > On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun  wrote:
> > >
> > > > and maybe a different project name?
> > > >
> > >
> > > Yes, it certainly needs a new name. I'd like to suggest Riven.
> > >
> > > .. Owen
> > >
> >
> > How about "Flora"?
> >
> > (Flora is the protagonist of The Bees by Laline Paull)
> >
> > -Andrew
> >
>


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-11 Thread Lefty Leverenz
>> I'd like to suggest Riven.  (Owen O'Malley)

> How about "Flora"?  (Andrew Sherman)

Nice idea and thanks for introducing me to that book, Andrew.

Along the same lines, how about "Honeycomb"?

But since the idea is to make the metastore useful for many projects, a
generic name that starts with "Meta" would be less confusing ... even
though it breaks the tradition of Apache projects having quirky names.
Unfortunately "Metalog" is already in use.  "Metamorph" has other
connotations, but it's cool.

Naming enthusiasm notwithstanding, I'm +/-0 on the idea of splitting off
the metastore into a new project:  -0.5 for the sake of Hive and +0.5 for
the greater good.  Wishy-washy, that's me.

-- Lefty


On Tue, Jul 11, 2017 at 1:04 PM, Andrew Sherman 
wrote:

> On Fri, Jun 30, 2017 at 5:05 PM, Owen O'Malley 
> wrote:
>
> > On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun  wrote:
> >
> > > and maybe a different project name?
> > >
> >
> > Yes, it certainly needs a new name. I'd like to suggest Riven.
> >
> > .. Owen
> >
>
> How about "Flora"?
>
> (Flora is the protagonist of The Bees by Laline Paull)
>
> -Andrew
>


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-11 Thread Andrew Sherman
On Fri, Jun 30, 2017 at 5:05 PM, Owen O'Malley 
wrote:

> On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun  wrote:
>
> > and maybe a different project name?
> >
>
> Yes, it certainly needs a new name. I'd like to suggest Riven.
>
> .. Owen
>

How about "Flora"?

(Flora is the protagonist of The Bees by Laline Paull)

-Andrew


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-10 Thread Alan Gates
On Wed, Jul 5, 2017 at 12:33 PM, Edward Capriolo 
wrote:

>
> I hate to draw in something else but I feel it is related:
>
> 8 December 2016 : release 2.1.1 available
> 07 April 2017 : release 1.2.2 available
> hive-dev [DISCUSS] Supporting Hadoop-1 and experimental features
> hive-dev Re: release chaos?
>
> I have been vocal about not liking certain branching strategies and
> proposals that take us away from releasable trunk. We have steadily headed
> in a direction where we are pulling things out of hive, and we are not able
> to turn out releases. We even had a thread "release chaos" talking about
> our 5 active branches (with friends I say "jumped the shark"). Pulling out
> the metastore is only going to make this worse. I do not even see the model
> as successful. You may say it is great that calcite lets people share our
> sql dialect or the ORC TLP has 5 committers, but if Hive can not get a
> release out the door I do not see us optimizing for the proper thing.
>

I don’t see the relationship between these issues.  I agree Hive is not
releasing frequently enough.  I agree that breaking out the metastore won’t
fix that.  It isn’t intended to fix that.  But as stated elsewhere in this
thread there are ways to make sure it won’t make it any worse either.

Alan.


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-10 Thread Alan Gates
The proposal does build in a time delay by doing the separation first in
the Hive project.  This will take several months.  We should not put an
absolute bound on it (6 months or whatever).  The reasoning behind doing
the separation in Hive before moving the code out is to make sure it’s
feasible and to better understand the issues the separation will create.

Alan.

On Wed, Jul 5, 2017 at 1:16 PM, Xuefu Zhang  wrote:

> I think Edward's concern is valid. While I voiced my support for this
> proposal, which was more from the benefits of the whole Hadoop ecosystem, I
> don't see the equal benefits for Hive. Instead, it may even create more
> overhead for Hive. I'd really like to take time to see what are the road
> blocks for other projects to use HMS as it is. The issue of Spark including
> a Hive fork, which was brought up some time back, is certainly not one of
> them.
>
> Thanks,
> Xuefu
>
> On Wed, Jul 5, 2017 at 12:33 PM, Edward Capriolo 
> wrote:
>
> > On Wed, Jul 5, 2017 at 1:51 PM, Alan Gates  wrote:
> >
> > > On Mon, Jul 3, 2017 at 6:20 AM, Edward Capriolo  >
> > > wrote:
> > >
> > > >
> > > > We already have things in the meta-store not directly tied to
> language
> > > > features. For example hive metastore has a "retention" property which
> > is
> > > > not actively in use by anything. In reality, we rarely say 'no' or -1
> > to
> > > > much. Which in part is why I believe our release process is grinding
> > > > slower: we have so many things in flight I do not feel that any one
> > > person
> > > > can keep track. You are working on porting the metastore to hbase.
> > > > https://issues.apache.org/jira/browse/HIVE-9452 did you get a -1 or
> > 'No'
> > > > along the way? When I first noticed this I pointed out that someone
> has
> > > > already ported the metastore to Cassandra
> > > > https://github.com/riptano/brisk/blob/master/src/java/
> > > > src/org/apache/cassandra/hadoop/hive/metastore/SchemaManager
> > > Service.java,
> > > > but I was more exciting/rational for this multi-year approach using
> > hbase
> > > > so I let everyone 'have at it'.
> > > >
> > > Your example and mine are not equivalent.  The HBase metastore is
> still a
> > > Hive feature, even if some thought it not worth while.  That is
> different
> > > than people bringing features that will never interest Hive or that
> Hive
> > > could never use (e.g. Dain’s desire for the metastore to support Presto
> > > style views).
> > >
> > > I forgot to mention the issue these would be non-Hive contributors have
> > > with releases if they contribute their features to the metastore while
> > it’s
> > > inside Hive.  Is Hive going to do a release just to push out features
> in
> > > the metastore that it doesn’t care about?
> > >
> > > You seem to be asserting that doing this doesn’t really help non-Hive
> > based
> > > systems that are using or would like to use the metastore.  But it is
> > > interesting that people from three of those systems have commented in
> the
> > > thread so far, and all are positive (Dmitrias from Impala, Dain from
> > > Presto, and Sriharsha from the schema registry project).
> > >
> > >
> > > > I am going to give a hypothetical but real world situation. Suppose I
> > > want
> > > > to add the statement "CREATE permanent macro xyz", this feature I
> > believe
> > > > would cross cut calcite, hive, and hive metastore. To build this
> > feature
> > > I
> > > > would need to orchestrate the change across 3 separate groups of hive
> > > > 'subcommittees' for lack of a better word. 3 git repos, 3 Jira's 3
> > > > releases. That is not counting if we run into some bug or misfeature
> > > (maybe
> > > > with Tez or something else) so that brings in 4-5 releases of
> upstream
> > to
> > > > add a feature to hive. This does not take into account normal
> processes
> > > > mess ups. For example say you get the metastore done, but now the
> > people
> > > > doing the calcite/antlr suggest the feature have different syntax
> > because
> > > > they did not read the 3-4 linked tickets when the process started?
> Now,
> > > you
> > > > have to loop back around the process. Finding 1 person in 1 project
> to
> > > > usher along the feature you want is difficult, having to find and
> clear
> > > > time with 3 people across three projects is going to be a difficult
> > along
> > > > with then 'pushing' them all to kick out a release so you can finally
> > use
> > > > said feature.
> > > >
> > >
> > > I partially agree with you.  On the reviews, JIRAs, etc. I don’t think
> it
> > > adds much, if any, overhead.  Hive is a big project and no one person
> > knows
> > > all the code anymore.  If you wanted to add a permanent macros feature
> > you
> > > would need reviews from someone who knows the parser (probably
> > Pengcheng),
> > > people who know the optimizer (Jesus, Ashutosh, …), and someone who
> knows
> > > the metastore (me, 

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-10 Thread Alan Gates
+1 to having an always releasable head of master.

+1 to having test verified API compliance.  I was thinking that the project
should set up verification tests where it runs against supported versions
of Hive, Impala, Spark, … (and obviously open to others to add their tests
as well) on a nightly basis so that we guarantee API stability.

Alan.

On Thu, Jul 6, 2017 at 2:10 AM, Peter Vary  wrote:

> Hi folks,
>
> I agree with most of the things Edward said. I have faced similar issues
> in smaller scale when integrated Hive with Yetus. We are forced to keep
> patched Yetus files in Hive repo until they push their next release. Also
> followed one more serious problem when a patch was committed to Hive,
> Impala and Spark, and just few days before the release all of them was
> reverted from the projects due to concerns raised by the Spark committee
> (after the changes was already committed to Spark as well)
>
> Having said all of these, I still think that separating the HMS to a new
> top level project could be a step to the right direction with the following
> constraints. The new project should have:
> - Strict, stability oriented branching strategy following Edward's
> suggestions, so if a downstream project - for example Hive - needs some fix
> or easy change that could be incorporated, and released almost immediately.
> So we have to have these:
> - Always releasable head
> - Every multi commit feature should be added as a feature branch
> - Strict, enforced, stability oriented API strategy. So we will not be
> surprised by features added by other projects and break Hive compatibility.
> To avoid this situation we need to design for it, have pre-commit tests in
> place for catch the in-adverted changes, and most importantly have a clear
> commitment for it.
>
> I think, since the current HMS is already used by numerous other projects,
> we already should have these in mind when modifying anything in HMS related
> code. This is not the main focus of Hive, so we do not concentrate on this
> and there are often interoperability issues, problems. We can do this
> inside Hive as well, but the current approach followed by Hive, and the one
> required by the HMS are requiring a different mindset. We need a clear,
> well defined boundary and separating the 2 projects could help in this. We
> can focus on the different needs and goal and eventually we might have
> different culture as well which suits the specific needs of the specific
> part of the code.
>
> I think keeping these rules in the new to level HMS we can mitigate most
> of the issues mentioned below, and we will be better of overall.
> What do you think Edward?
>
> Thanks,
> Peter
>
>
> > On Jul 5, 2017, at 10:16 PM, Xuefu Zhang  wrote:
> >
> > I think Edward's concern is valid. While I voiced my support for this
> > proposal, which was more from the benefits of the whole Hadoop
> ecosystem, I
> > don't see the equal benefits for Hive. Instead, it may even create more
> > overhead for Hive. I'd really like to take time to see what are the road
> > blocks for other projects to use HMS as it is. The issue of Spark
> including
> > a Hive fork, which was brought up some time back, is certainly not one of
> > them.
> >
> > Thanks,
> > Xuefu
> >
> > On Wed, Jul 5, 2017 at 12:33 PM, Edward Capriolo 
> > wrote:
> >
> >> On Wed, Jul 5, 2017 at 1:51 PM, Alan Gates 
> wrote:
> >>
> >>> On Mon, Jul 3, 2017 at 6:20 AM, Edward Capriolo  >
> >>> wrote:
> >>>
> 
> >>>> We already have things in the meta-store not directly tied to language
> >>>> features. For example hive metastore has a "retention" property which is
> >>>> not actively in use by anything. In reality, we rarely say 'no' or -1 to
> >>>> much. Which in part is why I believe our release process is grinding
> >>>> slower: we have so many things in flight I do not feel that any one person
> >>>> can keep track. You are working on porting the metastore to hbase.
> >>>> https://issues.apache.org/jira/browse/HIVE-9452 did you get a -1 or 'No'
> >>>> along the way? When I first noticed this I pointed out that someone has
> >>>> already ported the metastore to Cassandra
> >>>> https://github.com/riptano/brisk/blob/master/src/java/
> >>>> src/org/apache/cassandra/hadoop/hive/metastore/SchemaManagerService.java,
> >>>> but I was more exciting/rational for this multi-year approach using hbase
> >>>> so I let everyone 'have at it'.
> 
> >>> Your example and mine are not equivalent.  The HBase metastore is
> still a
> >>> Hive feature, even if some thought it not worth while.  That is
> different
> >>> than people bringing features that will never interest Hive or that
> Hive
> >>> could never use (e.g. Dain’s desire for the metastore to support Presto
> >>> style views).
> >>>
> >>> I forgot to mention the issue these would be non-Hive 

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-06 Thread Vihang Karajgaonkar
I can understand the concerns from Edward and Xuefu and I think they are
valid as well. I think having a regular release cadence will help
alleviate, to a certain extent, the concerns about features making it into
releases. Having quarterly or semi-annual releases would be a good
thing in general for both Hive and the Metastore (if we decide to
separate it). It would help that the metastore has the same PMC as Hive, since
you would most likely have to get reviews and approvals from the same set
of people that you would now. For features in dev branches, we will have to
come up with a release strategy so that features spanning the metastore and
Hive work well while in development. Maybe we could use snapshot libraries
of the metastore, built from the latest code, and make a metastore
release before releasing Hive so that they are always in sync.

As far as features spanning multiple projects are
concerned, that is true even today (Hive, Spark, Tez are all separate
projects), although I agree that it may be a lesser problem for most
features. In my opinion, while it is true that many other projects like
Impala, Presto, Spark use Hive Metastore, they do so primarily to maintain
compatibility with Hive. When their adoption rises, what stops them from
creating their own metadata service which suits their needs better? If or
when that happens, it would lead to fragmentation as far as metadata stores
are concerned. If we separate HMS, we can strive to make it a general
purpose metadata service which the other projects would like to adopt,
banking on the advantages it brings now, like compatibility with Hive.
I think as long as the metastore is within Hive, it will always be Hive's
metastore, and other projects will be cautious about adopting it, fearing
changes which might break their code and always having to play well with Hive.

Thanks,
Vihang

On Thu, Jul 6, 2017 at 2:10 AM, Peter Vary  wrote:

> Hi folks,
>
> I agree with most of the things Edward said. I have faced similar issues
> in smaller scale when integrated Hive with Yetus. We are forced to keep
> patched Yetus files in Hive repo until they push their next release. Also
> followed one more serious problem when a patch was committed to Hive,
> Impala and Spark, and just few days before the release all of them was
> reverted from the projects due to concerns raised by the Spark committee
> (after the changes was already committed to Spark as well)
>
> Having said all of these, I still think that separating the HMS to a new
> top level project could be a step to the right direction with the following
> constraints. The new project should have:
> - Strict, stability oriented branching strategy following Edward's
> suggestions, so if a downstream project - for example Hive - needs some fix
> or easy change that could be incorporated, and released almost immediately.
> So we have to have these:
> - Always releasable head
> - Every multi commit feature should be added as a feature branch
> - Strict, enforced, stability oriented API strategy. So we will not be
> surprised by features added by other projects and break Hive compatibility.
> To avoid this situation we need to design for it, have pre-commit tests in
> place for catch the in-adverted changes, and most importantly have a clear
> commitment for it.
>
> I think, since the current HMS is already used by numerous other projects,
> we already should have these in mind when modifying anything in HMS related
> code. This is not the main focus of Hive, so we do not concentrate on this
> and there are often interoperability issues, problems. We can do this
> inside Hive as well, but the current approach followed by Hive, and the one
> required by the HMS are requiring a different mindset. We need a clear,
> well defined boundary and separating the 2 projects could help in this. We
> can focus on the different needs and goal and eventually we might have
> different culture as well which suits the specific needs of the specific
> part of the code.
>
> I think keeping these rules in the new to level HMS we can mitigate most
> of the issues mentioned below, and we will be better of overall.
> What do you think Edward?
>
> Thanks,
> Peter
>
>
> > On Jul 5, 2017, at 10:16 PM, Xuefu Zhang  wrote:
> >
> > I think Edward's concern is valid. While I voiced my support for this
> > proposal, which was more from the benefits of the whole Hadoop
> ecosystem, I
> > don't see the equal benefits for Hive. Instead, it may even create more
> > overhead for Hive. I'd really like to take time to see what are the road
> > blocks for other projects to use HMS as it is. The issue of Spark
> including
> > a Hive fork, which was brought up some time back, is certainly not one of
> > them.
> >
> > Thanks,
> > Xuefu
> >
> > On Wed, Jul 5, 2017 at 12:33 PM, Edward Capriolo 
> > wrote:
> >
> >> On Wed, Jul 5, 2017 at 1:51 PM, Alan Gates 

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-06 Thread Peter Vary
Hi folks,

I agree with most of the things Edward said. I faced similar issues on a
smaller scale when integrating Hive with Yetus. We are forced to keep patched
Yetus files in the Hive repo until they push their next release. I also
followed a more serious problem, where a patch was committed to Hive, Impala
and Spark, and just a few days before the release it was reverted from all of
the projects due to concerns raised by the Spark committee (after the changes
had already been committed to Spark as well).

Having said all of this, I still think that separating the HMS into a new top
level project could be a step in the right direction, with the following
constraints. The new project should have:
- A strict, stability oriented branching strategy following Edward's
suggestions, so that if a downstream project - for example Hive - needs some
fix or easy change, it can be incorporated and released almost immediately.
So we have to have these:
  - Always releasable head
  - Every multi commit feature should be added as a feature branch
- A strict, enforced, stability oriented API strategy, so that we will not be
surprised by features that are added by other projects and break Hive
compatibility. To avoid this situation we need to design for it, have
pre-commit tests in place to catch inadvertent changes, and most importantly
have a clear commitment to it.

I think, since the current HMS is already used by numerous other projects, we
should already have these in mind when modifying anything in HMS related code.
This is not the main focus of Hive, so we do not concentrate on it, and there
are often interoperability problems. We could do this inside Hive as well, but
the current approach followed by Hive and the one required by the HMS call for
different mindsets. We need a clear, well defined boundary, and separating the
2 projects could help with this. We can focus on the different needs and goals,
and eventually we might have different cultures as well, each suiting the
specific needs of its part of the code.

I think that by keeping these rules in the new top level HMS we can mitigate
most of the issues mentioned below, and we will be better off overall.
What do you think, Edward?

Thanks,
Peter

  
> On Jul 5, 2017, at 10:16 PM, Xuefu Zhang  wrote:
> 
> I think Edward's concern is valid. While I voiced my support for this
> proposal, which was more from the benefits of the whole Hadoop ecosystem, I
> don't see the equal benefits for Hive. Instead, it may even create more
> overhead for Hive. I'd really like to take time to see what are the road
> blocks for other projects to use HMS as it is. The issue of Spark including
> a Hive fork, which was brought up some time back, is certainly not one of
> them.
> 
> Thanks,
> Xuefu
> 
> On Wed, Jul 5, 2017 at 12:33 PM, Edward Capriolo 
> wrote:
> 
>> On Wed, Jul 5, 2017 at 1:51 PM, Alan Gates  wrote:
>> 
>>> On Mon, Jul 3, 2017 at 6:20 AM, Edward Capriolo 
>>> wrote:
>>> 
 
>>>> We already have things in the meta-store not directly tied to language
>>>> features. For example hive metastore has a "retention" property which is
>>>> not actively in use by anything. In reality, we rarely say 'no' or -1 to
>>>> much. Which in part is why I believe our release process is grinding
>>>> slower: we have so many things in flight I do not feel that any one person
>>>> can keep track. You are working on porting the metastore to hbase.
>>>> https://issues.apache.org/jira/browse/HIVE-9452 did you get a -1 or 'No'
>>>> along the way? When I first noticed this I pointed out that someone has
>>>> already ported the metastore to Cassandra
>>>> https://github.com/riptano/brisk/blob/master/src/java/
>>>> src/org/apache/cassandra/hadoop/hive/metastore/SchemaManagerService.java,
>>>> but I was more exciting/rational for this multi-year approach using hbase
>>>> so I let everyone 'have at it'.
 
>>> Your example and mine are not equivalent.  The HBase metastore is still a
>>> Hive feature, even if some thought it not worth while.  That is different
>>> than people bringing features that will never interest Hive or that Hive
>>> could never use (e.g. Dain’s desire for the metastore to support Presto
>>> style views).
>>> 
>>> I forgot to mention the issue these would be non-Hive contributors have
>>> with releases if they contribute their features to the metastore while
>> it’s
>>> inside Hive.  Is Hive going to do a release just to push out features in
>>> the metastore that it doesn’t care about?
>>> 
>>> You seem to be asserting that doing this doesn’t really help non-Hive
>> based
>>> systems that are using or would like to use the metastore.  But it is
>>> interesting that people from three of those systems have commented in the
>>> thread so far, and all are positive (Dmitrias from Impala, Dain from
>>> Presto, and Sriharsha from the schema registry 

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-05 Thread Xuefu Zhang
I think Edward's concern is valid. While I voiced my support for this
proposal, that support was based more on the benefits to the whole Hadoop
ecosystem; I don't see equal benefits for Hive. Instead, it may even create
more overhead for Hive. I'd really like to take time to see what the
roadblocks are for other projects to use HMS as it is. The issue of Spark
including a Hive fork, which was brought up some time back, is certainly not
one of them.

Thanks,
Xuefu

On Wed, Jul 5, 2017 at 12:33 PM, Edward Capriolo 
wrote:

> On Wed, Jul 5, 2017 at 1:51 PM, Alan Gates  wrote:
>
> > On Mon, Jul 3, 2017 at 6:20 AM, Edward Capriolo 
> > wrote:
> >
> > >
> > > We already have things in the meta-store not directly tied to language
> > > features. For example hive metastore has a "retention" property which
> is
> > > not actively in use by anything. In reality, we rarely say 'no' or -1
> to
> > > much. Which in part is why I believe our release process is grinding
> > > slower: we have so many things in flight I do not feel that any one
> > person
> > > can keep track. You are working on porting the metastore to hbase.
> > > https://issues.apache.org/jira/browse/HIVE-9452 did you get a -1 or
> 'No'
> > > along the way? When I first noticed this I pointed out that someone has
> > > already ported the metastore to Cassandra
> > > https://github.com/riptano/brisk/blob/master/src/java/
> > > src/org/apache/cassandra/hadoop/hive/metastore/SchemaManager
> > Service.java,
> > > but I was more exciting/rational for this multi-year approach using
> hbase
> > > so I let everyone 'have at it'.
> > >
> > Your example and mine are not equivalent.  The HBase metastore is still a
> > Hive feature, even if some thought it not worth while.  That is different
> > than people bringing features that will never interest Hive or that Hive
> > could never use (e.g. Dain’s desire for the metastore to support Presto
> > style views).
> >
> > I forgot to mention the issue these would be non-Hive contributors have
> > with releases if they contribute their features to the metastore while
> it’s
> > inside Hive.  Is Hive going to do a release just to push out features in
> > the metastore that it doesn’t care about?
> >
> > You seem to be asserting that doing this doesn’t really help non-Hive
> based
> > systems that are using or would like to use the metastore.  But it is
> > interesting that people from three of those systems have commented in the
> > thread so far, and all are positive (Dmitrias from Impala, Dain from
> > Presto, and Sriharsha from the schema registry project).
> >
> >
> > > I am going to give a hypothetical but real world situation. Suppose I
> > want
> > > to add the statement "CREATE permanent macro xyz", this feature I
> believe
> > > would cross cut calcite, hive, and hive metastore. To build this
> feature
> > I
> > > would need to orchestrate the change across 3 separate groups of hive
> > > 'subcommittees' for lack of a better word. 3 git repos, 3 Jira's 3
> > > releases. That is not counting if we run into some bug or misfeature
> > (maybe
> > > with Tez or something else) so that brings in 4-5 releases of upstream
> to
> > > add a feature to hive. This does not take into account normal processes
> > > mess ups. For example say you get the metastore done, but now the
> people
> > > doing the calcite/antlr suggest the feature have different syntax
> because
> > > they did not read the 3-4 linked tickets when the process started? Now,
> > you
> > > have to loop back around the process. Finding 1 person in 1 project to
> > > usher along the feature you want is difficult, having to find and clear
> > > time with 3 people across three projects is going to be a difficult
> along
> > > with then 'pushing' them all to kick out a release so you can finally
> use
> > > said feature.
> > >
> >
> > I partially agree with you.  On the reviews, JIRAs, etc. I don’t think it
> > adds much, if any, overhead.  Hive is a big project and no one person
> knows
> > all the code anymore.  If you wanted to add a permanent macros feature
> you
> > would need reviews from someone who knows the parser (probably
> Pengcheng),
> > people who know the optimizer (Jesus, Ashutosh, …), and someone who knows
> > the metastore (me, Thejas, …).  And any large feature is going to be
> > implemented over multiple JIRAs, all of which are linkable regardless of
> > whether the JIRAs start with METASTORE- or HIVE-.   I also don’t think it
> > makes the feature disagreement any worse.  If the optimizer team
> absolutely
> > insists it has to have some feature and the metastore team insists that
> it
> > can’t have that feature you’re going to have to work through the issue
> > whether they all are in Hive or in two separate projects.
> >
> > Where I agree the split adds cost is releases.  Before your macro feature
> > could go live you need releases from each of the components.  And while
> in

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-05 Thread Edward Capriolo
On Wed, Jul 5, 2017 at 1:51 PM, Alan Gates  wrote:

> On Mon, Jul 3, 2017 at 6:20 AM, Edward Capriolo 
> wrote:
>
> >
> > We already have things in the meta-store not directly tied to language
> > features. For example hive metastore has a "retention" property which is
> > not actively in use by anything. In reality, we rarely say 'no' or -1 to
> > much. Which in part is why I believe our release process is grinding
> > slower: we have so many things in flight I do not feel that any one
> person
> > can keep track. You are working on porting the metastore to hbase.
> > https://issues.apache.org/jira/browse/HIVE-9452 did you get a -1 or 'No'
> > along the way? When I first noticed this I pointed out that someone has
> > already ported the metastore to Cassandra
> > https://github.com/riptano/brisk/blob/master/src/java/
> > src/org/apache/cassandra/hadoop/hive/metastore/SchemaManager
> Service.java,
> > but I was more exciting/rational for this multi-year approach using hbase
> > so I let everyone 'have at it'.
> >
> Your example and mine are not equivalent.  The HBase metastore is still a
> Hive feature, even if some thought it not worth while.  That is different
> than people bringing features that will never interest Hive or that Hive
> could never use (e.g. Dain’s desire for the metastore to support Presto
> style views).
>
> I forgot to mention the issue these would be non-Hive contributors have
> with releases if they contribute their features to the metastore while it’s
> inside Hive.  Is Hive going to do a release just to push out features in
> the metastore that it doesn’t care about?
>
> You seem to be asserting that doing this doesn’t really help non-Hive based
> systems that are using or would like to use the metastore.  But it is
> interesting that people from three of those systems have commented in the
> thread so far, and all are positive (Dmitrias from Impala, Dain from
> Presto, and Sriharsha from the schema registry project).
>
>
> > I am going to give a hypothetical but real world situation. Suppose I
> want
> > to add the statement "CREATE permanent macro xyz", this feature I believe
> > would cross cut calcite, hive, and hive metastore. To build this feature
> I
> > would need to orchestrate the change across 3 separate groups of hive
> > 'subcommittees' for lack of a better word. 3 git repos, 3 Jira's 3
> > releases. That is not counting if we run into some bug or misfeature
> (maybe
> > with Tez or something else) so that brings in 4-5 releases of upstream to
> > add a feature to hive. This does not take into account normal processes
> > mess ups. For example say you get the metastore done, but now the people
> > doing the calcite/antlr suggest the feature have different syntax because
> > they did not read the 3-4 linked tickets when the process started? Now,
> you
> > have to loop back around the process. Finding 1 person in 1 project to
> > usher along the feature you want is difficult, having to find and clear
> > time with 3 people across three projects is going to be a difficult along
> > with then 'pushing' them all to kick out a release so you can finally use
> > said feature.
> >
>
> I partially agree with you.  On the reviews, JIRAs, etc. I don’t think it
> adds much, if any, overhead.  Hive is a big project and no one person knows
> all the code anymore.  If you wanted to add a permanent macros feature you
> would need reviews from someone who knows the parser (probably Pengcheng),
> people who know the optimizer (Jesus, Ashutosh, …), and someone who knows
> the metastore (me, Thejas, …).  And any large feature is going to be
> implemented over multiple JIRAs, all of which are linkable regardless of
> whether the JIRAs start with METASTORE- or HIVE-.   I also don’t think it
> makes the feature disagreement any worse.  If the optimizer team absolutely
> insists it has to have some feature and the metastore team insists that it
> can’t have that feature you’re going to have to work through the issue
> whether they all are in Hive or in two separate projects.
>
> Where I agree the split adds cost is releases.  Before your macro feature
> could go live you need releases from each of the components.  And while in
> development the components need to use snapshot versions of the other
> components.  My assertion is that the benefits out weigh this cost.
>
> Alan.
>


"You seem to be asserting that doing this doesn’t really help non-Hive based
systems that are using or would like to use the metastore.  But it is
interesting that people from three of those systems have commented in the
thread so far, and all are positive (Dmitrias from Impala, Dain from
Presto, and Sriharsha from the schema registry project)."

I notice that Impala has a syntax for caching.

https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_perf_hdfs_caching.html

Notice how the cache syntax did not make its way into Hive? It would make sense if
this feature 

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-05 Thread Alan Gates
On Mon, Jul 3, 2017 at 10:17 AM, Dain Sundstrom  wrote:

> +1
>
> I work on Presto and I think this the right direction for our users.  We
> have several users running Presto without Hive and anything we can do to
> help simplify the Metastore experience would be a good help.
>
> When I read proposals like this, one thing I like to see is a vision
> (scope) for the project.  In this case, I’d like to understand if the plan
> is to limit the scope of the system to what Hive can support.  For example,
> the system will clearly support schemas (databases) with tables and views
> as defined by Hive, but will there be support for additional types like a
> Presto view which is incompatible with a Hive views due to the language
> differences?  Currently, in Presto we create a Hive view to reserve a spot
> in the "tables namespace”, and then we put our view data in a table
> properties.  I would like to formalize this kind of system, so if a Hive
> user queries a Presto view, they get a proper error message. I have similar
> concerns about data types, compression, and data organization (e.g.,
> different bucketing strategies).
>

We tried to lay out the scope in the wiki page [1].  Details will need to be
worked out by the new project.  But I’ll give you my view on it.  I don’t
see the value of breaking this out of Hive if it isn’t willing to take
non-Hive features.  If it’s still Hive-only in its focus, why pay the cost
of having separate projects?  So, as long as Presto style views don’t break
Hive style views or make the system horribly complicated and someone is
willing to add them, +1.

A related area that we will need to work out is the metastore connection to
the Hive physical layout.  Today, when a user says “create table”, the
metastore creates a directory in HDFS.  This ties the metastore to a Hive
style data layout.  How should that be handled going forward?  We could
assert that having a standard data layout is good, and all users of this
metadata system should use this layout.  We could make the physical
operations pluggable, providing the Hive style operations as an option, but
allowing users to bring others. We could completely remove the physical
operations, leave them all in Hive, and say that any system using this
should do their own physical operations.  I don't like the last option
because it makes it hard to share data across tools, but I can think of pro
and con arguments for the first two.
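
A minimal sketch of the second option above (pluggable physical operations), for illustration only. Every name here (PhysicalOps, HiveStyleOps, NoOpOps, Metastore, the warehouse path layout) is hypothetical and is not the metastore's actual API; the "directory creation" is only recorded in a list rather than performed against HDFS.

```python
from abc import ABC, abstractmethod


class PhysicalOps(ABC):
    """Hypothetical plug-in point for storage-side work done on table creation."""

    @abstractmethod
    def on_create_table(self, database: str, table: str) -> str:
        """Perform any physical setup and return the table's location."""


class HiveStyleOps(PhysicalOps):
    """Assumed Hive-style layout: one directory per table under a warehouse root."""

    def __init__(self, warehouse_root: str) -> None:
        self.warehouse_root = warehouse_root.rstrip("/")
        self.created_dirs = []  # stand-in for real filesystem mkdir calls

    def on_create_table(self, database: str, table: str) -> str:
        location = f"{self.warehouse_root}/{database}.db/{table}"
        self.created_dirs.append(location)
        return location


class NoOpOps(PhysicalOps):
    """For systems that manage their own storage; nothing is created."""

    def on_create_table(self, database: str, table: str) -> str:
        return ""


class Metastore:
    """Toy metadata store that delegates physical work to the plug-in."""

    def __init__(self, physical_ops: PhysicalOps) -> None:
        self.physical_ops = physical_ops
        self.tables = {}

    def create_table(self, database: str, table: str) -> dict:
        location = self.physical_ops.on_create_table(database, table)
        record = {"database": database, "table": table, "location": location}
        self.tables[(database, table)] = record
        return record
```

With this shape, a Hive deployment would construct Metastore(HiveStyleOps(...)) and keep today's behavior, while another system could supply its own PhysicalOps without changing the metadata code.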


> Another aspect of this is what is the vision for the specification of the
> Metastore.  Is the vision to have a very open end-user extensible design
> (e.g., just a name and a bag of properties), or is the vision to have a
> project specified common set properties with “rules” for proper extension?
>

Again, just my opinion, but I would say the latter.  The utility of a name
and a bag of properties turns out to be pretty limited and pretty easy to
implement if that’s all you want.  The current metastore can do a lot more
than that.


>
> I would also be very interested in documentation for the Metastore APIs
> (and can help). We currently reverse engineer proper metastore interaction
> by reading the Hive code, and writing a lot of experimental programs, and I
> would really just like to know the "right way”.  Also, we end up missing
> out on new features in the Metastore due to the work required to understand
> how they work.
>

+1 to better documentation regardless of where the metastore code lives.

Alan.


1. https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-05 Thread Alan Gates
On Mon, Jul 3, 2017 at 6:20 AM, Edward Capriolo 
wrote:

>
> We already have things in the meta-store not directly tied to language
> features. For example hive metastore has a "retention" property which is
> not actively in use by anything. In reality, we rarely say 'no' or -1 to
> much. Which in part is why I believe our release process is grinding
> slower: we have so many things in flight I do not feel that any one person
> can keep track. You are working on porting the metastore to hbase.
> https://issues.apache.org/jira/browse/HIVE-9452 did you get a -1 or 'No'
> along the way? When I first noticed this I pointed out that someone has
> already ported the metastore to Cassandra
> https://github.com/riptano/brisk/blob/master/src/java/
> src/org/apache/cassandra/hadoop/hive/metastore/SchemaManagerService.java,
> but I was more exciting/rational for this multi-year approach using hbase
> so I let everyone 'have at it'.
>
Your example and mine are not equivalent.  The HBase metastore is still a
Hive feature, even if some thought it not worthwhile.  That is different
than people bringing features that will never interest Hive or that Hive
could never use (e.g. Dain’s desire for the metastore to support Presto
style views).

I forgot to mention the issue these would-be non-Hive contributors have
with releases if they contribute their features to the metastore while it’s
inside Hive.  Is Hive going to do a release just to push out features in
the metastore that it doesn’t care about?

You seem to be asserting that doing this doesn’t really help non-Hive based
systems that are using or would like to use the metastore.  But it is
interesting that people from three of those systems have commented in the
thread so far, and all are positive (Dimitris from Impala, Dain from
Presto, and Sriharsha from the schema registry project).


> I am going to give a hypothetical but real world situation. Suppose I want
> to add the statement "CREATE permanent macro xyz", this feature I believe
> would cross cut calcite, hive, and hive metastore. To build this feature I
> would need to orchestrate the change across 3 separate groups of hive
> 'subcommittees' for lack of a better word. 3 git repos, 3 Jira's 3
> releases. That is not counting if we run into some bug or misfeature (maybe
> with Tez or something else) so that brings in 4-5 releases of upstream to
> add a feature to hive. This does not take into account normal processes
> mess ups. For example say you get the metastore done, but now the people
> doing the calcite/antlr suggest the feature have different syntax because
> they did not read the 3-4 linked tickets when the process started? Now, you
> have to loop back around the process. Finding 1 person in 1 project to
> usher along the feature you want is difficult, having to find and clear
> time with 3 people across three projects is going to be a difficult along
> with then 'pushing' them all to kick out a release so you can finally use
> said feature.
>

I partially agree with you.  On the reviews, JIRAs, etc. I don’t think it
adds much, if any, overhead.  Hive is a big project and no one person knows
all the code anymore.  If you wanted to add a permanent macros feature you
would need reviews from someone who knows the parser (probably Pengcheng),
people who know the optimizer (Jesus, Ashutosh, …), and someone who knows
the metastore (me, Thejas, …).  And any large feature is going to be
implemented over multiple JIRAs, all of which are linkable regardless of
whether the JIRAs start with METASTORE- or HIVE-.   I also don’t think it
makes the feature disagreement any worse.  If the optimizer team absolutely
insists it has to have some feature and the metastore team insists that it
can’t have that feature you’re going to have to work through the issue
whether they all are in Hive or in two separate projects.

Where I agree the split adds cost is releases.  Before your macro feature
could go live you need releases from each of the components.  And while in
development the components need to use snapshot versions of the other
components.  My assertion is that the benefits outweigh this cost.

Alan.


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-03 Thread Dain Sundstrom
+1

I work on Presto and I think this is the right direction for our users.  We have 
several users running Presto without Hive and anything we can do to help 
simplify the Metastore experience would be a good help.

When I read proposals like this, one thing I like to see is a vision (scope) 
for the project.  In this case, I’d like to understand if the plan is to limit 
the scope of the system to what Hive can support.  For example, the system will 
clearly support schemas (databases) with tables and views as defined by Hive, 
but will there be support for additional types like a Presto view which is 
incompatible with Hive views due to the language differences?  Currently, in 
Presto we create a Hive view to reserve a spot in the “tables namespace”, and 
then we put our view data in the table properties.  I would like to formalize 
this kind of system, so if a Hive user queries a Presto view, they get a proper 
error message. I have similar concerns about data types, compression, and data 
organization (e.g., different bucketing strategies). 
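
A minimal sketch of the placeholder-view convention described above, for illustration only. The property key "presto_view", the "presto_view_sql" field, the dictionary layout, and the function names are all assumptions made for this example; they are not taken from the Presto or Hive code. The point is that an agreed marker lets a Hive-side reader fail with a clear message instead of trying to parse a view it cannot run.

```python
PRESTO_VIEW_FLAG = "presto_view"  # hypothetical marker key in table properties


class UnsupportedViewError(Exception):
    """Raised when an engine meets a view written for a different engine."""


def make_presto_view(name: str, view_sql: str) -> dict:
    """Build a placeholder entry: typed as a view, real definition in properties."""
    return {
        "name": name,
        "table_type": "VIRTUAL_VIEW",
        "view_sql": "",  # nothing Hive could execute
        "properties": {PRESTO_VIEW_FLAG: "true", "presto_view_sql": view_sql},
    }


def resolve_view_for_hive(entry: dict) -> str:
    """Return the Hive view text, or raise a clear error for a Presto view."""
    if entry.get("properties", {}).get(PRESTO_VIEW_FLAG) == "true":
        raise UnsupportedViewError(
            f"'{entry['name']}' is a Presto view and cannot be queried from Hive"
        )
    return entry.get("view_sql", "")
```

A formalized version of this would replace the ad hoc property with a first-class "owning engine" or "view dialect" field, which is the kind of scope question raised in this message.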

Another aspect of this is what is the vision for the specification of the 
Metastore.  Is the vision to have a very open end-user extensible design (e.g., 
just a name and a bag of properties), or is the vision to have a 
project-specified common set of properties with “rules” for proper extension?

I would also be very interested in documentation for the Metastore APIs (and 
can help). We currently reverse engineer proper metastore interaction by 
reading the Hive code, and writing a lot of experimental programs, and I would 
really just like to know the “right way”.  Also, we end up missing out on new 
features in the Metastore due to the work required to understand how they work.

-dain

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-03 Thread Eugene Koifman
This rings very true to me

On 7/3/17, 6:20 AM, "Edward Capriolo"  wrote:

I am going to give a hypothetical but real world situation. Suppose I want
to add the statement "CREATE permanent macro xyz", this feature I believe
would cross cut calcite, hive, and hive metastore. To build this feature I
would need to orchestrate the change across 3 separate groups of hive
'subcommittees' for lack of a better word. 3 git repos, 3 Jira's 3
releases. That is not counting if we run into some bug or misfeature (maybe
with Tez or something else) so that brings in 4-5 releases of upstream to
add a feature to hive. This does not take into account normal processes
mess ups. For example say you get the metastore done, but now the people
doing the calcite/antlr suggest the feature have different syntax because
they did not read the 3-4 linked tickets when the process started? Now, you
have to loop back around the process. Finding 1 person in 1 project to
usher along the feature you want is difficult, having to find and clear
time with 3 people across three projects is going to be a difficult along
with then 'pushing' them all to kick out a release so you can finally use
said feature.



Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-03 Thread Edward Capriolo
On Sun, Jul 2, 2017 at 10:15 PM, Alan Gates  wrote:

> Comments inlined.
>
> On Sun, Jul 2, 2017 at 3:22 PM, Edward Capriolo 
> wrote:
>
> > I am not sure I am on the fence with this.
> >
> > I am -1, and I offer this -1 with the hope of being convinced otherwise
> >
> Thank you for being open to reconsider.
>
> >
> >
> > "By making it a separate project we will enable other projects to join us
> > in
> > innovating on the metastore. "
> >
> > The relevant questions I have are,
> >
> > "What is stopping others from joining us now?"
> > "What does being a TLP do for us that we do not have now?"
> >
>
> Walking through a use case will help answer these.  This is a real world
> situation, not a hypothetical.  I’ve been talking with a team building a
> schema registry for Kafka[1].  I’d like them to use the Hive metastore
> rather than reinvent the wheel.  I believe this would be good for users
> (all their tools can work together on a shared understanding of the data)
> and admins (just one metadata store to administer) and for the ecosytems
> (tools can work across stored data and streaming data).
>
> This system has some requirements on metadata that Hive does not.  To take
> one example, it would like a schema to be a top level concept instead of a
> concept tied to tables or partitions.  This is not a problem for Hive, but
> neither is it interesting.  So if they come with patches for this, would we
> accept them?  As the Hive PMC our answer will be no, because it doesn’t
> help Hive’s metadata.  Even if we accept their patches will we make them
> committers when we know they don’t care about Hive as Hive, but only the
> metastore.  Again, the right answer for the Hive PMC is no.
>
> And we cannot say that Hive should support a generic metadata system within
> itself.  That turns Hive into an umbrella project, which Apache has
> repeatedly worked to avoid.  So Hive will either need to reject non-Hive
> centric features and contributors or end up in a place Apache has worked to
> avoid.
>
> And finally, why would other teams want to mess with all of Hive when they
> only want the metastore?  Hive is a large and complex system.  If we break
> the metastore out it is much more approachable by non-Hive contributors.
>
> Obviously the Hive team doesn’t want to see their metastore turn into
> something unusable by Hive, which is why we were specific in saying we
> wanted it to continue to support high performance SQL systems.
>
> My experience in watching ORC move out of Hive is that the adoption has
> increased significantly.  It is reasonable to assume that moving the
> metastore out will also increase adoption and make it easier for others to
> get involved.
>
>
> > I see a lot of downsides:
> > 1) We have to maintain two sites
> > 2) we have to maintain two committer lists
> >
> > A large problem I see is this: Hive is already being pulled in too many
> > different directions. There is some grumbling about the state of
> > hive-on-spark.
> >
>
> I believe this argues in favor of the split, not against.  By pulling out
> the metastore we are releiving pressure on Hive itself.  Let Hive focus on
> being a SQL engine.  Let another team focus on runtime metadata.
>
> On your committer questions in later emails, the point of going to a TLP
> has nothing to do with adding new committers.   Traditionally new projects
> start in the incubator.  But given that all of the PMC of this new project
> are already experienced Hive PMC members I see no reason to go through
> incubator.  I agree with you that we would not throw any new people into
> the mix.  People join the project in the same way as always, by
> contributing.
>
> Alan.
>
> 1. https://github.com/hortonworks/registry
>
>
> > Most importantly, our release process seems 'injured' by too many
> branches
> > going off in different ways. If the metastore lives outside of Hive we
> are
> > going to compound this issue. I would strongly suggest we do not
> undertake
> > this until we can at least turn out 2 usable releases in a 6 month
> period.
> >
>

So if they come with patches for this, would we
accept them?  As the Hive PMC our answer will be no, because it doesn’t
help Hive’s metadata.

We already have things in the meta-store not directly tied to language
features. For example hive metastore has a "retention" property which is
not actively in use by anything. In reality, we rarely say 'no' or -1 to
much. Which in part is why I believe our release process is grinding
slower: we have so many things in flight I do not feel that any one person
can keep track. You are working on porting the metastore to HBase
(https://issues.apache.org/jira/browse/HIVE-9452). Did you get a -1 or 'No'
along the way? When I first noticed this I pointed out that someone has
already ported the metastore to Cassandra

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-03 Thread Zoltan Haindrich

I think it would be great to move the metastore into a separate project. It 
seems to me that some security-related features/APIs which are specific to the 
metastore are outside of it, so separating the metastore would definitely mean a 
cleanup in this area as well, which will in turn make the whole system more 
easily understandable.

I feel that the “storage-api” may also find a better home in the new metastore 
project, since it defines an API used by ORC etc. to enable the usage of 
different storage drivers in a somewhat similar way. (However, this might not 
be a good idea in case we want to use the storage-api module as the ‘declared’ 
API to support further storage drivers.)


> On 2017 Jul 1, at 12:38 PM, Chaoyu Tang  wrote:
> 
> +1. A nice move which might help HMS to be more easily adopted by more
> other components.
> 
> On Sat, Jul 1, 2017 at 12:41 AM, Sushanth Sowmyan 
> wrote:
> 
>> +1
>> 
>> On Jun 30, 2017 17:05, "Owen O'Malley"  wrote:
>> 
>>> On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun  wrote:
>>> 
>>>> and maybe a different project name?
>>>>
>>> 
>>> Yes, it certainly needs a new name. I'd like to suggest Riven.
>>> 
>>> .. Owen
>>> 
>> 



Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-02 Thread Alan Gates
Comments inlined.

On Sun, Jul 2, 2017 at 3:22 PM, Edward Capriolo 
wrote:

> I am not sure I am on the fence with this.
>
> I am -1, and I offer this -1 with the hope of being convinced otherwise
>
Thank you for being open to reconsider.

>
>
> "By making it a separate project we will enable other projects to join us
> in
> innovating on the metastore. "
>
> The relevant questions I have are,
>
> "What is stopping others from joining us now?"
> "What does being a TLP do for us that we do not have now?"
>

Walking through a use case will help answer these.  This is a real world
situation, not a hypothetical.  I’ve been talking with a team building a
schema registry for Kafka[1].  I’d like them to use the Hive metastore
rather than reinvent the wheel.  I believe this would be good for users
(all their tools can work together on a shared understanding of the data)
and admins (just one metadata store to administer) and for the ecosystem
(tools can work across stored data and streaming data).

This system has some requirements on metadata that Hive does not.  To take
one example, it would like a schema to be a top level concept instead of a
concept tied to tables or partitions.  This is not a problem for Hive, but
neither is it interesting.  So if they come with patches for this, would we
accept them?  As the Hive PMC our answer will be no, because it doesn’t
help Hive’s metadata.  Even if we accept their patches, will we make them
committers when we know they don’t care about Hive as Hive, but only the
metastore?  Again, the right answer for the Hive PMC is no.

And we cannot say that Hive should support a generic metadata system within
itself.  That turns Hive into an umbrella project, which Apache has
repeatedly worked to avoid.  So Hive will either need to reject non-Hive
centric features and contributors or end up in a place Apache has worked to
avoid.

And finally, why would other teams want to mess with all of Hive when they
only want the metastore?  Hive is a large and complex system.  If we break
the metastore out it is much more approachable by non-Hive contributors.

Obviously the Hive team doesn’t want to see their metastore turn into
something unusable by Hive, which is why we were specific in saying we
wanted it to continue to support high performance SQL systems.

My experience in watching ORC move out of Hive is that the adoption has
increased significantly.  It is reasonable to assume that moving the
metastore out will also increase adoption and make it easier for others to
get involved.


> I see a lot of downsides:
> 1) We have to maintain two sites
> 2) we have to maintain two committer lists
>
> A large problem I see is this: Hive is already being pulled in too many
> different directions. There is some grumbling about the state of
> hive-on-spark.
>

I believe this argues in favor of the split, not against.  By pulling out
the metastore we are relieving pressure on Hive itself.  Let Hive focus on
being a SQL engine.  Let another team focus on runtime metadata.

On your committer questions in later emails, the point of going to a TLP
has nothing to do with adding new committers.   Traditionally new projects
start in the incubator.  But given that all of the PMC of this new project
are already experienced Hive PMC members I see no reason to go through the
incubator.  I agree with you that we would not throw any new people into
the mix.  People join the project in the same way as always, by
contributing.

Alan.

1. https://github.com/hortonworks/registry


> Most importantly, our release process seems 'injured' by too many branches
> going off in different ways. If the metastore lives outside of Hive we are
> going to compound this issue. I would strongly suggest we do not undertake
> this until we can at least turn out 2 usable releases in a 6 month period.
>


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-02 Thread Edward Capriolo
"I do not know how this works for TLP proposals, but I also do not think
the TLP process will "open" anything new up for you. I.e., I do not think the
proposal will grant anyone a free ride seat on the committer/PMC list (I
surely would not support that)"

I was unclear: I did not mean "you" or "anyone" as a statement to a
particular person in this chain. I meant that forming a TLP should not
directly add to the committer/PMC list anyone not currently on the Hive
PMC/committer list.

On Sun, Jul 2, 2017 at 6:50 PM, Edward Capriolo 
wrote:

>
>
> On Fri, Jun 30, 2017 at 2:49 PM, Julian Hyde  wrote:
>
>> +1
>>
>> As a Calcite PMC member, I am very pleased to see this change. Calcite
>> reads metadata from a variety of sources (including JDBC databases, NoSQL
>> databases such as Cassandra and Druid, and streaming systems), and if more
>> of those sources choose to store their metadata in the metastore it will
>> make our lives easier.
>>
>> Hive’s metastore has established a position as the place to go for
>> metadata in the Hadoop ecosystem. Not all metadata is relational, or
>> processed by Hive, so there are other parties using the metastore who
>> justifiably would like to influence its direction. Opening up the metastore
>> will help retain and extend this position.
>>
>> Julian
>>
>>
>> On 2017-06-30 10:00 (-0700), "Dimitris ts...@apache.org> wrote:
>> >
>> >
>> > On 2017-06-30 07:56 (-0700), Alan Gates  wrote:
>> > > A few of us have been talking and come to the conclusion that it would
>> > > be a good thing to split out the Hive metastore into its own Apache
>> > > project.  Below and in the linked wiki page we explain what we see as
>> > > the advantages to this and how we would go about it.
>> > >
>> > > Hive’s metastore has long been used by other projects in the Hadoop
>> > > ecosystem to store and access metadata.  Apache Impala, Apache Spark,
>> > > Apache Drill, Presto, and other systems all use Hive’s metastore.
>> > > Some, like Impala and Presto can use it as their own metadata system
>> > > with the rest of Hive not present.
>> > >
>> > > This sharing is excellent for the ecosystem.  Together with HDFS it
>> > > allows users to use the tool of their choice while still accessing the
>> > > same shared data.  But having this shared metadata inside the Hive
>> > > project limits the ability of other projects to contribute to the
>> > > metastore.  It also makes it harder for new systems that have similar
>> > > but not identical metadata requirements (for example, stream
>> > > processing systems on top of Apache Kafka) to use Hive’s metastore.
>> > > This difficulty for other systems comes out in two ways.  One, it is
>> > > hard for non-Hive community members to participate in the project.
>> > > Second, it adds operational cost since users are forced to deploy all
>> > > of the Hive jars just to get the metastore to work.
>> > >
>> > > Therefore we propose to split Hive’s metastore out into a separate
>> > > Apache project.  This new project will continue to support the same
>> > > Thrift API as the current metastore.  It will continue to focus on
>> > > being a high performance, fault tolerant, large scale, operational
>> > > metastore for SQL engines and other systems that want to store schema
>> > > information about their data.
>> > >
>> > > By making it a separate project we will enable other projects to join
>> > > us in innovating on the metastore.  It will simplify operations for
>> > > non-Hive users that want to use the metastore as they will no longer
>> > > need to install Hive just to get the metastore.  And it will attract
>> > > new projects that might otherwise feel the need to solve their
>> > > metadata problems on their own.
>> > >
>> > > Any Hive PMC member or committer will be welcome to join the new
>> > > project at the same level.  We propose this project go straight to a
>> > > top level project.  Given that the initial PMC will be formed from
>> > > experienced Hive PMC members we do not believe incubation will be
>> > > necessary.  (Note that the Apache board will need to approve this.)
>> > >
>> > > Obviously there are many details involved in a proposal like this.
>> > > Rather than make this a ten page email we have filled out many of the
>> > > details in a wiki page:
>> > > https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal
>> > >
>> > > Yongzhi Chen
>> > > Vihang Karajgaonkar
>> > > Sergio Pena
>> > > Sahil Takiar
>> > > Aihua Xu
>> > > Gunther Hagleitner
>> > > Thejas Nair
>> > > Alan Gates
>> > >
>> >
>> > +1 (from Apache Impala's (incubating) perspective)
>> >
>> > Dimitris
>> >
>
>
>
> "Hive’s metastore has established a position as the place to go for
> metadata in the Hadoop ecosystem. Not all metadata is relational, or
> processed by 

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-02 Thread Edward Capriolo
On Fri, Jun 30, 2017 at 2:49 PM, Julian Hyde  wrote:

> +1
>
> As a Calcite PMC member, I am very pleased to see this change. Calcite
> reads metadata from a variety of sources (including JDBC databases, NoSQL
> databases such as Cassandra and Druid, and streaming systems), and if more
> of those sources choose to store their metadata in the metastore it will
> make our lives easier.
>
> Hive’s metastore has established a position as the place to go for
> metadata in the Hadoop ecosystem. Not all metadata is relational, or
> processed by Hive, so there are other parties using the metastore who
> justifiably would like to influence its direction. Opening up the metastore
> will help retain and extend this position.
>
> Julian
>
>
> On 2017-06-30 10:00 (-0700), "Dimitris ts...@apache.org> wrote:
> >
> >
> > On 2017-06-30 07:56 (-0700), Alan Gates  wrote: >
> > > > A few of us have been talking and come to the conclusion that it
> would be>
> > > a good thing to split out the Hive metastore into its own Apache
> project.>
> > > Below and in the linked wiki page we explain what we see as the
> advantages>
> > > to this and how we would go about it.>
> > > >
> > > Hive’s metastore has long been used by other projects in the Hadoop>
> > > ecosystem to store and access metadata.  Apache Impala, Apache Spark,>
> > > Apache Drill, Presto, and other systems all use Hive’s metastore.
> Some,>
> > > like Impala and Presto can use it as their own metadata system with
> the>
> > > rest of Hive not present.>
> > > >
> > > This sharing is excellent for the ecosystem.  Together with HDFS it
> allows>
> > > users to use the tool of their choice while still accessing the same
> shared>
> > > data.  But having this shared metadata inside the Hive project limits
> the>
> > > ability of other projects to contribute to the metastore.  It also
> makes it>
> > > harder for new systems that have similar but not identical metadata>
> > > requirements (for example, stream processing systems on top of Apache>
> > > Kafka) to use Hive’s metastore.  This difficulty for other systems
> comes>
> > > out in two ways.  One, it is hard for non-Hive community members to>
> > > participate in the project.  Second, it adds operational cost since
> users>
> > > are forced to deploy all of the Hive jars just to get the metastore to
> work.>
> > > >
> > > Therefore we propose to split Hive’s metastore out into a separate
> Apache>
> > > project.  This new project will continue to support the same Thrift
> API as>
> > > the current metastore.  It will continue to focus on being a high>
> > > performance, fault tolerant, large scale, operational metastore for
> SQL>
> > > engines and other systems that want to store schema information about
> their>
> > > data.>
> > > >
> > > By making it a separate project we will enable other projects to join
> us in>
> > > innovating on the metastore.  It will simplify operations for non-Hive>
> > > users that want to use the metastore as they will no longer need to
> install>
> > > Hive just to get the metastore.  And it will attract new projects that>
> > > might otherwise feel the need to solve their metadata problems on
> their own.>
> > > >
> > > Any Hive PMC member or committer will be welcome to join the new
> project at>
> > > the same level.  We propose this project go straight to a top level>
> > > project.  Given that the initial PMC will be formed from experienced
> Hive>
> > > PMC members we do not believe incubation will be necessary.  (Note
> that the>
> > > Apache board will need to approve this.)>
> > > >
> > > > Obviously there are many details involved in a proposal like this.
> Rather>
> > > than make this a ten page email we have filled out many of the details
> in a>
> > > wiki page:>
> > > https://cwiki.apache.org/confluence/display/Hive/
> Metastore+TLP+Proposal>
> > > >
> > > Yongzhi Chen>
> > > Vihang Karajgaonkar>
> > > Sergio Pena>
> > > Sahil Takiar>
> > > Aihua Xu>
> > > Gunther Hagleitner>
> > > Thejas Nair>
> > > Alan Gates>
> > > >
> >
> > +1 (from Apache Impala's (incubating) perspective)>
> >
> > Dimitris>
> >



"Hive’s metastore has established a position as the place to go for
metadata in the Hadoop ecosystem. Not all metadata is relational, or
processed by Hive, so there are other parties using the metastore who
justifiably would like to influence its direction. Opening up the metastore
will help retain and extend this position."

The metastore is open and parties can influence its direction. Meritocracy
is earned.

For example: I have seen several parties state they wish Hive metastore was
packaged such that it was easier to embed/include. However, no one has
opened a ticket and completed/started/seriously scoped out that work. I do
not see how moving to a TLP and giving the code a new name will drive people
to take that next step.

I do not know how this works for TLP proposals, but I also do not think the
TLP process will "open" anything new up 

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-02 Thread Edward Capriolo
I am not sure I am on the fence with this.

On Sat, Jul 1, 2017 at 6:38 AM, Chaoyu Tang  wrote:

> +1. A nice move which might help HMS to be more easily adopted by more
> other components.
>
> On Sat, Jul 1, 2017 at 12:41 AM, Sushanth Sowmyan 
> wrote:
>
> > +1
> >
> > On Jun 30, 2017 17:05, "Owen O'Malley"  wrote:
> >
> > > On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun  wrote:
> > >
> > > > and maybe a different project name?
> > > >
> > >
> > > Yes, it certainly needs a new name. I'd like to suggest Riven.
> > >
> > > .. Owen
> > >
> >
>

I am -1, and I offer this -1 with the hope of being convinced otherwise

"By making it a separate project we will enable other projects to join us in
innovating on the metastore. "

The relevant questions I have are:

"What is stopping others from joining us now?"
"What does being a TLP do for us that we do not have now?"

I see a lot of downsides:
1) We have to maintain two sites
2) We have to maintain two committer lists

A large problem I see is this: Hive is already being pulled in too many
different directions. There is some grumbling about the state of
hive-on-spark.

Most importantly, our release process seems 'injured' by too many branches
going off in different ways. If the metastore lives outside of Hive we are
going to compound this issue. I would strongly suggest we do not undertake
this until we can at least turn out 2 usable releases in a 6 month period.


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-01 Thread Chaoyu Tang
+1. A nice move which might help HMS to be more easily adopted by more
other components.

On Sat, Jul 1, 2017 at 12:41 AM, Sushanth Sowmyan 
wrote:

> +1
>
> On Jun 30, 2017 17:05, "Owen O'Malley"  wrote:
>
> > On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun  wrote:
> >
> > > and maybe a different project name?
> > >
> >
> > Yes, it certainly needs a new name. I'd like to suggest Riven.
> >
> > .. Owen
> >
>


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Sushanth Sowmyan
+1

On Jun 30, 2017 17:05, "Owen O'Malley"  wrote:

> On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun  wrote:
>
> > and maybe a different project name?
> >
>
> Yes, it certainly needs a new name. I'd like to suggest Riven.
>
> .. Owen
>


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Owen O'Malley
On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun  wrote:

> and maybe a different project name?
>

Yes, it certainly needs a new name. I'd like to suggest Riven.

.. Owen


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Jimmy Xiang
Yeah, this is a good idea. +1

On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun  wrote:
> HMS has become the shared catalog service for multiple projects outside
> Hive,
> so +1 on this move (and maybe a different project name?).
>
> On Fri, Jun 30, 2017 at 2:10 PM, Owen O'Malley 
> wrote:
>
>> I'm +1 on separating out the metastore. It recognizes the reality that a
>> lot of different projects use the Hive Metastore and opening up the
>> community is a great move.
>>
>> ..Owen
>>
>> On Fri, Jun 30, 2017 at 1:30 PM, Xuefu Zhang  wrote:
>>
>> > +1, sounds like a good idea!
>> >
>> > On Fri, Jun 30, 2017 at 1:24 PM, Harsha  wrote:
>> >
>> > > Thanks for the proposal Alan. I am +1 on separating the Hive Metastore.
>> > > This is a great opportunity for building a Metastore to not only
>> address
>> > > schemas for the data at rest but also for the data in motion. We have a
>> > > SchemaRegistry (http://github.com/hortonworks/registry)  project that
>> > > allows users to register schemas for data in motion and integrates with
>> > > Kafka, Kinesis, Event Hubs and other messaging queues. This will provide
>> > > us with opportunity to integrate our apis with Hive Metastore and
>> > > provide with one project that is truly a single metastore that can hold
>> > > all schemas.
>> > >
>> > > Thanks,
>> > > Harsha
>> > >
>> > > On Fri, Jun 30, 2017, at 01:18 PM, Sergio Pena wrote:
>> > > > Great, thanks Alan for putting all this in the email.
>> > > > +1
>> > > >
>> > > > Allowing other components to continue to use the Metastore without
>> the
>> > > > need
>> > > > to use Hive dependencies is a big plus for them. I agree with
>> > everything
>> > > > you mention on the email.
>> > > >
>> > > > - Sergio
>> > > >
>> > > > On Fri, Jun 30, 2017 at 1:49 PM, Julian Hyde 
>> wrote:
>> > > >
>> > > > > +1
>> > > > >
>> > > > > As a Calcite PMC member, I am very pleased to see this change.
>> > Calcite
>> > > > > reads metadata from a variety of sources (including JDBC databases,
>> > > NoSQL
>> > > > > databases such as Cassandra and Druid, and streaming systems), and
>> if
>> > > more
>> > > > > of those sources choose to store their metadata in the metastore it
>> > > will
>> > > > > make our lives easier.
>> > > > >
>> > > > > Hive’s metastore has established a position as the place to go for
>> > > > > metadata in the Hadoop ecosystem. Not all metadata is relational,
>> or
>> > > > > processed by Hive, so there are other parties using the metastore
>> who
>> > > > > justifiably would like to influence its direction. Opening up the
>> > > metastore
>> > > > > will help retain and extend this position.
>> > > > >
>> > > > > Julian
>> > > > >
>> > > > >
>> > > > > On 2017-06-30 10:00 (-0700), "Dimitris ts...@apache.org> wrote:
>> > > > > >
>> > > > > >
>> > > > > > On 2017-06-30 07:56 (-0700), Alan Gates 
>> wrote: >
>> > > > > > > A few of us have been talking and come to the conclusion that
>> it
>> > > > > would be>
>> > > > > > > a good thing to split out the Hive metastore into its own
>> Apache
>> > > > > project.>
>> > > > > > > Below and in the linked wiki page we explain what we see as the
>> > > > > advantages>
>> > > > > > > to this and how we would go about it.>
>> > > > > > > >
>> > > > > > > Hive’s metastore has long been used by other projects in the
>> > > Hadoop>
>> > > > > > > ecosystem to store and access metadata.  Apache Impala, Apache
>> > > Spark,>
>> > > > > > > Apache Drill, Presto, and other systems all use Hive’s
>> metastore.
>> > > > > Some,>
>> > > > > > > like Impala and Presto can use it as their own metadata system
>> > with
>> > > > > the>
>> > > > > > > rest of Hive not present.>
>> > > > > > > >
>> > > > > > > This sharing is excellent for the ecosystem.  Together with
>> HDFS
>> > it
>> > > > > allows>
>> > > > > > > users to use the tool of their choice while still accessing the
>> > > same
>> > > > > shared>
>> > > > > > > data.  But having this shared metadata inside the Hive project
>> > > limits
>> > > > > the>
>> > > > > > > ability of other projects to contribute to the metastore.  It
>> > also
>> > > > > makes it>
>> > > > > > > harder for new systems that have similar but not identical
>> > > metadata>
>> > > > > > > requirements (for example, stream processing systems on top of
>> > > Apache>
>> > > > > > > Kafka) to use Hive’s metastore.  This difficulty for other
>> > systems
>> > > > > comes>
>> > > > > > > out in two ways.  One, it is hard for non-Hive community
>> members
>> > > to>
>> > > > > > > participate in the project.  Second, it adds operational cost
>> > since
>> > > > > users>
>> > > > > > > are forced to deploy all of the Hive jars just to get the
>> > > metastore to
>> > > > > work.>
>> > > > > > > >
>> > > > > > > Therefore we propose to split Hive’s metastore out into a
>> > separate
>> > > > > Apache>
>> > > > > > > project.  This new project will 

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Chao Sun
HMS has become the shared catalog service for multiple projects outside
Hive,
so +1 on this move (and maybe a different project name?).

On Fri, Jun 30, 2017 at 2:10 PM, Owen O'Malley 
wrote:

> I'm +1 on separating out the metastore. It recognizes the reality that a
> lot of different projects use the Hive Metastore and opening up the
> community is a great move.
>
> ..Owen
>
> On Fri, Jun 30, 2017 at 1:30 PM, Xuefu Zhang  wrote:
>
> > +1, sounds like a good idea!
> >
> > On Fri, Jun 30, 2017 at 1:24 PM, Harsha  wrote:
> >
> > > Thanks for the proposal Alan. I am +1 on separating the Hive Metastore.
> > > This is a great opportunity for building a Metastore to not only
> address
> > > schemas for the data at rest but also for the data in motion. We have a
> > > SchemaRegistry (http://github.com/hortonworks/registry)  project that
> > > allows users to register schemas for data in motion and integrates with
> > > > Kafka, Kinesis, Event Hubs and other messaging queues. This will provide
> > > us with opportunity to integrate our apis with Hive Metastore and
> > > provide with one project that is truly a single metastore that can hold
> > > all schemas.
> > >
> > > Thanks,
> > > Harsha
> > >
> > > On Fri, Jun 30, 2017, at 01:18 PM, Sergio Pena wrote:
> > > > Great, thanks Alan for putting all this in the email.
> > > > +1
> > > >
> > > > Allowing other components to continue to use the Metastore without
> the
> > > > need
> > > > to use Hive dependencies is a big plus for them. I agree with
> > everything
> > > > you mention on the email.
> > > >
> > > > - Sergio
> > > >
> > > > On Fri, Jun 30, 2017 at 1:49 PM, Julian Hyde 
> wrote:
> > > >
> > > > > +1
> > > > >
> > > > > As a Calcite PMC member, I am very pleased to see this change.
> > Calcite
> > > > > reads metadata from a variety of sources (including JDBC databases,
> > > NoSQL
> > > > > databases such as Cassandra and Druid, and streaming systems), and
> if
> > > more
> > > > > of those sources choose to store their metadata in the metastore it
> > > will
> > > > > make our lives easier.
> > > > >
> > > > > Hive’s metastore has established a position as the place to go for
> > > > > metadata in the Hadoop ecosystem. Not all metadata is relational,
> or
> > > > > processed by Hive, so there are other parties using the metastore
> who
> > > > > justifiably would like to influence its direction. Opening up the
> > > metastore
> > > > > will help retain and extend this position.
> > > > >
> > > > > Julian
> > > > >
> > > > >
> > > > > On 2017-06-30 10:00 (-0700), "Dimitris ts...@apache.org> wrote:
> > > > > >
> > > > > >
> > > > > > On 2017-06-30 07:56 (-0700), Alan Gates 
> wrote: >
> > > > > > > > A few of us have been talking and come to the conclusion that
> it
> > > > > would be>
> > > > > > > a good thing to split out the Hive metastore into its own
> Apache
> > > > > project.>
> > > > > > > Below and in the linked wiki page we explain what we see as the
> > > > > advantages>
> > > > > > > to this and how we would go about it.>
> > > > > > > >
> > > > > > > Hive’s metastore has long been used by other projects in the
> > > Hadoop>
> > > > > > > ecosystem to store and access metadata.  Apache Impala, Apache
> > > Spark,>
> > > > > > > Apache Drill, Presto, and other systems all use Hive’s
> metastore.
> > > > > Some,>
> > > > > > > like Impala and Presto can use it as their own metadata system
> > with
> > > > > the>
> > > > > > > rest of Hive not present.>
> > > > > > > >
> > > > > > > This sharing is excellent for the ecosystem.  Together with
> HDFS
> > it
> > > > > allows>
> > > > > > > users to use the tool of their choice while still accessing the
> > > same
> > > > > shared>
> > > > > > > data.  But having this shared metadata inside the Hive project
> > > limits
> > > > > the>
> > > > > > > ability of other projects to contribute to the metastore.  It
> > also
> > > > > makes it>
> > > > > > > harder for new systems that have similar but not identical
> > > metadata>
> > > > > > > requirements (for example, stream processing systems on top of
> > > Apache>
> > > > > > > Kafka) to use Hive’s metastore.  This difficulty for other
> > systems
> > > > > comes>
> > > > > > > out in two ways.  One, it is hard for non-Hive community
> members
> > > to>
> > > > > > > participate in the project.  Second, it adds operational cost
> > since
> > > > > users>
> > > > > > > are forced to deploy all of the Hive jars just to get the
> > > metastore to
> > > > > work.>
> > > > > > > >
> > > > > > > Therefore we propose to split Hive’s metastore out into a
> > separate
> > > > > Apache>
> > > > > > > project.  This new project will continue to support the same
> > Thrift
> > > > > API as>
> > > > > > > the current metastore.  It will continue to focus on being a
> > high>
> > > > > > > performance, fault tolerant, large scale, operational metastore
> > for
> > > > > SQL>
> 

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Owen O'Malley
I'm +1 on separating out the metastore. It recognizes the reality that a
lot of different projects use the Hive Metastore and opening up the
community is a great move.

..Owen

On Fri, Jun 30, 2017 at 1:30 PM, Xuefu Zhang  wrote:

> +1, sounds like a good idea!
>
> On Fri, Jun 30, 2017 at 1:24 PM, Harsha  wrote:
>
> > Thanks for the proposal Alan. I am +1 on separating the Hive Metastore.
> > This is a great opportunity for building a Metastore to not only address
> > schemas for the data at rest but also for the data in motion. We have a
> > SchemaRegistry (http://github.com/hortonworks/registry)  project that
> > allows users to register schemas for data in motion and integrates with
> > > Kafka, Kinesis, Event Hubs and other messaging queues. This will provide
> > us with opportunity to integrate our apis with Hive Metastore and
> > provide with one project that is truly a single metastore that can hold
> > all schemas.
> >
> > Thanks,
> > Harsha
> >
> > On Fri, Jun 30, 2017, at 01:18 PM, Sergio Pena wrote:
> > > Great, thanks Alan for putting all this in the email.
> > > +1
> > >
> > > Allowing other components to continue to use the Metastore without the
> > > need
> > > to use Hive dependencies is a big plus for them. I agree with
> everything
> > > you mention on the email.
> > >
> > > - Sergio
> > >
> > > On Fri, Jun 30, 2017 at 1:49 PM, Julian Hyde  wrote:
> > >
> > > > +1
> > > >
> > > > As a Calcite PMC member, I am very pleased to see this change.
> Calcite
> > > > reads metadata from a variety of sources (including JDBC databases,
> > NoSQL
> > > > databases such as Cassandra and Druid, and streaming systems), and if
> > more
> > > > of those sources choose to store their metadata in the metastore it
> > will
> > > > make our lives easier.
> > > >
> > > > Hive’s metastore has established a position as the place to go for
> > > > metadata in the Hadoop ecosystem. Not all metadata is relational, or
> > > > processed by Hive, so there are other parties using the metastore who
> > > > justifiably would like to influence its direction. Opening up the
> > metastore
> > > > will help retain and extend this position.
> > > >
> > > > Julian
> > > >
> > > >
> > > > On 2017-06-30 10:00 (-0700), "Dimitris ts...@apache.org> wrote:
> > > > >
> > > > >
> > > > > On 2017-06-30 07:56 (-0700), Alan Gates  wrote: >
> > > > > > > A few of us have been talking and come to the conclusion that it
> > > > would be>
> > > > > > a good thing to split out the Hive metastore into its own Apache
> > > > project.>
> > > > > > Below and in the linked wiki page we explain what we see as the
> > > > advantages>
> > > > > > to this and how we would go about it.>
> > > > > > >
> > > > > > Hive’s metastore has long been used by other projects in the
> > Hadoop>
> > > > > > ecosystem to store and access metadata.  Apache Impala, Apache
> > Spark,>
> > > > > > Apache Drill, Presto, and other systems all use Hive’s metastore.
> > > > Some,>
> > > > > > like Impala and Presto can use it as their own metadata system
> with
> > > > the>
> > > > > > rest of Hive not present.>
> > > > > > >
> > > > > > This sharing is excellent for the ecosystem.  Together with HDFS
> it
> > > > allows>
> > > > > > users to use the tool of their choice while still accessing the
> > same
> > > > shared>
> > > > > > data.  But having this shared metadata inside the Hive project
> > limits
> > > > the>
> > > > > > ability of other projects to contribute to the metastore.  It
> also
> > > > makes it>
> > > > > > harder for new systems that have similar but not identical
> > metadata>
> > > > > > requirements (for example, stream processing systems on top of
> > Apache>
> > > > > > Kafka) to use Hive’s metastore.  This difficulty for other
> systems
> > > > comes>
> > > > > > out in two ways.  One, it is hard for non-Hive community members
> > to>
> > > > > > participate in the project.  Second, it adds operational cost
> since
> > > > users>
> > > > > > are forced to deploy all of the Hive jars just to get the
> > metastore to
> > > > work.>
> > > > > > >
> > > > > > Therefore we propose to split Hive’s metastore out into a
> separate
> > > > Apache>
> > > > > > project.  This new project will continue to support the same
> Thrift
> > > > API as>
> > > > > > the current metastore.  It will continue to focus on being a
> high>
> > > > > > performance, fault tolerant, large scale, operational metastore
> for
> > > > SQL>
> > > > > > engines and other systems that want to store schema information
> > about
> > > > their>
> > > > > > data.>
> > > > > > >
> > > > > > By making it a separate project we will enable other projects to
> > join
> > > > us in>
> > > > > > innovating on the metastore.  It will simplify operations for
> > non-Hive>
> > > > > > users that want to use the metastore as they will no longer need
> to
> > > > install>
> > > > > > Hive just to get the metastore.  And it will 

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Xuefu Zhang
+1, sounds like a good idea!

On Fri, Jun 30, 2017 at 1:24 PM, Harsha  wrote:

> Thanks for the proposal Alan. I am +1 on separating the Hive Metastore.
> This is a great opportunity for building a Metastore to not only address
> schemas for the data at rest but also for the data in motion. We have a
> SchemaRegistry (http://github.com/hortonworks/registry)  project that
> allows users to register schemas for data in motion and integrates with
> > Kafka, Kinesis, Event Hubs and other messaging queues. This will provide
> us with opportunity to integrate our apis with Hive Metastore and
> provide with one project that is truly a single metastore that can hold
> all schemas.
>
> Thanks,
> Harsha
>
> On Fri, Jun 30, 2017, at 01:18 PM, Sergio Pena wrote:
> > Great, thanks Alan for putting all this in the email.
> > +1
> >
> > Allowing other components to continue to use the Metastore without the
> > need
> > to use Hive dependencies is a big plus for them. I agree with everything
> > you mention on the email.
> >
> > - Sergio
> >
> > On Fri, Jun 30, 2017 at 1:49 PM, Julian Hyde  wrote:
> >
> > > +1
> > >
> > > As a Calcite PMC member, I am very pleased to see this change. Calcite
> > > reads metadata from a variety of sources (including JDBC databases,
> NoSQL
> > > databases such as Cassandra and Druid, and streaming systems), and if
> more
> > > of those sources choose to store their metadata in the metastore it
> will
> > > make our lives easier.
> > >
> > > Hive’s metastore has established a position as the place to go for
> > > metadata in the Hadoop ecosystem. Not all metadata is relational, or
> > > processed by Hive, so there are other parties using the metastore who
> > > justifiably would like to influence its direction. Opening up the
> metastore
> > > will help retain and extend this position.
> > >
> > > Julian
> > >
> > >
> > > On 2017-06-30 10:00 (-0700), "Dimitris ts...@apache.org> wrote:
> > > >
> > > >
> > > > On 2017-06-30 07:56 (-0700), Alan Gates  wrote: >
> > > > > > A few of us have been talking and come to the conclusion that it
> > > would be>
> > > > > a good thing to split out the Hive metastore into its own Apache
> > > project.>
> > > > > Below and in the linked wiki page we explain what we see as the
> > > advantages>
> > > > > to this and how we would go about it.>
> > > > > >
> > > > > Hive’s metastore has long been used by other projects in the
> Hadoop>
> > > > > ecosystem to store and access metadata.  Apache Impala, Apache
> Spark,>
> > > > > Apache Drill, Presto, and other systems all use Hive’s metastore.
> > > Some,>
> > > > > like Impala and Presto can use it as their own metadata system with
> > > the>
> > > > > rest of Hive not present.>
> > > > > >
> > > > > This sharing is excellent for the ecosystem.  Together with HDFS it
> > > allows>
> > > > > users to use the tool of their choice while still accessing the
> same
> > > shared>
> > > > > data.  But having this shared metadata inside the Hive project
> limits
> > > the>
> > > > > ability of other projects to contribute to the metastore.  It also
> > > makes it>
> > > > > harder for new systems that have similar but not identical
> metadata>
> > > > > requirements (for example, stream processing systems on top of
> Apache>
> > > > > Kafka) to use Hive’s metastore.  This difficulty for other systems
> > > comes>
> > > > > out in two ways.  One, it is hard for non-Hive community members
> to>
> > > > > participate in the project.  Second, it adds operational cost since
> > > users>
> > > > > are forced to deploy all of the Hive jars just to get the
> metastore to
> > > work.>
> > > > > >
> > > > > Therefore we propose to split Hive’s metastore out into a separate
> > > Apache>
> > > > > project.  This new project will continue to support the same Thrift
> > > API as>
> > > > > the current metastore.  It will continue to focus on being a high>
> > > > > performance, fault tolerant, large scale, operational metastore for
> > > SQL>
> > > > > engines and other systems that want to store schema information
> about
> > > their>
> > > > > data.>
> > > > > >
> > > > > By making it a separate project we will enable other projects to
> join
> > > us in>
> > > > > innovating on the metastore.  It will simplify operations for
> non-Hive>
> > > > > users that want to use the metastore as they will no longer need to
> > > install>
> > > > > Hive just to get the metastore.  And it will attract new projects
> that>
> > > > > might otherwise feel the need to solve their metadata problems on
> > > their own.>
> > > > > >
> > > > > Any Hive PMC member or committer will be welcome to join the new
> > > project at>
> > > > > the same level.  We propose this project go straight to a top
> level>
> > > > > project.  Given that the initial PMC will be formed from
> experienced
> > > Hive>
> > > > > PMC members we do not believe incubation will be necessary.  (Note
> > > that the>
> > > > > 

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Harsha
Thanks for the proposal Alan. I am +1 on separating the Hive Metastore.
This is a great opportunity for building a Metastore to not only address
schemas for the data at rest but also for the data in motion. We have a
SchemaRegistry (http://github.com/hortonworks/registry) project that
allows users to register schemas for data in motion and integrates with
Kafka, Kinesis, Event Hubs and other messaging queues. This will provide
us with an opportunity to integrate our APIs with Hive Metastore and
provide one project that is truly a single metastore that can hold
all schemas.

Thanks,
Harsha

On Fri, Jun 30, 2017, at 01:18 PM, Sergio Pena wrote:
> Great, thanks Alan for putting all this in the email.
> +1
> 
> Allowing other components to continue to use the Metastore without the
> need
> to use Hive dependencies is a big plus for them. I agree with everything
> you mention on the email.
> 
> - Sergio
> 
> On Fri, Jun 30, 2017 at 1:49 PM, Julian Hyde  wrote:
> 
> > +1
> >
> > As a Calcite PMC member, I am very pleased to see this change. Calcite
> > reads metadata from a variety of sources (including JDBC databases, NoSQL
> > databases such as Cassandra and Druid, and streaming systems), and if more
> > of those sources choose to store their metadata in the metastore it will
> > make our lives easier.
> >
> > Hive’s metastore has established a position as the place to go for
> > metadata in the Hadoop ecosystem. Not all metadata is relational, or
> > processed by Hive, so there are other parties using the metastore who
> > justifiably would like to influence its direction. Opening up the metastore
> > will help retain and extend this position.
> >
> > Julian
> >
> >
> > On 2017-06-30 10:00 (-0700), "Dimitris ts...@apache.org> wrote:
> > >
> > >
> > > On 2017-06-30 07:56 (-0700), Alan Gates  wrote: >
> > > > > A few of us have been talking and come to the conclusion that it
> > would be>
> > > > a good thing to split out the Hive metastore into its own Apache
> > project.>
> > > > Below and in the linked wiki page we explain what we see as the advantages
> > > > to this and how we would go about it.
> > > >
> > > > Hive’s metastore has long been used by other projects in the Hadoop
> > > > ecosystem to store and access metadata.  Apache Impala, Apache Spark,
> > > > Apache Drill, Presto, and other systems all use Hive’s metastore.  Some,
> > > > like Impala and Presto, can use it as their own metadata system with the
> > > > rest of Hive not present.
> > > >
> > > > This sharing is excellent for the ecosystem.  Together with HDFS it allows
> > > > users to use the tool of their choice while still accessing the same shared
> > > > data.  But having this shared metadata inside the Hive project limits the
> > > > ability of other projects to contribute to the metastore.  It also makes it
> > > > harder for new systems that have similar but not identical metadata
> > > > requirements (for example, stream processing systems on top of Apache
> > > > Kafka) to use Hive’s metastore.  This difficulty for other systems comes
> > > > out in two ways.  One, it is hard for non-Hive community members to
> > > > participate in the project.  Second, it adds operational cost since users
> > > > are forced to deploy all of the Hive jars just to get the metastore to work.
> > > >
> > > > Therefore we propose to split Hive’s metastore out into a separate Apache
> > > > project.  This new project will continue to support the same Thrift API as
> > > > the current metastore.  It will continue to focus on being a high
> > > > performance, fault tolerant, large scale, operational metastore for SQL
> > > > engines and other systems that want to store schema information about their
> > > > data.
> > > >
> > > > By making it a separate project we will enable other projects to join us in
> > > > innovating on the metastore.  It will simplify operations for non-Hive
> > > > users that want to use the metastore as they will no longer need to install
> > > > Hive just to get the metastore.  And it will attract new projects that
> > > > might otherwise feel the need to solve their metadata problems on their own.
> > > >
> > > > Any Hive PMC member or committer will be welcome to join the new project at
> > > > the same level.  We propose this project go straight to a top level
> > > > project.  Given that the initial PMC will be formed from experienced Hive
> > > > PMC members we do not believe incubation will be necessary.  (Note that the
> > > > Apache board will need to approve this.)
> > > >
> > > > Obviously there are many details involved in a proposal like this.  Rather
> > > > than make this a ten page email we have filled out many of the details in a
> > > > wiki page:
> > > > https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal
> > > >
> > 

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Sergio Pena
Great, thanks Alan for putting all this in the email.
+1

Allowing other components to continue to use the Metastore without the need
to use Hive dependencies is a big plus for them. I agree with everything
you mention in the email.

- Sergio

On Fri, Jun 30, 2017 at 1:49 PM, Julian Hyde  wrote:

> +1
>
> As a Calcite PMC member, I am very pleased to see this change. Calcite
> reads metadata from a variety of sources (including JDBC databases, NoSQL
> databases such as Cassandra and Druid, and streaming systems), and if more
> of those sources choose to store their metadata in the metastore it will
> make our lives easier.
>
> Hive’s metastore has established a position as the place to go for
> metadata in the Hadoop ecosystem. Not all metadata is relational, or
> processed by Hive, so there are other parties using the metastore who
> justifiably would like to influence its direction. Opening up the metastore
> will help retain and extend this position.
>
> Julian
>
>
> On 2017-06-30 10:00 (-0700), "Dimitris ts...@apache.org> wrote:
> >
> >
> > On 2017-06-30 07:56 (-0700), Alan Gates wrote:
> > > A few of us have been talking and come to the conclusion that it would be
> > > a good thing to split out the Hive metastore into its own Apache project.
> > > Below and in the linked wiki page we explain what we see as the advantages
> > > to this and how we would go about it.
> > >
> > > Hive’s metastore has long been used by other projects in the Hadoop
> > > ecosystem to store and access metadata.  Apache Impala, Apache Spark,
> > > Apache Drill, Presto, and other systems all use Hive’s metastore.  Some,
> > > like Impala and Presto, can use it as their own metadata system with the
> > > rest of Hive not present.
> > >
> > > This sharing is excellent for the ecosystem.  Together with HDFS it allows
> > > users to use the tool of their choice while still accessing the same shared
> > > data.  But having this shared metadata inside the Hive project limits the
> > > ability of other projects to contribute to the metastore.  It also makes it
> > > harder for new systems that have similar but not identical metadata
> > > requirements (for example, stream processing systems on top of Apache
> > > Kafka) to use Hive’s metastore.  This difficulty for other systems comes
> > > out in two ways.  One, it is hard for non-Hive community members to
> > > participate in the project.  Second, it adds operational cost since users
> > > are forced to deploy all of the Hive jars just to get the metastore to work.
> > >
> > > Therefore we propose to split Hive’s metastore out into a separate Apache
> > > project.  This new project will continue to support the same Thrift API as
> > > the current metastore.  It will continue to focus on being a high
> > > performance, fault tolerant, large scale, operational metastore for SQL
> > > engines and other systems that want to store schema information about their
> > > data.
> > >
> > > By making it a separate project we will enable other projects to join us in
> > > innovating on the metastore.  It will simplify operations for non-Hive
> > > users that want to use the metastore as they will no longer need to install
> > > Hive just to get the metastore.  And it will attract new projects that
> > > might otherwise feel the need to solve their metadata problems on their own.
> > >
> > > Any Hive PMC member or committer will be welcome to join the new project at
> > > the same level.  We propose this project go straight to a top level
> > > project.  Given that the initial PMC will be formed from experienced Hive
> > > PMC members we do not believe incubation will be necessary.  (Note that the
> > > Apache board will need to approve this.)
> > >
> > > Obviously there are many details involved in a proposal like this.  Rather
> > > than make this a ten page email we have filled out many of the details in a
> > > wiki page:
> > > https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal
> > >
> > > Yongzhi Chen
> > > Vihang Karajgaonkar
> > > Sergio Pena
> > > Sahil Takiar
> > > Aihua Xu
> > > Gunther Hagleitner
> > > Thejas Nair
> > > Alan Gates
> > >
> >
> > +1 (from Apache Impala's (incubating) perspective)
> >
> > Dimitris
> >
>


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Julian Hyde
+1

As a Calcite PMC member, I am very pleased to see this change. Calcite reads 
metadata from a variety of sources (including JDBC databases, NoSQL databases 
such as Cassandra and Druid, and streaming systems), and if more of those 
sources choose to store their metadata in the metastore it will make our lives 
easier.

Hive’s metastore has established a position as the place to go for metadata in 
the Hadoop ecosystem. Not all metadata is relational, or processed by Hive, so 
there are other parties using the metastore who justifiably would like to 
influence its direction. Opening up the metastore will help retain and extend 
this position.

Julian


On 2017-06-30 10:00 (-0700), "Dimitris ts...@apache.org> wrote: 
> 
> 
> On 2017-06-30 07:56 (-0700), Alan Gates wrote:
> > A few of us have been talking and come to the conclusion that it would be
> > a good thing to split out the Hive metastore into its own Apache project.
> > Below and in the linked wiki page we explain what we see as the advantages
> > to this and how we would go about it.
> >
> > Hive’s metastore has long been used by other projects in the Hadoop
> > ecosystem to store and access metadata.  Apache Impala, Apache Spark,
> > Apache Drill, Presto, and other systems all use Hive’s metastore.  Some,
> > like Impala and Presto, can use it as their own metadata system with the
> > rest of Hive not present.
> >
> > This sharing is excellent for the ecosystem.  Together with HDFS it allows
> > users to use the tool of their choice while still accessing the same shared
> > data.  But having this shared metadata inside the Hive project limits the
> > ability of other projects to contribute to the metastore.  It also makes it
> > harder for new systems that have similar but not identical metadata
> > requirements (for example, stream processing systems on top of Apache
> > Kafka) to use Hive’s metastore.  This difficulty for other systems comes
> > out in two ways.  One, it is hard for non-Hive community members to
> > participate in the project.  Second, it adds operational cost since users
> > are forced to deploy all of the Hive jars just to get the metastore to work.
> >
> > Therefore we propose to split Hive’s metastore out into a separate Apache
> > project.  This new project will continue to support the same Thrift API as
> > the current metastore.  It will continue to focus on being a high
> > performance, fault tolerant, large scale, operational metastore for SQL
> > engines and other systems that want to store schema information about their
> > data.
> >
> > By making it a separate project we will enable other projects to join us in
> > innovating on the metastore.  It will simplify operations for non-Hive
> > users that want to use the metastore as they will no longer need to install
> > Hive just to get the metastore.  And it will attract new projects that
> > might otherwise feel the need to solve their metadata problems on their own.
> >
> > Any Hive PMC member or committer will be welcome to join the new project at
> > the same level.  We propose this project go straight to a top level
> > project.  Given that the initial PMC will be formed from experienced Hive
> > PMC members we do not believe incubation will be necessary.  (Note that the
> > Apache board will need to approve this.)
> >
> > Obviously there are many details involved in a proposal like this.  Rather
> > than make this a ten page email we have filled out many of the details in a
> > wiki page:
> > https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal
> >
> > Yongzhi Chen
> > Vihang Karajgaonkar
> > Sergio Pena
> > Sahil Takiar
> > Aihua Xu
> > Gunther Hagleitner
> > Thejas Nair
> > Alan Gates
> >
>
> +1 (from Apache Impala's (incubating) perspective)
>
> Dimitris
> 

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Dimitris Tsirogiannis


On 2017-06-30 07:56 (-0700), Alan Gates  wrote: 
> A few of us have been talking and come to the conclusion that it would be
> a good thing to split out the Hive metastore into its own Apache project.
> Below and in the linked wiki page we explain what we see as the advantages
> to this and how we would go about it.
> 
> Hive’s metastore has long been used by other projects in the Hadoop
> ecosystem to store and access metadata.  Apache Impala, Apache Spark,
> Apache Drill, Presto, and other systems all use Hive’s metastore.  Some,
> like Impala and Presto, can use it as their own metadata system with the
> rest of Hive not present.
> 
> This sharing is excellent for the ecosystem.  Together with HDFS it allows
> users to use the tool of their choice while still accessing the same shared
> data.  But having this shared metadata inside the Hive project limits the
> ability of other projects to contribute to the metastore.  It also makes it
> harder for new systems that have similar but not identical metadata
> requirements (for example, stream processing systems on top of Apache
> Kafka) to use Hive’s metastore.  This difficulty for other systems comes
> out in two ways.  One, it is hard for non-Hive community members to
> participate in the project.  Second, it adds operational cost since users
> are forced to deploy all of the Hive jars just to get the metastore to work.
> 
> Therefore we propose to split Hive’s metastore out into a separate Apache
> project.  This new project will continue to support the same Thrift API as
> the current metastore.  It will continue to focus on being a high
> performance, fault tolerant, large scale, operational metastore for SQL
> engines and other systems that want to store schema information about their
> data.
> 
> By making it a separate project we will enable other projects to join us in
> innovating on the metastore.  It will simplify operations for non-Hive
> users that want to use the metastore as they will no longer need to install
> Hive just to get the metastore.  And it will attract new projects that
> might otherwise feel the need to solve their metadata problems on their own.
> 
> Any Hive PMC member or committer will be welcome to join the new project at
> the same level.  We propose this project go straight to a top level
> project.  Given that the initial PMC will be formed from experienced Hive
> PMC members we do not believe incubation will be necessary.  (Note that the
> Apache board will need to approve this.)
> 
> Obviously there are many details involved in a proposal like this.  Rather
> than make this a ten page email we have filled out many of the details in a
> wiki page:
> https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal
> 
> Yongzhi Chen
> Vihang Karajgaonkar
> Sergio Pena
> Sahil Takiar
> Aihua Xu
> Gunther Hagleitner
> Thejas Nair
> Alan Gates
> 

+1 (from Apache Impala's (incubating) perspective)

Dimitris


[DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Alan Gates
A few of us have been talking and come to the conclusion that it would be
a good thing to split out the Hive metastore into its own Apache project.
Below and in the linked wiki page we explain what we see as the advantages
to this and how we would go about it.

Hive’s metastore has long been used by other projects in the Hadoop
ecosystem to store and access metadata.  Apache Impala, Apache Spark,
Apache Drill, Presto, and other systems all use Hive’s metastore.  Some,
like Impala and Presto, can use it as their own metadata system with the
rest of Hive not present.

This sharing is excellent for the ecosystem.  Together with HDFS it allows
users to use the tool of their choice while still accessing the same shared
data.  But having this shared metadata inside the Hive project limits the
ability of other projects to contribute to the metastore.  It also makes it
harder for new systems that have similar but not identical metadata
requirements (for example, stream processing systems on top of Apache
Kafka) to use Hive’s metastore.  This difficulty for other systems comes
out in two ways.  One, it is hard for non-Hive community members to
participate in the project.  Second, it adds operational cost since users
are forced to deploy all of the Hive jars just to get the metastore to work.

Therefore we propose to split Hive’s metastore out into a separate Apache
project.  This new project will continue to support the same Thrift API as
the current metastore.  It will continue to focus on being a high
performance, fault tolerant, large scale, operational metastore for SQL
engines and other systems that want to store schema information about their
data.

By making it a separate project we will enable other projects to join us in
innovating on the metastore.  It will simplify operations for non-Hive
users that want to use the metastore as they will no longer need to install
Hive just to get the metastore.  And it will attract new projects that
might otherwise feel the need to solve their metadata problems on their own.

Any Hive PMC member or committer will be welcome to join the new project at
the same level.  We propose this project go straight to a top level
project.  Given that the initial PMC will be formed from experienced Hive
PMC members we do not believe incubation will be necessary.  (Note that the
Apache board will need to approve this.)

Obviously there are many details involved in a proposal like this.  Rather
than make this a ten page email we have filled out many of the details in a
wiki page:
https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal

Yongzhi Chen
Vihang Karajgaonkar
Sergio Pena
Sahil Takiar
Aihua Xu
Gunther Hagleitner
Thejas Nair
Alan Gates