Re: [DISCUSSION] To be (or not to be) a TLP - that is the question

2010-04-22 Thread Dhruba Borthakur
I am definitely against moving Hive out of Hadoop. There is appreciable
representation of Hive inside the Hadoop PMC and, as far as I can tell, there
is no additional burden on the Hadoop PMC in having Hive remain inside Hadoop.

I respect Jeff's and Amr's viewpoints, but I beg to differ. I really do not
see any benefit in moving Hive out of Hadoop.

thanks,
dhruba

On Thu, Apr 22, 2010 at 10:09 AM, Ashish Thusoo wrote:

> What is the advantage of becoming a TLP to the project itself? I have heard
> that it is something that Apache wants, but considering that we are very
> comfortable with how Hive interacts with the Hadoop ecosystem as a subproject
> of Hadoop, there has to be some big incentive for the project to be a TLP,
> and nowhere have I seen how this would benefit Hive. Any thoughts on that?
>
> Ashish
>
> 
> From: Jeff Hammerbacher [mailto:ham...@cloudera.com]
> Sent: Wednesday, April 21, 2010 7:35 PM
> To: hive-dev@hadoop.apache.org
> Cc: Ashish Thusoo
> Subject: Re: [DISCUSSION] To be (or not to be) a TLP - that is the question
>
> Hive already does the work to run on multiple versions of Hadoop, and the
> release cycle is independent of Hadoop's. I don't see why it should remain a
> subproject. I'm +1 on Hive becoming a TLP.
>
> On Tue, Apr 20, 2010 at 2:03 PM, Zheng Shao <zsh...@gmail.com> wrote:
> As a Hive committer, I don't feel the benefit we get from becoming a
> TLP is big enough (compared with the cost) to make Hive a TLP.
> From Chris's comment I see that the cost is not that big, but I still
> wonder what benefit we will get from that.
>
> Also I didn't get the idea of the joke ("In fact, one could argue that
> Pig opting not to be TLP yet is why Hive should go TLP"). I don't see
> any reason that applies to Pig but not Hive.
> We should continue the discussion here, but anything in Pig's
> discussion should also be considered here.
>
> Zheng
>
> On Mon, Apr 19, 2010 at 5:48 PM, Amr Awadallah <a...@cloudera.com> wrote:
> > I am personally +1 on Hive being a TLP; I think it has reached the community
> > adoption and maturity level required for that. In fact, one could argue that
> > Pig opting not to be TLP yet is why Hive should go TLP :) (jk).
> >
> > The real question to ask is whether there is a volunteer to take care of the
> > "administrative" tasks, which isn't a ton of work afaiu (I am willing to
> > volunteer if nobody else is up to the task, but I am not a committer and only
> > contributed a minor patch for bash/cygwin).
> >
> > BTW, here is a very nice summary from Yahoo's Chris Douglas on TLP
> > tradeoffs. I happen to agree with all he says, and frankly I couldn't have
> > written it better myself. I highlight certain parts from his message, but I
> > recommend you read the whole thing.
> >
> > -- Forwarded message --
> > From: Chris Douglas <cdoug...@apache.org>
> > Date: Tue, Apr 13, 2010 at 11:46 PM
> > Subject: Subprojects and TLP status
> > To: gene...@hadoop.apache.org, priv...@hadoop.apache.org
> >
> > Most of Hadoop's subprojects have discussed becoming top-level Apache
> > projects (TLPs) in the last few weeks. Most have expressed a desire to
> > remain in Hadoop. The salient parts of the discussions I've read tend
> > to focus on three aspects: a technical dependence on Hadoop,
> > additional overhead as a TLP, and visibility both within the Hadoop
> > ecosystem and in the open source community generally.
> >
> > Life as a TLP: this is not much harder than being a Hadoop subproject,
> > and the Apache preferences being tossed around- particularly
> > "insufficiently diverse"- are not blockers. Every subproject needs to
> > write a section of the report Hadoop sends to the board; almost the
> > same report, sent to a new address. The initial cost is similarly
> > light: copy bylaws, send a few notes to INFRA, and follow some
> > directions. I think the estimated costs are far higher than they will
> > be in practice. Inertia is a powerful force, but it should be
> > overcome. The directions are here, and should not be intimidating:
> >
> > http://apache.org/dev/project-creation.html
> >
> > Visibility: the Hadoop site does not need to change. For each
> > subproject, we can literally change the hyperlinks to point to the new
> > page and be done. Long-term, linking to all ASF projects that run on
> > Hadoo

RE: [DISCUSSION] To be (or not to be) a TLP - that is the question

2010-04-22 Thread Ashish Thusoo
What is the advantage of becoming a TLP to the project itself? I have heard
that it is something that Apache wants, but considering that we are very
comfortable with how Hive interacts with the Hadoop ecosystem as a subproject
of Hadoop, there has to be some big incentive for the project to be a TLP, and
nowhere have I seen how this would benefit Hive. Any thoughts on that?

Ashish


From: Jeff Hammerbacher [mailto:ham...@cloudera.com]
Sent: Wednesday, April 21, 2010 7:35 PM
To: hive-dev@hadoop.apache.org
Cc: Ashish Thusoo
Subject: Re: [DISCUSSION] To be (or not to be) a TLP - that is the question

Hive already does the work to run on multiple versions of Hadoop, and the 
release cycle is independent of Hadoop's. I don't see why it should remain a 
subproject. I'm +1 on Hive becoming a TLP.

On Tue, Apr 20, 2010 at 2:03 PM, Zheng Shao <zsh...@gmail.com> wrote:
As a Hive committer, I don't feel the benefit we get from becoming a
TLP is big enough (compared with the cost) to make Hive a TLP.
From Chris's comment I see that the cost is not that big, but I still
wonder what benefit we will get from that.

Also I didn't get the idea of the joke ("In fact, one could argue that
Pig opting not to be TLP yet is why Hive should go TLP"). I don't see
any reason that applies to Pig but not Hive.
We should continue the discussion here, but anything in Pig's
discussion should also be considered here.

Zheng

On Mon, Apr 19, 2010 at 5:48 PM, Amr Awadallah <a...@cloudera.com> wrote:
> I am personally +1 on Hive being a TLP; I think it has reached the community
> adoption and maturity level required for that. In fact, one could argue that
> Pig opting not to be TLP yet is why Hive should go TLP :) (jk).
>
> The real question to ask is whether there is a volunteer to take care of the
> "administrative" tasks, which isn't a ton of work afaiu (I am willing to
> volunteer if nobody else is up to the task, but I am not a committer and only
> contributed a minor patch for bash/cygwin).
>
> BTW, here is a very nice summary from Yahoo's Chris Douglas on TLP
> tradeoffs. I happen to agree with all he says, and frankly I couldn't have
> written it better myself. I highlight certain parts from his message, but I
> recommend you read the whole thing.
>
> -- Forwarded message --
> From: Chris Douglas <cdoug...@apache.org>
> Date: Tue, Apr 13, 2010 at 11:46 PM
> Subject: Subprojects and TLP status
> To: gene...@hadoop.apache.org, priv...@hadoop.apache.org
>
> Most of Hadoop's subprojects have discussed becoming top-level Apache
> projects (TLPs) in the last few weeks. Most have expressed a desire to
> remain in Hadoop. The salient parts of the discussions I've read tend
> to focus on three aspects: a technical dependence on Hadoop,
> additional overhead as a TLP, and visibility both within the Hadoop
> ecosystem and in the open source community generally.
>
> Life as a TLP: this is not much harder than being a Hadoop subproject,
> and the Apache preferences being tossed around- particularly
> "insufficiently diverse"- are not blockers. Every subproject needs to
> write a section of the report Hadoop sends to the board; almost the
> same report, sent to a new address. The initial cost is similarly
> light: copy bylaws, send a few notes to INFRA, and follow some
> directions. I think the estimated costs are far higher than they will
> be in practice. Inertia is a powerful force, but it should be
> overcome. The directions are here, and should not be intimidating:
>
> http://apache.org/dev/project-creation.html
>
> Visibility: the Hadoop site does not need to change. For each
> subproject, we can literally change the hyperlinks to point to the new
> page and be done. Long-term, linking to all ASF projects that run on
> Hadoop from a prominent page is something we all want. So particularly
> in the medium-term that most are considering: visibility through the
> website will not change. Each subproject will still be linked from the
> front page.
>
> Hadoop would not be nearly as popular as it is without Zookeeper,
> HBase, Hive, and Pig. All statistics on work in shared MapReduce
> clusters show that users vastly prefer running Pig and Hive queries to
> writing MapReduce jobs. HBase continues to push features in HDFS that
> increase its adoption and relevance outside MapReduce, while sharing
> some of its NoSQL limelight. Zookeeper is not only a linchpin in real
> workloads, but many proposals for future features require it. The
> bottom line is that MapReduce and HDFS need these projects for
> visibilit

Re: [DISCUSSION] To be (or not to be) a TLP - that is the question

2010-04-22 Thread Edward Capriolo
On Wed, Apr 21, 2010 at 10:35 PM, Jeff Hammerbacher wrote:

> Hive already does the work to run on multiple versions of Hadoop, and the
> release cycle is independent of Hadoop's. I don't see why it should remain a
> subproject. I'm +1 on Hive becoming a TLP.
>
> On Tue, Apr 20, 2010 at 2:03 PM, Zheng Shao  wrote:
>
> > As a Hive committer, I don't feel the benefit we get from becoming a
> > TLP is big enough (compared with the cost) to make Hive a TLP.
> > From Chris's comment I see that the cost is not that big, but I still
> > wonder what benefit we will get from that.
> >
> > Also I didn't get the idea of the joke ("In fact, one could argue that
> > Pig opting not to be TLP yet is why Hive should go TLP"). I don't see
> > any reason that applies to Pig but not Hive.
> > We should continue the discussion here, but anything in Pig's
> > discussion should also be considered here.
> >
> > Zheng
> >
> > On Mon, Apr 19, 2010 at 5:48 PM, Amr Awadallah  wrote:
> > > I am personally +1 on Hive being a TLP; I think it has reached the community
> > > adoption and maturity level required for that. In fact, one could argue that
> > > Pig opting not to be TLP yet is why Hive should go TLP :) (jk).
> > >
> > > The real question to ask is whether there is a volunteer to take care of the
> > > "administrative" tasks, which isn't a ton of work afaiu (I am willing to
> > > volunteer if nobody else is up to the task, but I am not a committer and only
> > > contributed a minor patch for bash/cygwin).
> > >
> > > BTW, here is a very nice summary from Yahoo's Chris Douglas on TLP
> > > tradeoffs. I happen to agree with all he says, and frankly I couldn't have
> > > written it better myself. I highlight certain parts from his message, but I
> > > recommend you read the whole thing.
> > >
> > > -- Forwarded message --
> > > From: Chris Douglas 
> > > Date: Tue, Apr 13, 2010 at 11:46 PM
> > > Subject: Subprojects and TLP status
> > > To: gene...@hadoop.apache.org, priv...@hadoop.apache.org
> > >
> > > Most of Hadoop's subprojects have discussed becoming top-level Apache
> > > projects (TLPs) in the last few weeks. Most have expressed a desire to
> > > remain in Hadoop. The salient parts of the discussions I've read tend
> > > to focus on three aspects: a technical dependence on Hadoop,
> > > additional overhead as a TLP, and visibility both within the Hadoop
> > > ecosystem and in the open source community generally.
> > >
> > > Life as a TLP: this is not much harder than being a Hadoop subproject,
> > > and the Apache preferences being tossed around- particularly
> > > "insufficiently diverse"- are not blockers. Every subproject needs to
> > > write a section of the report Hadoop sends to the board; almost the
> > > same report, sent to a new address. The initial cost is similarly
> > > light: copy bylaws, send a few notes to INFRA, and follow some
> > > directions. I think the estimated costs are far higher than they will
> > > be in practice. Inertia is a powerful force, but it should be
> > > overcome. The directions are here, and should not be intimidating:
> > >
> > > http://apache.org/dev/project-creation.html
> > >
> > > Visibility: the Hadoop site does not need to change. For each
> > > subproject, we can literally change the hyperlinks to point to the new
> > > page and be done. Long-term, linking to all ASF projects that run on
> > > Hadoop from a prominent page is something we all want. So particularly
> > > in the medium-term that most are considering: visibility through the
> > > website will not change. Each subproject will still be linked from the
> > > front page.
> > >
> > > Hadoop would not be nearly as popular as it is without Zookeeper,
> > > HBase, Hive, and Pig. All statistics on work in shared MapReduce
> > > clusters show that users vastly prefer running Pig and Hive queries to
> > > writing MapReduce jobs. HBase continues to push features in HDFS that
> > > increase its adoption and relevance outside MapReduce, while sharing
> > > some of its NoSQL limelight. Zookeeper is not only a linchpin in real
> > > workloads, but many proposals for future features require it. The
> > > bottom line is that MapReduce and HDFS need these projects for
> > > visibility and adoption in precisely the same way. I don't think
> > > separate TLPs will uncouple the broader community from one another.
> > >
> > > Technical dependence: this has two dimensions. First, influencing
> > > MapReduce and HDFS. This is nonsense. Earning influence by
> > > contributing to a subproject is the only way to push code changes;
> > > nobody from any of these projects has violated that by unilaterally
> > > committing to HDFS or MapReduce, anyway. And anyone cynical enough to
> > > believe that MapReduce and HDFS would deliberately screw over or
> > > ignore dependent projects because they don't have PMC members is
> > > plainly unsuited to community-driven development. I understand that
> > > t

Re: [DISCUSSION] To be (or not to be) a TLP - that is the question

2010-04-21 Thread Jeff Hammerbacher
Hive already does the work to run on multiple versions of Hadoop, and the
release cycle is independent of Hadoop's. I don't see why it should remain a
subproject. I'm +1 on Hive becoming a TLP.

On Tue, Apr 20, 2010 at 2:03 PM, Zheng Shao  wrote:

> As a Hive committer, I don't feel the benefit we get from becoming a
> TLP is big enough (compared with the cost) to make Hive a TLP.
> From Chris's comment I see that the cost is not that big, but I still
> wonder what benefit we will get from that.
>
> Also I didn't get the idea of the joke ("In fact, one could argue that
> Pig opting not to be TLP yet is why Hive should go TLP"). I don't see
> any reason that applies to Pig but not Hive.
> We should continue the discussion here, but anything in Pig's
> discussion should also be considered here.
>
> Zheng
>
> On Mon, Apr 19, 2010 at 5:48 PM, Amr Awadallah  wrote:
> > I am personally +1 on Hive being a TLP; I think it has reached the community
> > adoption and maturity level required for that. In fact, one could argue that
> > Pig opting not to be TLP yet is why Hive should go TLP :) (jk).
> >
> > The real question to ask is whether there is a volunteer to take care of the
> > "administrative" tasks, which isn't a ton of work afaiu (I am willing to
> > volunteer if nobody else is up to the task, but I am not a committer and only
> > contributed a minor patch for bash/cygwin).
> >
> > BTW, here is a very nice summary from Yahoo's Chris Douglas on TLP
> > tradeoffs. I happen to agree with all he says, and frankly I couldn't have
> > written it better myself. I highlight certain parts from his message, but I
> > recommend you read the whole thing.
> >
> > -- Forwarded message --
> > From: Chris Douglas 
> > Date: Tue, Apr 13, 2010 at 11:46 PM
> > Subject: Subprojects and TLP status
> > To: gene...@hadoop.apache.org, priv...@hadoop.apache.org
> >
> > Most of Hadoop's subprojects have discussed becoming top-level Apache
> > projects (TLPs) in the last few weeks. Most have expressed a desire to
> > remain in Hadoop. The salient parts of the discussions I've read tend
> > to focus on three aspects: a technical dependence on Hadoop,
> > additional overhead as a TLP, and visibility both within the Hadoop
> > ecosystem and in the open source community generally.
> >
> > Life as a TLP: this is not much harder than being a Hadoop subproject,
> > and the Apache preferences being tossed around- particularly
> > "insufficiently diverse"- are not blockers. Every subproject needs to
> > write a section of the report Hadoop sends to the board; almost the
> > same report, sent to a new address. The initial cost is similarly
> > light: copy bylaws, send a few notes to INFRA, and follow some
> > directions. I think the estimated costs are far higher than they will
> > be in practice. Inertia is a powerful force, but it should be
> > overcome. The directions are here, and should not be intimidating:
> >
> > http://apache.org/dev/project-creation.html
> >
> > Visibility: the Hadoop site does not need to change. For each
> > subproject, we can literally change the hyperlinks to point to the new
> > page and be done. Long-term, linking to all ASF projects that run on
> > Hadoop from a prominent page is something we all want. So particularly
> > in the medium-term that most are considering: visibility through the
> > website will not change. Each subproject will still be linked from the
> > front page.
> >
> > Hadoop would not be nearly as popular as it is without Zookeeper,
> > HBase, Hive, and Pig. All statistics on work in shared MapReduce
> > clusters show that users vastly prefer running Pig and Hive queries to
> > writing MapReduce jobs. HBase continues to push features in HDFS that
> > increase its adoption and relevance outside MapReduce, while sharing
> > some of its NoSQL limelight. Zookeeper is not only a linchpin in real
> > workloads, but many proposals for future features require it. The
> > bottom line is that MapReduce and HDFS need these projects for
> > visibility and adoption in precisely the same way. I don't think
> > separate TLPs will uncouple the broader community from one another.
> >
> > Technical dependence: this has two dimensions. First, influencing
> > MapReduce and HDFS. This is nonsense. Earning influence by
> > contributing to a subproject is the only way to push code changes;
> > nobody from any of these projects has violated that by unilaterally
> > committing to HDFS or MapReduce, anyway. And anyone cynical enough to
> > believe that MapReduce and HDFS would deliberately screw over or
> > ignore dependent projects because they don't have PMC members is
> > plainly unsuited to community-driven development. I understand that
> > these projects need to protect their users, but lobbying rights are
> > not an actual benefit.
> >
> > Second, being a coherent part of the Hadoop ecosystem. It is (mostly)
> > true that Hadoop currently offers a set of mutually compatible
> > framework

Re: [DISCUSSION] To be (or not to be) a TLP - that is the question

2010-04-20 Thread Zheng Shao
As a Hive committer, I don't feel the benefit we get from becoming a
TLP is big enough (compared with the cost) to make Hive a TLP.
From Chris's comment I see that the cost is not that big, but I still
wonder what benefit we will get from that.

Also I didn't get the idea of the joke ("In fact, one could argue that
Pig opting not to be TLP yet is why Hive should go TLP"). I don't see
any reason that applies to Pig but not Hive.
We should continue the discussion here, but anything in Pig's
discussion should also be considered here.

Zheng

On Mon, Apr 19, 2010 at 5:48 PM, Amr Awadallah  wrote:
> I am personally +1 on Hive being a TLP; I think it has reached the community
> adoption and maturity level required for that. In fact, one could argue that
> Pig opting not to be TLP yet is why Hive should go TLP :) (jk).
>
> The real question to ask is whether there is a volunteer to take care of the
> "administrative" tasks, which isn't a ton of work afaiu (I am willing to
> volunteer if nobody else is up to the task, but I am not a committer and only
> contributed a minor patch for bash/cygwin).
>
> BTW, here is a very nice summary from Yahoo's Chris Douglas on TLP
> tradeoffs. I happen to agree with all he says, and frankly I couldn't have
> written it better myself. I highlight certain parts from his message, but I
> recommend you read the whole thing.
>
> -- Forwarded message --
> From: Chris Douglas 
> Date: Tue, Apr 13, 2010 at 11:46 PM
> Subject: Subprojects and TLP status
> To: gene...@hadoop.apache.org, priv...@hadoop.apache.org
>
> Most of Hadoop's subprojects have discussed becoming top-level Apache
> projects (TLPs) in the last few weeks. Most have expressed a desire to
> remain in Hadoop. The salient parts of the discussions I've read tend
> to focus on three aspects: a technical dependence on Hadoop,
> additional overhead as a TLP, and visibility both within the Hadoop
> ecosystem and in the open source community generally.
>
> Life as a TLP: this is not much harder than being a Hadoop subproject,
> and the Apache preferences being tossed around- particularly
> "insufficiently diverse"- are not blockers. Every subproject needs to
> write a section of the report Hadoop sends to the board; almost the
> same report, sent to a new address. The initial cost is similarly
> light: copy bylaws, send a few notes to INFRA, and follow some
> directions. I think the estimated costs are far higher than they will
> be in practice. Inertia is a powerful force, but it should be
> overcome. The directions are here, and should not be intimidating:
>
> http://apache.org/dev/project-creation.html
>
> Visibility: the Hadoop site does not need to change. For each
> subproject, we can literally change the hyperlinks to point to the new
> page and be done. Long-term, linking to all ASF projects that run on
> Hadoop from a prominent page is something we all want. So particularly
> in the medium-term that most are considering: visibility through the
> website will not change. Each subproject will still be linked from the
> front page.
>
> Hadoop would not be nearly as popular as it is without Zookeeper,
> HBase, Hive, and Pig. All statistics on work in shared MapReduce
> clusters show that users vastly prefer running Pig and Hive queries to
> writing MapReduce jobs. HBase continues to push features in HDFS that
> increase its adoption and relevance outside MapReduce, while sharing
> some of its NoSQL limelight. Zookeeper is not only a linchpin in real
> workloads, but many proposals for future features require it. The
> bottom line is that MapReduce and HDFS need these projects for
> visibility and adoption in precisely the same way. I don't think
> separate TLPs will uncouple the broader community from one another.
>
> Technical dependence: this has two dimensions. First, influencing
> MapReduce and HDFS. This is nonsense. Earning influence by
> contributing to a subproject is the only way to push code changes;
> nobody from any of these projects has violated that by unilaterally
> committing to HDFS or MapReduce, anyway. And anyone cynical enough to
> believe that MapReduce and HDFS would deliberately screw over or
> ignore dependent projects because they don't have PMC members is
> plainly unsuited to community-driven development. I understand that
> these projects need to protect their users, but lobbying rights are
> not an actual benefit.
>
> Second, being a coherent part of the Hadoop ecosystem. It is (mostly)
> true that Hadoop currently offers a set of mutually compatible
> frameworks. It is not true that moving them to separate Apache
> projects would make solutions less coherent or affect existing or
> future users at all. The cohesion between projects' governance is
> sufficiently weak to justify independent units, but the real
> dependencies between the projects are strong enough to keep us engaged
> with one another. And it's not as if other projects- Cascading, for
> example- aren't also organisms

Re: [DISCUSSION] To be (or not to be) a TLP - that is the question

2010-04-19 Thread Amr Awadallah
I am personally +1 on Hive being a TLP; I think it has reached the
community adoption and maturity level required for that. In fact, one
could argue that Pig opting not to be TLP yet is why Hive should go TLP
:) (jk).


The real question to ask is whether there is a volunteer to take care of
the "administrative" tasks, which isn't a ton of work afaiu (I am
willing to volunteer if nobody else is up to the task, but I am not a
committer and only contributed a minor patch for bash/cygwin).


BTW, here is a very nice summary from Yahoo's Chris Douglas on TLP
tradeoffs. I happen to agree with all he says, and frankly I couldn't
have written it better myself. I highlight certain parts from his
message, but I recommend you read the whole thing.


-- Forwarded message --
From: Chris Douglas 
Date: Tue, Apr 13, 2010 at 11:46 PM
Subject: Subprojects and TLP status
To: gene...@hadoop.apache.org, priv...@hadoop.apache.org

Most of Hadoop's subprojects have discussed becoming top-level Apache
projects (TLPs) in the last few weeks. Most have expressed a desire to
remain in Hadoop. The salient parts of the discussions I've read tend
to focus on three aspects: a technical dependence on Hadoop,
additional overhead as a TLP, and visibility both within the Hadoop
ecosystem and in the open source community generally.

Life as a TLP: this is not much harder than being a Hadoop subproject,
and the Apache preferences being tossed around- particularly
"insufficiently diverse"- are not blockers. Every subproject needs to
write a section of the report Hadoop sends to the board; almost the
same report, sent to a new address. The initial cost is similarly
light: copy bylaws, send a few notes to INFRA, and follow some
directions. I think the estimated costs are far higher than they will
be in practice. Inertia is a powerful force, but it should be
overcome. The directions are here, and should not be intimidating:

http://apache.org/dev/project-creation.html

Visibility: the Hadoop site does not need to change. For each
subproject, we can literally change the hyperlinks to point to the new
page and be done. Long-term, linking to all ASF projects that run on
Hadoop from a prominent page is something we all want. So particularly
in the medium-term that most are considering: visibility through the
website will not change. Each subproject will still be linked from the
front page.

Hadoop would not be nearly as popular as it is without Zookeeper,
HBase, Hive, and Pig. All statistics on work in shared MapReduce
clusters show that users vastly prefer running Pig and Hive queries to
writing MapReduce jobs. HBase continues to push features in HDFS that
increase its adoption and relevance outside MapReduce, while sharing
some of its NoSQL limelight. Zookeeper is not only a linchpin in real
workloads, but many proposals for future features require it. The
bottom line is that MapReduce and HDFS need these projects for
visibility and adoption in precisely the same way. I don't think
separate TLPs will uncouple the broader community from one another.

Technical dependence: this has two dimensions. First, influencing
MapReduce and HDFS. This is nonsense. Earning influence by
contributing to a subproject is the only way to push code changes;
nobody from any of these projects has violated that by unilaterally
committing to HDFS or MapReduce, anyway. And anyone cynical enough to
believe that MapReduce and HDFS would deliberately screw over or
ignore dependent projects because they don't have PMC members is
plainly unsuited to community-driven development. I understand that
these projects need to protect their users, but lobbying rights are
not an actual benefit.

Second, being a coherent part of the Hadoop ecosystem. It is (mostly)
true that Hadoop currently offers a set of mutually compatible
frameworks. It is not true that moving them to separate Apache
projects would make solutions less coherent or affect existing or
future users at all. The cohesion between projects' governance is
sufficiently weak to justify independent units, but the real
dependencies between the projects are strong enough to keep us engaged
with one another. And it's not as if other projects- Cascading, for
example- aren't also organisms adapted and specialized for life in
Hadoop.

Arguments on technical dependence are ignoring the nature of the
existing interactions. Besides, weak technical dependencies are not a
necessary prerequisite for a subproject's independence.

As for what was *not* said in these discussions, there is no argument
that every one of these subprojects has a distinct, autonomous
community. There was also no argument that the Hadoop PMC offers any
valuable oversight, given that the representatives of its fiefdoms are
too consumed by provincial matters to participate in neighboring
governance. Most releases I've voted on: I run the unit tests, check
the signature, verify the checksum, and know literally nothing else
about its content. I have o