I am definitely against moving Hive out of Hadoop. There is appreciable
representation of Hive inside the Hadoop PMC and, as far as I can tell, there
is no additional burden on the Hadoop PMC in keeping Hive inside Hadoop.

I respect Jeff's and Amr's viewpoints, but I beg to differ. I really do not
see any benefit in moving Hive out of Hadoop.

thanks,
dhruba

On Thu, Apr 22, 2010 at 10:09 AM, Ashish Thusoo <athu...@facebook.com> wrote:

> What is the advantage of becoming a TLP to the project itself? I have heard
> that it is something that apache wants, but considering that we are very
> comfortable on how Hive interacts with the Hadoop ecosystem as a sub project
> for Hadoop, there has to be some big incentive for the project to be a TLP
> and nowhere have I seen how this would benefit Hive. Any thoughts on that?
>
> Ashish
>
> ________________________________
> From: Jeff Hammerbacher [mailto:ham...@cloudera.com]
> Sent: Wednesday, April 21, 2010 7:35 PM
> To: hive-dev@hadoop.apache.org
> Cc: Ashish Thusoo
> Subject: Re: [DISCUSSION] To be (or not to be) a TLP - that is the question
>
> Hive already does the work to run on multiple versions of Hadoop, and the
> release cycle is independent of Hadoop's. I don't see why it should remain a
> subproject. I'm +1 on Hive becoming a TLP.
>
> On Tue, Apr 20, 2010 at 2:03 PM, Zheng Shao <zsh...@gmail.com<mailto:
> zsh...@gmail.com>> wrote:
> As a Hive committer, I don't feel the benefit we get from becoming a
> TLP is big enough (compared with the cost) to make Hive a TLP.
> From Chris's comment I see that the cost is not that big, but I still
> wonder what benefit we will get from that.
>
> Also I didn't get the idea of the joke ("In fact, one could argue that
> Pig opting not to be TLP yet is why Hive should go TLP"). I don't see
> any reason that applies to Pig but not Hive.
> We should continue the discussion here, but anything in Pig's
> discussion should also be considered here.
>
> Zheng
>
> On Mon, Apr 19, 2010 at 5:48 PM, Amr Awadallah <a...@cloudera.com<mailto:
> a...@cloudera.com>> wrote:
> > I am personally +1 on Hive being a TLP, I think it did reach the community
> > adoption and maturity level required for that. In fact, one could argue that
> > Pig opting not to be TLP yet is why Hive should go TLP :) (jk).
> >
> > The real question to ask is whether there is a volunteer to take care of the
> > "administrative" tasks, which isn't a ton of work afaiu (I am willing to
> > volunteer if nobody else is up to the task, but I am not a committer and only
> > contributed a minor patch for bash/cygwin).
> >
> > BTW, here is a very nice summary from Yahoo's Chris Douglas on TLP
> > tradeoffs. I happen to agree with all he says, and frankly I couldn't have
> > written it better myself. I highlight certain parts from his message, but I
> > recommend you read the whole thing.
> >
> > ---------- Forwarded message ----------
> > From: Chris Douglas <cdoug...@apache.org<mailto:cdoug...@apache.org>>
> > Date: Tue, Apr 13, 2010 at 11:46 PM
> > Subject: Subprojects and TLP status
> > To: gene...@hadoop.apache.org<mailto:gene...@hadoop.apache.org>,
> > priv...@hadoop.apache.org<mailto:priv...@hadoop.apache.org>
> >
> > Most of Hadoop's subprojects have discussed becoming top-level Apache
> > projects (TLPs) in the last few weeks. Most have expressed a desire to
> > remain in Hadoop. The salient parts of the discussions I've read tend
> > to focus on three aspects: a technical dependence on Hadoop,
> > additional overhead as a TLP, and visibility both within the Hadoop
> > ecosystem and in the open source community generally.
> >
> > Life as a TLP: this is not much harder than being a Hadoop subproject,
> > and the Apache preferences being tossed around- particularly
> > "insufficiently diverse"- are not blockers. Every subproject needs to
> > write a section of the report Hadoop sends to the board; almost the
> > same report, sent to a new address. The initial cost is similarly
> > light: copy bylaws, send a few notes to INFRA, and follow some
> > directions. I think the estimated costs are far higher than they will
> > be in practice. Inertia is a powerful force, but it should be
> > overcome. The directions are here, and should not be intimidating:
> >
> > http://apache.org/dev/project-creation.html
> >
> > Visibility: the Hadoop site does not need to change. For each
> > subproject, we can literally change the hyperlinks to point to the new
> > page and be done. Long-term, linking to all ASF projects that run on
> > Hadoop from a prominent page is something we all want. So particularly
> > in the medium-term that most are considering: visibility through the
> > website will not change. Each subproject will still be linked from the
> > front page.
> >
> > Hadoop would not be nearly as popular as it is without Zookeeper,
> > HBase, Hive, and Pig. All statistics on work in shared MapReduce
> > clusters show that users vastly prefer running Pig and Hive queries to
> > writing MapReduce jobs. HBase continues to push features in HDFS that
> > increase its adoption and relevance outside MapReduce, while sharing
> > some of its NoSQL limelight. Zookeeper is not only a linchpin in real
> > workloads, but many proposals for future features require it. The
> > bottom line is that MapReduce and HDFS need these projects for
> > visibility and adoption in precisely the same way. I don't think
> > separate TLPs will uncouple the broader community from one another.
> >
> > Technical dependence: this has two dimensions. First, influencing
> > MapReduce and HDFS. This is nonsense. Earning influence by
> > contributing to a subproject is the only way to push code changes;
> > nobody from any of these projects has violated that by unilaterally
> > committing to HDFS or MapReduce, anyway. And anyone cynical enough to
> > believe that MapReduce and HDFS would deliberately screw over or
> > ignore dependent projects because they don't have PMC members is
> > plainly unsuited to community-driven development. I understand that
> > these projects need to protect their users, but lobbying rights are
> > not an actual benefit.
> >
> > Second, being a coherent part of the Hadoop ecosystem. It is (mostly)
> > true that Hadoop currently offers a set of mutually compatible
> > frameworks. It is not true that moving them to separate Apache
> > projects would make solutions less coherent or affect existing or
> > future users at all. The cohesion between projects' governance is
> > sufficiently weak to justify independent units, but the real
> > dependencies between the projects are strong enough to keep us engaged
> > with one another. And it's not as if other projects- Cascading, for
> > example- aren't also organisms adapted and specialized for life in
> > Hadoop.
> >
> > Arguments on technical dependence are ignoring the nature of the
> > existing interactions. Besides, weak technical dependencies are not a
> > necessary prerequisite for a subproject's independence.
> >
> > As for what was *not* said in these discussions, there is no argument
> > that every one of these subprojects has a distinct, autonomous
> > community. There was also no argument that the Hadoop PMC offers any
> > valuable oversight, given that the representatives of its fiefdoms are
> > too consumed by provincial matters to participate in neighboring
> > governance. Most releases I've voted on: I run the unit tests, check
> > the signature, verify the checksum, and know literally nothing else
> > about its content. I have often never heard the names of many proposed
> > committers and even some proposed PMC members. Right now, subprojects
> > with enough PMC members essentially vote out their own releases and
> > vote in their own committers: TLPs in all but name.
> >
> > The Hadoop club- in conferences, meetups, technical debates, etc.- is
> > broad, diverse, and intertwined, but communities of developers have
> > already clustered around subprojects. Allowing that each cluster
> > should govern itself is a dry, practical matter, not an existential
> > crisis. -C
> >
> > On 4/19/2010 11:57 AM, Ashish Thusoo wrote:
> >>
> >> Hi Folks,
> >>
> >> Recently the Apache Board asked the Hadoop PMC if some sub projects can become
> >> top level projects. In the opinion of the board, big umbrella projects make
> >> it difficult to monitor the health of the communities within the sub
> >> projects. If Hive does become a TLP, then we would have to elect our own PMC
> >> and take on all the administrative tasks that the Hadoop PMC does for us. So
> >> there is definitely more administrative work involved as a TLP. So the
> >> question is whether we should take on this additional task at this
> >> time and what tangible advantages and disadvantages such a move would entail
> >> for the project. Would like to hear what the community thinks on this issue.
> >>
> >> Thanks,
> >> Ashish
> >>
> >> PS: As some reference to what is happening in the other subprojects, at
> >> this time Pig and Zookeeper have decided NOT to become TLPs whereas HBase
> >> and Avro have decided to become TLPs.
> >>
> >>
> >
>
>
>
> --
> Yours,
> Zheng
> http://www.linkedin.com/in/zshao
>
>


-- 
Connect to me at http://www.facebook.com/dhruba
