I am definitely against moving Hive out of Hadoop. There is appreciable representation of Hive inside the Hadoop PMC and, as far as I can tell, keeping Hive inside Hadoop places no additional burden on the Hadoop PMC.
I respect Jeff/Amr's comments on their viewpoints, but I beg to differ. I really do not see any benefit in moving Hive out of Hadoop.

thanks,
dhruba

On Thu, Apr 22, 2010 at 10:09 AM, Ashish Thusoo <athu...@facebook.com> wrote:

> What is the advantage of becoming a TLP to the project itself? I have
> heard that it is something that apache wants, but considering that we
> are very comfortable on how Hive interacts with the Hadoop ecosystem as
> a sub project for Hadoop, there has to be some big incentive for the
> project to be a TLP and nowhere have I seen how this would benefit
> Hive. Any thoughts on that?
>
> Ashish
>
> ________________________________
> From: Jeff Hammerbacher [mailto:ham...@cloudera.com]
> Sent: Wednesday, April 21, 2010 7:35 PM
> To: hive-dev@hadoop.apache.org
> Cc: Ashish Thusoo
> Subject: Re: [DISCUSSION] To be (or not to be) a TLP - that is the question
>
> Hive already does the work to run on multiple versions of Hadoop, and
> the release cycle is independent of Hadoop's. I don't see why it should
> remain a subproject. I'm +1 on Hive becoming a TLP.
>
> On Tue, Apr 20, 2010 at 2:03 PM, Zheng Shao <zsh...@gmail.com<mailto:zsh...@gmail.com>> wrote:
> As a Hive committer, I don't feel the benefit we get from becoming a
> TLP is big enough (compared with the cost) to make Hive a TLP.
> From Chris's comment I see that the cost is not that big, but I still
> wonder what benefit we will get from that.
>
> Also I didn't get the idea of the joke ("In fact, one could argue that
> Pig opting not to be TLP yet is why Hive should go TLP"). I don't see
> any reasons that applies to Pig but not Hive.
> We should continue the discussion here, but anything in the Pig's
> discussion should also be considered here.
>
> Zheng
>
> On Mon, Apr 19, 2010 at 5:48 PM, Amr Awadallah <a...@cloudera.com<mailto:a...@cloudera.com>> wrote:
> > I am personally +1 on Hive being a TLP, I think it did reach the
> > community adoption and maturity level required for that. In fact, one
> > could argue that Pig opting not to be TLP yet is why Hive should go
> > TLP :) (jk).
> >
> > The real question to ask is whether there is a volunteer to take care
> > of the "administrative" tasks, which isn't a ton of work afaiu (I am
> > willing to volunteer if no body else up to the task, but I am not a
> > committer and only contributed a minor patch for bash/cygwin).
> >
> > BTW, here is a very nice summary from Yahoo's Chris Douglas on TLP
> > tradeoffs. I happen to agree with all he says, and frankly I couldn't
> > have wrote it better my self. I highlight certain parts from his
> > message, but I recommend you read the whole thing.
> >
> > ---------- Forwarded message ----------
> > From: Chris Douglas <cdoug...@apache.org<mailto:cdoug...@apache.org>>
> > Date: Tue, Apr 13, 2010 at 11:46 PM
> > Subject: Subprojects and TLP status
> > To: gene...@hadoop.apache.org<mailto:gene...@hadoop.apache.org>, priv...@hadoop.apache.org<mailto:priv...@hadoop.apache.org>
> >
> > Most of Hadoop's subprojects have discussed becoming top-level Apache
> > projects (TLPs) in the last few weeks. Most have expressed a desire to
> > remain in Hadoop. The salient parts of the discussions I've read tend
> > to focus on three aspects: a technical dependence on Hadoop,
> > additional overhead as a TLP, and visibility both within the Hadoop
> > ecosystem and in the open source community generally.
> >
> > Life as a TLP: this is not much harder than being a Hadoop subproject,
> > and the Apache preferences being tossed around- particularly
> > "insufficiently diverse"- are not blockers.
> > Every subproject needs to
> > write a section of the report Hadoop sends to the board; almost the
> > same report, sent to a new address. The initial cost is similarly
> > light: copy bylaws, send a few notes to INFRA, and follow some
> > directions. I think the estimated costs are far higher than they will
> > be in practice. Inertia is a powerful force, but it should be
> > overcome. The directions are here, and should not be intimidating:
> >
> > http://apache.org/dev/project-creation.html
> >
> > Visibility: the Hadoop site does not need to change. For each
> > subproject, we can literally change the hyperlinks to point to the new
> > page and be done. Long-term, linking to all ASF projects that run on
> > Hadoop from a prominent page is something we all want. So particularly
> > in the medium-term that most are considering: visibility through the
> > website will not change. Each subproject will still be linked from the
> > front page.
> >
> > Hadoop would not be nearly as popular as it is without Zookeeper,
> > HBase, Hive, and Pig. All statistics on work in shared MapReduce
> > clusters show that users vastly prefer running Pig and Hive queries to
> > writing MapReduce jobs. HBase continues to push features in HDFS that
> > increase its adoption and relevance outside MapReduce, while sharing
> > some of its NoSQL limelight. Zookeeper is not only a linchpin in real
> > workloads, but many proposals for future features require it. The
> > bottom line is that MapReduce and HDFS need these projects for
> > visibility and adoption in precisely the same way. I don't think
> > separate TLPs will uncouple the broader community from one another.
> >
> > Technical dependence: this has two dimensions. First, influencing
> > MapReduce and HDFS. This is nonsense. Earning influence by
> > contributing to a subproject is the only way to push code changes;
> > nobody from any of these projects has violated that by unilaterally
> > committing to HDFS or MapReduce, anyway.
> > And anyone cynical enough to
> > believe that MapReduce and HDFS would deliberately screw over or
> > ignore dependent projects because they don't have PMC members is
> > plainly unsuited to community-driven development. I understand that
> > these projects need to protect their users, but lobbying rights are
> > not an actual benefit.
> >
> > Second, being a coherent part of the Hadoop ecosystem. It is (mostly)
> > true that Hadoop currently offers a set of mutually compatible
> > frameworks. It is not true that moving them to separate Apache
> > projects would make solutions less coherent or affect existing or
> > future users at all. The cohesion between projects' governance is
> > sufficiently weak to justify independent units, but the real
> > dependencies between the projects are strong enough to keep us engaged
> > with one another. And it's not as if other projects- Cascading, for
> > example- aren't also organisms adapted and specialized for life in
> > Hadoop.
> >
> > Arguments on technical dependence are ignoring the nature of the
> > existing interactions. Besides, weak technical dependencies are not a
> > necessary prerequisite for a subproject's independence.
> >
> > As for what was *not* said in these discussions, there is no argument
> > that every one of these subprojects has a distinct, autonomous
> > community. There was also no argument that the Hadoop PMC offers any
> > valuable oversight, given that the representatives of its fiefdoms are
> > too consumed by provincial matters to participate in neighboring
> > governance. Most releases I've voted on: I run the unit tests, check
> > the signature, verify the checksum, and know literally nothing else
> > about its content. I have often never heard the names of many proposed
> > committers and even some proposed PMC members. Right now, subprojects
> > with enough PMC members essentially vote out their own releases and
> > vote in their own committers: TLPs in all but name.
> >
> > The Hadoop club- in conferences, meetups, technical debates, etc.- is
> > broad, diverse, and intertwined, but communities of developers have
> > already clustered around subprojects. Allowing that each cluster
> > should govern itself is a dry, practical matter, not an existential
> > crisis. -C
> >
> > On 4/19/2010 11:57 AM, Ashish Thusoo wrote:
> >>
> >> Hi Folks,
> >>
> >> Recently Apache Board asked the Hadoop PMC if some sub projects can
> >> become top level projects. In the opinion of the board, big umbrella
> >> projects make it difficult to monitor the health of the communities
> >> within the sub projects. If Hive does become a TLP, then we would
> >> have to elect our own PMC and take on all the administrative tasks
> >> that the Hadoop PMC does for us. So there is definitely more
> >> administrative work involved as a TLP. So the question is whether we
> >> should take on this additional task keeping at this time and what
> >> tangible advantages and disadvantages would such a move entail for
> >> the project. Would like to hear what the community thinks on this
> >> issue.
> >>
> >> Thanks,
> >> Ashish
> >>
> >> PS: As some reference to what is happening in the other subprojects,
> >> at this time PIG and Zookeeper have decided NOT to become TLPs
> >> whereas Hbase and Avro have decided to become TLPs.
> >>
> >
>
> --
> Yours,
> Zheng
> http://www.linkedin.com/in/zshao
>

--
Connect to me at http://www.facebook.com/dhruba