Re: [DISCUSSION] development process of Hadoop
On 5/6/11 7:16 AM, Marcos Ortiz mlor...@uci.cu wrote:

+1 for Git. We migrated from SVN to Git for our complete infrastructure, for many reasons:
- Git uses much less space than SVN; all the changes are in a single .git

FWIW, svn 1.7 will have a single DB file too, though that project has some chaos at the moment as well, and the release of 1.7 may be soon or a ways away. It is still slow over the network compared to git. It is also adding 'svn patch'.

- Git is awesome for branching
- Another great advantage is that there are many developers who know Git and how the development process can be greatly improved.

There are also many developers who know svn, and many who don't know git. That is not a clear win.

PostgreSQL, one of my favorite open source projects, which I use in my daily work, migrated its development process to Git from CVS.

Almost anything is better than CVS. I don't feel that the primary cause of hadoop's situation is svn. Git would help with merging patches that have become stale for sure, and especially help on the client side for developers who need to maintain many concurrent contexts. But there are many significant process issues at the heart of the problem that are not due to the tools.

Regards.
--
Marcos Luís Ortíz Valmaseda
Software Engineer (Large-Scaled Distributed Systems)
University of Information Sciences, La Habana, Cuba
Linux User # 418229
http://about.me/marcosortiz
Re: Release compatibility was Re: [VOTE] Release candidate 0.20.203.0-rc1
On 5/8/11 11:10 AM, Eric Baldeschwieler eri...@yahoo-inc.com wrote:

I'd agree with this too. [same disclaimer as milind, not on PMC]

In general one would not expect to see an incompatible change added in a dot release (0.24.1 to 0.24.2). I'd expect anything like that to require community discussion and support. As milind summarized, we seem to have support for the addition of security to 20. The existing mechanism of the required release vote will confirm or deny that.

I think it is important that compatible enhancements to hadoop are allowed into dot releases. This is something that we've discussed but never finalized in the community. It is the desire to put improvements into users' hands more quickly than the next major release that drives orgs to produce private releases of hadoop. In general, I think it is fair that such changes go into trunk first. Exceptions to that also need discussion and support IMO.

As an observer, this is a very important observation. Sure, the default is that dot releases are bugfix-only. But exceptions to these rules are sometimes required and often beneficial to the health of the project. Performance enhancements, minor features, and other items are sometimes very low risk and the barrier to getting them to users earlier should be lower. These issues are the sort of things that get into non-Apache releases quickly and drive the community away from the Apache release. It's been well proven through those vehicles that back-porting minor features and improvements from trunk to an old release can be done safely.

I think the key to making progress is discussion and the idea that majority support, not consensus, is what is needed to make exceptions to our process. Process is useful, it reduces friction. Process without exception is stifling.

Absolutely -- for a subset of process exceptions, a lazy majority would be much more useful than consensus.
Others are much more dangerous (backwards compatibility breakage).

On May 7, 2011, at 10:52 PM, Milind Bhandarkar wrote:

[Mentioning again: I am not on the PMC, and this email contains non-binding opinions based on my reading of the general@hadoop.apache.org emails.]

It is my understanding that, from the beginning, the 0.20+security release was always treated as an exception to the normal (i.e. pre-0.20) release process. (This has been confirmed by the mailing list threads, in which many of those who are objecting to this release now, stating that it has violated norms, consented to, and actually argued for, breaking the norms.)

From what I have read on this mailing list before the vote for this release, it looked like most of the community agreed that what Yahoo! had produced on their own branch, outside of Apache trunk, was an important contribution, that a release based on it would be a good idea, and that a one-time release should proceed. (After all, whichever organization the contributors belong to, many seem to indicate that they feel ashamed at not having had an Apache release in more than a year.)

From many emails on this thread, it has been clear to me that it is a one-time concession for departing from the normal process, and I hope everyone understands that this is supposed to make Apache Hadoop releases relevant once again.

So, to cut it short, the 0.20.203 backward incompatibilities etc. have no bearing on the normal process, in which no backward incompatibilities should be allowed in minor releases.

To answer your specific question, I have no reason to believe that 0.22.1 could be backward incompatible with 0.22.0.

- milind

--
Milind Bhandarkar
mbhandar...@linkedin.com
+1-650-776-3167

On 5/7/11 4:50 PM, Eric Sammer esam...@cloudera.com wrote:

Milind: Thanks for the pointer. I remember this thread. I guess my question was unrelated to the specific release and more about the general mode of development under normal release circumstances (i.e.
do we permit backward incompatible changes between 0.22.0 and 0.22.1 or is this something we've allowed just for the 203 release?). I think it's important to be clear about what the MO is so end users can plan upgrades appropriately.

Thanks!
Sammer

On May 6, 2011, at 11:52 PM, Milind Bhandarkar mbhandar...@linkedin.com wrote:

[I am not on PMC, but seeing that PMC may be busy with other issues, I will try to answer your questions.]

Eric,

I think the thread http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%3C18C5c999-4680-4684-bc55-a430c40fd...@yahoo-inc.com%3E will answer your questions.

Here is the timeline as I see it:

1. Arun proposes to create a release from the security patchset. Says Doug has proposed this earlier (http://mail-archives.apache.org/mod_mbox/hadoop-general/201004.mbox/%3C4BD1dfea.5020...@apache.org%3E April 23, 2010) (This has been proposed earlier by Doug and did not get far due to concerns about the effect this would have
Re: Release compatibility was Re: [VOTE] Release candidate 0.20.203.0-rc1
On Tue, May 10, 2011 at 12:41 PM, Scott Carey sc...@richrelevance.com wrote:

As an observer, this is a very important observation. Sure, the default is that dot releases are bugfix-only. But exceptions to these rules are sometimes required and often beneficial to the health of the project. Performance enhancements, minor features, and other items are sometimes very low risk and the barrier to getting them to users earlier should be lower.

I agree whole-heartedly.

These issues are the sort of things that get into non-Apache releases quickly and drive the community away from the Apache release. It's been well proven through those vehicles that back-porting minor features and improvements from trunk to an old release can be done safely.

However, one shouldn't understate the difficulty of agreeing on the risk-reward tradeoff here. While risk is mostly technical, reward may vary widely based on the userbase or organization. For example, everyone would agree that security was a very risky feature to add to 20, with known backward incompatibilities and a lot of fallout. For some people (both CDH and YDH), the security features were an absolute necessity on a tight timeline, so the risk-reward decision was clear -- I've heard from many users, though, that they saw none of the reward from security and wished they hadn't had to endure the resulting changes and bugs within the 0.20 series. Another example is the 0.20-append patch series, which is indispensable for the HBase community but seen as overly risky by those who do not use HBase.

So, while I'm in favor of sustaining release series like 0.20-security in theory, I also think we need clear inclusion criteria for such branches. As I said in a previous email, the criteria used to be low-risk, compatible bug fixes only, with a vote process for any exceptions. 0.20-security is obviously entirely different, but as yet remains undefined (it's way more than just security).

-Todd
--
Todd Lipcon
Software Engineer, Cloudera
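
For readers unfamiliar with the voting terms used in this thread, the difference between a lazy majority and consensus can be sketched in a few lines. This is only an illustration, not project tooling: the thresholds assume the definitions in the Apache Software Foundation voting glossary (at least three binding +1 votes in both cases; more +1 than -1 for a lazy majority; no binding -1 at all for consensus), and the names Tally, passes_lazy_majority and passes_consensus are hypothetical.

```python
from dataclasses import dataclass

# Assumption: minimum number of binding +1 votes, per the ASF voting glossary.
MIN_BINDING_PLUS_ONES = 3


@dataclass(frozen=True)
class Tally:
    """Counts of binding votes only; non-binding votes are ignored here."""
    plus_one: int
    minus_one: int


def passes_lazy_majority(tally: Tally) -> bool:
    """Enough +1 votes, and strictly more +1 than -1."""
    return (tally.plus_one >= MIN_BINDING_PLUS_ONES
            and tally.plus_one > tally.minus_one)


def passes_consensus(tally: Tally) -> bool:
    """Enough +1 votes, and no binding -1 (a single -1 blocks the vote)."""
    return tally.plus_one >= MIN_BINDING_PLUS_ONES and tally.minus_one == 0


# A hypothetical contested vote: five in favour, one against.
contested = Tally(plus_one=5, minus_one=1)
print(passes_lazy_majority(contested))
print(passes_consensus(contested))
```

With one dissenting binding vote, the same tally passes under a lazy majority but fails under consensus, which is the practical difference the thread is arguing about for low-risk process exceptions.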
OT: anyone else going to berlin buzzwords?
If so.. I'll be there.. let's catch up. http://www.berlinbuzzwords.de/ -- Ian Holsman i...@holsman.net PH: +1-703 879-3128 AOLIM: ianholsman Skype:iholsman We are not afraid of the truth, in fact we plan on taking the truth out for a nice meal while we persuade it to adopt our views
Re: Apache Hadoop Hackathons: 5/11 and 5/18 in SF and Palo Alto
Awesome. I'm sorry to miss this (I'm neck deep in MR-279), but a heads up on forward porting from my end:

# Luke and Suresh are sick of me bugging them and just rolled up their sleeves to finish up all of the metrics2 work! Thanks guys! I've reviewed/committed Luke's https://issues.apache.org/jira/browse/HADOOP-6919 and Suresh finished up the rest.
# I'll work with Devaraj/Chris to finish up the TT security hole (https://issues.apache.org/jira/browse/MAPREDUCE-2178).

thanks,
Arun

On May 9, 2011, at 2:36 PM, Jeff Hammerbacher wrote:

Hey,

We've got 23 folks signed up for the Hackathon this Wednesday from organizations like Cloudera, Facebook, Twitter, AOL, Trend Micro, StumbleUpon, and Ngmoco. We'll also have the VP of the HBase PMC, Michael Stack, and the VP of the Hive PMC, John Sichi.

If you've been waiting to get involved in Apache Hadoop development, now is the time! We've got room for 10 - 15 more. If you're in San Francisco or Palo Alto and want to help out with Apache Hadoop development, sign up at http://hadoophackathon.eventbrite.com.

Regards,
Jeff

On Fri, May 6, 2011 at 12:03 PM, Jeff Hammerbacher ham...@cloudera.com wrote:

Hey,

The discussion this week about the 0.20.203.0 release has done a great job of highlighting some issues in our development process; it's also done a great job of lifting our mailing list activity metrics. After reading through the various threads, it's clear that everyone agrees on two things:

1) We'd like to get the work done in the 0.20.x branches into trunk
2) We'd like to start releasing off of trunk again

To that end, a few folks from Yahoo!, Cloudera, StumbleUpon, and Facebook would like to put together a series of Hackathons to 1) burn down the 13 (only 13!) remaining blockers for the 0.22 release and 2) forward-port the work done in the 0.20.x branches into trunk.
Cloudera will be hosting Hackathons from 10 am to 6 pm in both our Palo Alto and San Francisco offices next Wednesday, 5/11, and the following Wednesday, 5/18, to ensure both of these tasks get completed in short order. PMC members Todd Lipcon and Tom White will lead the SF group and PMC members Eli Collins and Patrick Hunt will lead the Palo Alto group. Whether you're a long-time contributor or don't have a patch to your name, now is a great time to get involved in Apache Hadoop development. Forward porting patches is a lot easier than writing them from scratch, and you'll have mentors present to help guide you through the patch testing and submission process. To sign up for the 5/11 Hackathon in either SF or PA, head over to http://hadoophackathon.eventbrite.com. As a reminder, the SF HUG will be held on 5/11 at 6 pm in the Cloudera SF offices: http://www.meetup.com/hadoopsf/events/17354462/. If you can't make it on 5/11, we'll send out a link next week for the 5/18 event. Looking forward to getting the release train moving again! Regards, Jeff p.s. if you'd like to participate remotely, email me directly and I'll see about how we can teleconference you into the event.
newbie label on JIRA
Hi all,

I spent this afternoon looking through JIRA to identify some issues that I think would be good for new contributors to try their hand at. In my mind, the qualities of such an issue are:

- fairly straightforward issue to solve (an experienced contributor would be able to address it in 30-60 minutes)
- fairly tight scope (doesn't require understanding of a lot of different moving pieces)
- easy to write a unit test for (so we get new contributors on the right path of testing their changes)
- not likely to be controversial among contributors

I came up with about 25 of these from looking through the 0.22 and 0.23 Affects Version lists:

https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+in+(%22HADOOP%22,+%22MAPREDUCE%22,+%22HDFS%22)+and+labels+%3D+%22newbie%22

I'd like to encourage others to look through any JIRAs that they think fit the bill, and add the same label. Then, we can point new contributors at this list of JIRAs -- hopefully this will get them on the right path towards understanding our project's workflow and give some nice positive reinforcement since they should be easy to review and commit quickly.

Thanks!
-Todd
--
Todd Lipcon
Software Engineer, Cloudera
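
For anyone who wants to build a similar saved-search link for another label, here is a minimal sketch of how the query string in the link above is put together. It assumes only the IssueNavigator.jspa path and the reset and jqlQuery parameters visible in that link; the function name build_label_query_url is hypothetical, and whether a given JIRA instance still serves this path is not something the sketch checks.

```python
from urllib.parse import urlencode

# Base path taken from the link in the message above.
JIRA_NAVIGATOR = "https://issues.apache.org/jira/secure/IssueNavigator.jspa"


def build_label_query_url(projects, label):
    """Return a navigator URL for issues in `projects` that carry `label`.

    Hypothetical helper: it only assembles and percent-encodes the query
    string and never contacts JIRA.
    """
    quoted_projects = ", ".join(f'"{name}"' for name in projects)
    jql = f'project in ({quoted_projects}) and labels = "{label}"'
    # urlencode percent-encodes quotes, commas and '=' and turns spaces into '+'.
    query = urlencode({"reset": "true", "jqlQuery": jql})
    return f"{JIRA_NAVIGATOR}?{query}"


url = build_label_query_url(["HADOOP", "MAPREDUCE", "HDFS"], "newbie")
print(url)
```

The encoded form differs slightly from the hand-written link (urlencode also escapes the parentheses and commas), but it decodes to the same JQL: project in ("HADOOP", "MAPREDUCE", "HDFS") and labels = "newbie".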
Re: Apache Hadoop Hackathons: 5/11 and 5/18 in SF and Palo Alto
On Tue, May 10, 2011 at 5:57 PM, Arun C Murthy a...@yahoo-inc.com wrote:

Awesome. I'm sorry to miss this (I'm neck deep in MR-279), but a heads up on forward porting from my end:

# Luke and Suresh are sick of me bugging them and just rolled up their sleeves to finish up all of the metrics2 work! Thanks guys! I've reviewed/committed Luke's https://issues.apache.org/jira/browse/HADOOP-6919 and Suresh finished up the rest.
# I'll work with Devaraj/Chris to finish up the TT security hole (https://issues.apache.org/jira/browse/MAPREDUCE-2178).

Thanks Arun - that's great!

Tom

thanks,
Arun

On May 9, 2011, at 2:36 PM, Jeff Hammerbacher wrote:

Hey,

We've got 23 folks signed up for the Hackathon this Wednesday from organizations like Cloudera, Facebook, Twitter, AOL, Trend Micro, StumbleUpon, and Ngmoco. We'll also have the VP of the HBase PMC, Michael Stack, and the VP of the Hive PMC, John Sichi.

If you've been waiting to get involved in Apache Hadoop development, now is the time! We've got room for 10 - 15 more. If you're in San Francisco or Palo Alto and want to help out with Apache Hadoop development, sign up at http://hadoophackathon.eventbrite.com.

Regards,
Jeff

On Fri, May 6, 2011 at 12:03 PM, Jeff Hammerbacher ham...@cloudera.com wrote:

Hey,

The discussion this week about the 0.20.203.0 release has done a great job of highlighting some issues in our development process; it's also done a great job of lifting our mailing list activity metrics. After reading through the various threads, it's clear that everyone agrees on two things:

1) We'd like to get the work done in the 0.20.x branches into trunk
2) We'd like to start releasing off of trunk again

To that end, a few folks from Yahoo!, Cloudera, StumbleUpon, and Facebook would like to put together a series of Hackathons to 1) burn down the 13 (only 13!) remaining blockers for the 0.22 release and 2) forward-port the work done in the 0.20.x branches into trunk.
Re: newbie label on JIRA
Todd - this is a great idea and a nice list of JIRAs to take care of! I assume you are leaving 0.22 blockers to more experienced contributors, right?

--
Take care,
Konstantin (Cos) Boudnik

On Tue, May 10, 2011 at 19:49, Todd Lipcon t...@cloudera.com wrote:

Hi all,

I spent this afternoon looking through JIRA to identify some issues that I think would be good for new contributors to try their hand at. In my mind, the qualities of such an issue are:

- fairly straightforward issue to solve (an experienced contributor would be able to address it in 30-60 minutes)
- fairly tight scope (doesn't require understanding of a lot of different moving pieces)
- easy to write a unit test for (so we get new contributors on the right path of testing their changes)
- not likely to be controversial among contributors

I came up with about 25 of these from looking through the 0.22 and 0.23 Affects Version lists:

https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+in+(%22HADOOP%22,+%22MAPREDUCE%22,+%22HDFS%22)+and+labels+%3D+%22newbie%22

I'd like to encourage others to look through any JIRAs that they think fit the bill, and add the same label. Then, we can point new contributors at this list of JIRAs -- hopefully this will get them on the right path towards understanding our project's workflow and give some nice positive reinforcement since they should be easy to review and commit quickly.

Thanks!
-Todd
--
Todd Lipcon
Software Engineer, Cloudera