Re: [DISCUSSION] development process of Hadoop

2011-05-10 Thread Scott Carey


On 5/6/11 7:16 AM, Marcos Ortiz mlor...@uci.cu wrote:



+1 for Git

We migrated from SVN to Git for our complete infrastructure, for many
reasons:
- Git uses much less space than SVN, all the changes are in a single .git

FWIW, svn 1.7 will have a single DB file too.  Though that project has
some chaos at the moment too and the release of 1.7 may be soon or a ways
away.  It is still slow over the network compared to git.  It is also
adding 'svn patch'.

- Git is awesome for branching
- Another great advantage is that there are many developers that know
Git, and how the development process can be greatly improved.

There are also many developers who know svn, and many who don't know git.
That is not a clear win.


PostgreSQL, one of my favorite open source projects, which I use in my
daily work, migrated its development process from CVS to Git.

Almost anything is better than CVS.


I don't feel that svn is the primary cause of hadoop's situation.
Git would help with merging patches that have become stale for sure, and
especially help on the client side for developers who need to maintain many
concurrent contexts.  But there are many significant process issues at the
heart of the problem that are not due to the tools.


Regards.

-- 
Marcos Luís Ortíz Valmaseda
  Software Engineer (Large-Scaled Distributed Systems)
  University of Information Sciences,
  La Habana, Cuba
  Linux User # 418229
  http://about.me/marcosortiz




Re: Release compatibility was Re: [VOTE] Release candidate 0.20.203.0-rc1

2011-05-10 Thread Scott Carey


On 5/8/11 11:10 AM, Eric Baldeschwieler eri...@yahoo-inc.com wrote:

I'd agree with this too. [same disclaimer as milind, not on PMC]

In general one would not expect to see an incompatible change added in a
dot release (e.g. 0.24.1 or 0.24.2).  I'd expect anything like that to require
community discussion and support.

As milind summarized, we seem to have support for the addition of
security to 20.  The existing mechanism of the required release vote will
confirm or deny that.

I think it is important that compatible enhancements to hadoop are
allowed into dot releases.  This is something that we've discussed but
never finalized in the community.  It is the desire to put improvements
into users' hands more quickly than the next major release that drives
orgs to produce private releases of hadoop.  In general, I think it is
fair that such changes go into trunk first.  Exceptions to that also need
discussion and support IMO.

As an observer, this is a very important observation.  Sure, the default
is that dot releases are bugfix-only.  But exceptions to these rules are
sometimes required and often beneficial to the health of the project.
Performance enhancements, minor features, and other items are sometimes
very low risk and the barrier to getting them to users earlier should be
lower.  
These issues are the sort of things that get into non-Apache releases
quickly and drive the community away from the Apache release.  It's been
well proven through those vehicles that back-porting minor features and
improvements from trunk to an old release can be done safely.


I think the key to making progress is discussion and the idea that
majority support, not consensus, is what is needed to make exceptions to
our process.  Process is useful, it reduces friction.  Process without
exception is stifling.

Absolutely -- for a subset of process exceptions, a lazy majority would be
much more useful than consensus.  Others are much more dangerous
(backwards compatibility breakage).


On May 7, 2011, at 10:52 PM, Milind Bhandarkar wrote:

 [Mentioning again: I am not on the PMC, and this email contains
 non-binding opinions based on my reading the general@hadoop.apache.org
 emails.]
 
 It is my understanding that, from the beginning, the 0.20+security was
 always treated as an exception to the normal (i.e. pre-0.20) release
 process. (This has been confirmed by the mailing list threads, in which
 many of those who are objecting to this release now - stating that it has
 violated norms - have consented to, actually argued for, breaking the norms.)
 
 From what I have read on this mailing list before the vote for this
 release, it looked like most of the community agreed that what Yahoo! had
 produced on their own branch, outside of Apache trunk, was an important
 contribution, and a release based on that would be a good idea, and that a
 one-time release should proceed. (After all, whichever organization the
 contributors belong to, many seem to indicate that they feel ashamed of not
 having an Apache release in more than a year.)
 
 From many emails on this thread, it has been clear to me that it is a one
 time concession given for parting ways from the normal process, and I hope
 everyone understands that this is supposed to make Apache Hadoop releases
 relevant once again.
 
 So, to cut it short, the 0.20.203 backward incompatibilities etc have no
 bearing on the normal process, in which no backward incompatibilities
 should be allowed in minor releases. To answer your specific question, I
 have no reason to believe that 0.22.1 could be backward incompatible with
 0.22.0.
 
 - milind
 
 -- 
 Milind Bhandarkar
 mbhandar...@linkedin.com
 +1-650-776-3167
 
 
 
 
 
 
 On 5/7/11 4:50 PM, Eric Sammer esam...@cloudera.com wrote:
 
 Milind:
 
 Thanks for the pointer. I remember this thread. I guess my question
 was unrelated to the specific release and more about the general mode
 of development under normal release circumstances (ie. do we permit
 backward incompatible changes between 0.22.0 and 0.22.1 or is this
 something we've allowed just for the 203 release?).
 
 I think it's important to be clear about what the MO is so end users
 can plan upgrades appropriately.
 
 Thanks!
 Sammer
 
 On May 6, 2011, at 11:52 PM, Milind Bhandarkar mbhandar...@linkedin.com wrote:
 
 [I am not on PMC, but seeing that PMC may be busy with other issues, I
 will try to answer your questions.]
 
 Eric,
 
 I think the thread
 http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%3C18C5c999-4680-4684-bc55-a430c40fd...@yahoo-inc.com%3E
 will answer your questions. Here is the timeline as I see it:
 
 1. Arun proposes to create a release from the security patchset. Says Doug
 has proposed this earlier
 (http://mail-archives.apache.org/mod_mbox/hadoop-general/201004.mbox/%3C4BD1dfea.5020...@apache.org%3E
 April 23, 2010) (This has been proposed
 earlier by Doug and did not get far due to concerns about the effect this
 would have 

Re: Release compatibility was Re: [VOTE] Release candidate 0.20.203.0-rc1

2011-05-10 Thread Todd Lipcon
On Tue, May 10, 2011 at 12:41 PM, Scott Carey sc...@richrelevance.com wrote:


 As an observer, this is a very important observation.  Sure, the default
 is that dot releases are bugfix-only.  But exceptions to these rules are
 sometimes required and often beneficial to the health of the project.
 Performance enhancements, minor features, and other items are sometimes
 very low risk and the barrier to getting them to users earlier should be
 lower.


I agree whole-heartedly.


 These issues are the sort of things that get into non-Apache releases
 quickly and drive the community away from the Apache release.  It's been
 well proven through those vehicles that back-porting minor features and
 improvements from trunk to an old release can be done safely.


However, one shouldn't understate the difficulty of agreeing on the
risk-reward tradeoff here. While risk is mostly technical, reward may vary
widely based on the userbase or organization.

For example, everyone would agree that security was a very risky feature to
add to 20, with known backward incompatibilities and a lot of fallout. For
some people (both CDH and YDH), the security features were an absolute
necessity on a tight timeline, so the risk-reward decision was clear -- I've
heard from many users, though, that they saw none of the reward from
security and wished they hadn't had to endure the resulting changes and bugs
within the 0.20 series.

Another example is the 0.20-append patch series, which is indispensable for
the HBase community but seen as overly risky by those who do not use HBase.

So, while I'm in favor of sustaining release series like 0.20-security in
theory, I also think we need clear inclusion criteria for such branches.
As I said in a previous email, the criteria used to be low-risk, compatible
bug fixes only, with a vote process for any exceptions. 0.20-security is
obviously entirely different, but as yet remains undefined (it's way more
than just security).

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


OT: anyone else going to berlin buzzwords?

2011-05-10 Thread Ian Holsman
If so.. I'll be there.. let's catch up.

http://www.berlinbuzzwords.de/

--
Ian Holsman
i...@holsman.net
PH: +1-703 879-3128 AOLIM: ianholsman Skype:iholsman

We are not afraid of the truth, in fact we plan on taking the truth out for a 
nice meal while we persuade it to adopt our views



Re: Apache Hadoop Hackathons: 5/11 and 5/18 in SF and Palo Alto

2011-05-10 Thread Arun C Murthy

Awesome.

I'm sorry to miss this (I'm neck deep in MR-279), but a heads up on
forward porting from my end:
# Luke and Suresh are sick of me bugging them and just rolled up their
sleeves to finish up all of the metrics2 work! Thanks guys! I've
reviewed/committed Luke's https://issues.apache.org/jira/browse/HADOOP-6919
and Suresh finished up the rest.
# I'll work with Devaraj/Chris to finish up the TT security hole
(https://issues.apache.org/jira/browse/MAPREDUCE-2178).


thanks,
Arun

On May 9, 2011, at 2:36 PM, Jeff Hammerbacher wrote:


Hey,

We've got 23 folks signed up for the Hackathon this Wednesday from
organizations like Cloudera, Facebook, Twitter, AOL, Trend Micro,
StumbleUpon, and Ngmoco. We'll also have the VP of the HBase PMC, Michael
Stack, and the VP of the Hive PMC, John Sichi. If you've been waiting to get
involved in Apache Hadoop development, now is the time!

We've got room for 10 - 15 more. If you're in San Francisco or Palo Alto and
want to help out with Apache Hadoop development, sign up at
http://hadoophackathon.eventbrite.com.

Regards,
Jeff

On Fri, May 6, 2011 at 12:03 PM, Jeff Hammerbacher ham...@cloudera.com wrote:



Hey,

The discussion this week about the 0.20.203.0 release has done a great job
of highlighting some issues in our development process; it's also done a
great job of lifting our mailing list activity metrics. After reading
through the various threads, it's clear that everyone agrees on two things:

1) We'd like to get the work done in the 0.20.x branches into trunk
2) We'd like to start releasing off of trunk again

To that end, a few folks from Yahoo!, Cloudera, StumbleUpon, and Facebook
would like to put together a series of Hackathons to 1) burn down the 13
(only 13!) remaining blockers for the 0.22 release and 2) forward-port the
work done in the 0.20.x branches into trunk.

Cloudera will be hosting Hackathons from 10 am to 6 pm in both our Palo
Alto and San Francisco offices next Wednesday, 5/11, and the following
Wednesday, 5/18, to ensure both of these tasks get completed in short order.
PMC members Todd Lipcon and Tom White will lead the SF group and PMC members
Eli Collins and Patrick Hunt will lead the Palo Alto group.

Whether you're a long-time contributor or don't have a patch to your name,
now is a great time to get involved in Apache Hadoop development. Forward
porting patches is a lot easier than writing them from scratch, and you'll
have mentors present to help guide you through the patch testing and
submission process.

To sign up for the 5/11 Hackathon in either SF or PA, head over to
http://hadoophackathon.eventbrite.com.

As a reminder, the SF HUG will be held on 5/11 at 6 pm in the Cloudera SF
offices: http://www.meetup.com/hadoopsf/events/17354462/. If you can't
make it on 5/11, we'll send out a link next week for the 5/18 event.

Looking forward to getting the release train moving again!

Regards,
Jeff

p.s. if you'd like to participate remotely, email me directly and I'll see
about how we can teleconference you into the event.







newbie label on JIRA

2011-05-10 Thread Todd Lipcon
Hi all,

I spent this afternoon looking through JIRA to identify some issues that I
think would be good for new contributors to try their hand at. In my mind,
the qualities of such an issue are:

- fairly straightforward issue to solve (an experienced contributor would be
able to address it in 30-60 minutes)
- fairly tight scope (doesn't require understanding of a lot of different
moving pieces)
- easy to write a unit test for (so we get new contributors on the right
path of testing their changes)
- not likely to be controversial among contributors

I came up with about 25 of these from looking through the 0.22 and 0.23
Affects Version lists:
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+in+(%22HADOOP%22,+%22MAPREDUCE%22,+%22HDFS%22)+and+labels+%3D+%22newbie%22

I'd like to encourage others to look through any JIRAs that they think fit
the bill, and add the same label. Then, we can point new contributors at
this list of JIRAs -- hopefully this will get them on the right path towards
understanding our project's workflow and give some nice positive
reinforcement since they should be easy to review and commit quickly.

Thanks!

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera
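
For anyone who wants to build the same search link by hand, here is a
minimal Python sketch. It assumes the IssueNavigator.jspa endpoint and the
reset/jqlQuery parameters behave as in the link in the message above; the
function name newbie_query_url is made up for illustration, and the encoding
it produces is equivalent to, but not byte-identical with, that link.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint taken from the link in the message above; whether a given JIRA
# version still serves it is an assumption.
BASE = "https://issues.apache.org/jira/secure/IssueNavigator.jspa"


def newbie_query_url(projects=("HADOOP", "MAPREDUCE", "HDFS"), label="newbie"):
    """Build a search URL for issues in `projects` carrying `label`.

    Hypothetical helper: it only assembles the JQL string and URL-encodes it.
    """
    quoted = ", ".join(f'"{p}"' for p in projects)
    jql = f'project in ({quoted}) and labels = "{label}"'
    # urlencode escapes spaces, quotes and parentheses and joins the two
    # parameters with "&".
    return BASE + "?" + urlencode({"reset": "true", "jqlQuery": jql})


if __name__ == "__main__":
    url = newbie_query_url()
    print(url)
    # Decode the query string again to show the JQL that will be sent.
    print(parse_qs(urlsplit(url).query)["jqlQuery"][0])
```

Passing a different tuple of project keys or another label narrows or
widens the list without editing the URL by hand.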


Re: Apache Hadoop Hackathons: 5/11 and 5/18 in SF and Palo Alto

2011-05-10 Thread Tom White
On Tue, May 10, 2011 at 5:57 PM, Arun C Murthy a...@yahoo-inc.com wrote:
 Awesome.

 I'm sorry to miss this (I'm neck deep in MR-279), but a heads up on forward
 porting from my end:
 # Luke and Suresh are sick of me bugging them and just rolled up their
 sleeves to finish up all of the metrics2 work! Thanks guys! I've
 reviewed/committed Luke's https://issues.apache.org/jira/browse/HADOOP-6919
 and Suresh finished up the rest.
 # I'll work with Devaraj/Chris to finish up the TT security hole
 (https://issues.apache.org/jira/browse/MAPREDUCE-2178).

Thanks Arun - that's great!

Tom


 thanks,
 Arun

 On May 9, 2011, at 2:36 PM, Jeff Hammerbacher wrote:

 Hey,

 We've got 23 folks signed up for the Hackathon this Wednesday from
 organizations like Cloudera, Facebook, Twitter, AOL, Trend Micro,
 StumbleUpon, and Ngmoco. We'll also have the VP of the HBase PMC, Michael
 Stack, and the VP of the Hive PMC, John Sichi. If you've been waiting to get
 involved in Apache Hadoop development, now is the time!

 We've got room for 10 - 15 more. If you're in San Francisco or Palo Alto and
 want to help out with Apache Hadoop development, sign up at
 http://hadoophackathon.eventbrite.com.

 Regards,
 Jeff

 On Fri, May 6, 2011 at 12:03 PM, Jeff Hammerbacher ham...@cloudera.com wrote:

 Hey,

 The discussion this week about the 0.20.203.0 release has done a great job
 of highlighting some issues in our development process; it's also done a
 great job of lifting our mailing list activity metrics. After reading
 through the various threads, it's clear that everyone agrees on two things:

 1) We'd like to get the work done in the 0.20.x branches into trunk
 2) We'd like to start releasing off of trunk again

 To that end, a few folks from Yahoo!, Cloudera, StumbleUpon, and Facebook
 would like to put together a series of Hackathons to 1) burn down the 13
 (only 13!) remaining blockers for the 0.22 release and 2) forward-port the
 work done in the 0.20.x branches into trunk.

 Cloudera will be hosting Hackathons from 10 am to 6 pm in both our Palo
 Alto and San Francisco offices next Wednesday, 5/11, and the following
 Wednesday, 5/18, to ensure both of these tasks get completed in short order.
 PMC members Todd Lipcon and Tom White will lead the SF group and PMC members
 Eli Collins and Patrick Hunt will lead the Palo Alto group.

 Whether you're a long-time contributor or don't have a patch to your name,
 now is a great time to get involved in Apache Hadoop development. Forward
 porting patches is a lot easier than writing them from scratch, and you'll
 have mentors present to help guide you through the patch testing and
 submission process.

 To sign up for the 5/11 Hackathon in either SF or PA, head over to
 http://hadoophackathon.eventbrite.com.

 As a reminder, the SF HUG will be held on 5/11 at 6 pm in the Cloudera SF
 offices: http://www.meetup.com/hadoopsf/events/17354462/. If you can't
 make it on 5/11, we'll send out a link next week for the 5/18 event.

 Looking forward to getting the release train moving again!

 Regards,
 Jeff

 p.s. if you'd like to participate remotely, email me directly and I'll see
 about how we can teleconference you into the event.







Re: newbie label on JIRA

2011-05-10 Thread Konstantin Boudnik
Todd - this is a great idea and a nice list of JIRAs to take care of!
I assume you are leaving 0.22 blockers to more experienced contributors, right?
--
  Take care,
Konstantin (Cos) Boudnik

On Tue, May 10, 2011 at 19:49, Todd Lipcon t...@cloudera.com wrote:
 Hi all,

 I spent this afternoon looking through JIRA to identify some issues that I
 think would be good for new contributors to try their hand at. In my mind,
 the qualities of such an issue are:

 - fairly straightforward issue to solve (an experienced contributor would be
 able to address it in 30-60 minutes)
 - fairly tight scope (doesn't require understanding of a lot of different
 moving pieces)
 - easy to write a unit test for (so we get new contributors on the right
 path of testing their changes)
 - not likely to be controversial among contributors

 I came up with about 25 of these from looking through the 0.22 and 0.23
 Affects Version lists:
 https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+in+(%22HADOOP%22,+%22MAPREDUCE%22,+%22HDFS%22)+and+labels+%3D+%22newbie%22

 I'd like to encourage others to look through any JIRAs that they think fit
 the bill, and add the same label. Then, we can point new contributors at
 this list of JIRAs -- hopefully this will get them on the right path towards
 understanding our project's workflow and give some nice positive
 reinforcement since they should be easy to review and commit quickly.

 Thanks!

 -Todd
 --
 Todd Lipcon
 Software Engineer, Cloudera