SF Hadoop meetup report

2011-01-14 Thread Aaron Kimball
Hadoop fans, This week we held the inaugural SF Hadoop meetup, and it was a great success! About forty people attended, and we held a number of great discussions. After an initial plenary session, an agenda was quickly drawn up and we identified a number of interesting topics. We spent the rest of

Re: [DISCUSS] Move project split down a level

2011-01-14 Thread Ryan Rawson
+1 the post split scripts are the worst things ever.

Re: [DISCUSS] Move project split down a level

2011-01-14 Thread Suresh Srinivas
I like the idea of merging projects together. It saves a lot of time. However, I would like to see a detailed proposal on how this will be done and discussions on it, before moving forward on this. If this work is done, we need clear messages to the developers on what has changed, and how developmen

Re: [DISCUSS] Move project split down a level

2011-01-14 Thread Jay Booth
On the flipside, right now it's only the developers, QAs and release engineers. There hasn't been much movement to 0.21 yet, and if we're agreed on the change in general, then pushing out 0.22 without it means making users change everything twice. On Fri, Jan 14, 2011 at 2:53 PM, Tsz Wo (Nicholas

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-14 Thread Milind Bhandarkar
Dhruba, While I do not think that the releasability of a branch should be determined by the market-cap (either on nasdaq or second-market) of the contributing company, I think a well-tested release is beneficial to the community. So, I support two releases: 20.100 now, that has security. And 20

Re: [DISCUSS] Move project split down a level

2011-01-14 Thread Tsz Wo (Nicholas), Sze
This is a kind of incompatible change: all the developers, QAs, release engineers and users have to change their local settings and scripts for this change. Moreover, there are documents, web pages and existing tools using the Apache svn URLs. So it is a huge impact. I am conservative

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-14 Thread Dhruba Borthakur
> > > 1) I agree this is not a good precedent. We don't support mega-patches in > general. We are doing this as part of discontinuing the "yahoo distribution > of Hadoop". We don't plan to continue doing 30 person year projects outside > apache and then merging them in!! > > I think this is a very

Re: [DISCUSS] Move project split down a level

2011-01-14 Thread Konstantin Shvachko
Well, this will generate tens more jiras, which won't justify closing the few remaining. You have been there; it took months to get that thing settled so it was usable. I am just saying it's a risk we could get the same this time. --Konstantin On Fri, Jan 14, 2011 at 11:08 AM, Nigel Daley wrote: >

Re: [DISCUSS] Move project split down a level

2011-01-14 Thread Nigel Daley
On Jan 14, 2011, at 11:16 AM, Tsz Wo (Nicholas), Sze wrote: > Hi Nigel, > >> As I look more at the impact of the common/MR/HDFS project split on what >> and how we release Hadoop, I feel like the split needs an adjustment. Many >> folks I've talked to agree that the project split has caused us

Re: [DISCUSS] Move project split down a level

2011-01-14 Thread Tsz Wo (Nicholas), Sze
Hi Nigel, > As I look more at the impact of the common/MR/HDFS project split on what > and how we release Hadoop, I feel like the split needs an adjustment. Many > folks I've talked to agree that the project split has caused us a splitting > headache. I think 1 relatively small change could alle

Re: [DISCUSS] Move project split down a level

2011-01-14 Thread Nigel Daley
On Jan 14, 2011, at 11:01 AM, Konstantin Shvachko wrote: > We actually still haven't recovered from the projects split. > We are still fixing HDFS and MR scripts with several jiras open. Great, so let's do this reorg before we fix those jiras so we don't need to fix them again. Can you provid

Re: [DISCUSS] Move project split down a level

2011-01-14 Thread Konstantin Shvachko
We actually still haven't recovered from the projects split. We are still fixing HDFS and MR scripts with several jiras open. If we start this re-split again now, before the major release, we risk getting into the same mess, and it will create more work for the community. I see Nigel's point that pa

Re: [DISCUSS] Move project split down a level

2011-01-14 Thread Eric Baldeschwieler
Cool --- E14 - via iPhone On Jan 14, 2011, at 10:01 AM, "Nigel Daley" wrote: > Thanks for the offer Eric! I agree it's the right time to mavenize, but I > think we should separate, but order, these two discussions/events. This > first, then mavenization. > > Cheers, > Nige > > On Jan 14,

Re: triggering automated precommit testing

2011-01-14 Thread Nigel Daley
Suresh, FWIW the precommit build fails fast in such cases (by design). nige On Jan 14, 2011, at 10:12 AM, Suresh Srinivas wrote: > It may not be as simple as triggering a retest, as some of the patches could be > old and may not apply. > > > On 1/14/11 8:44 AM, "Nigel Daley" wrote: > > Todd,

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-14 Thread Eric Baldeschwieler
Yup. Letting people who want to contribute do so is a good meme! A stable next release would be great. But orgs do sustaining on stable code releases for a lot of very good reasons. A next Hadoop 21+ of this code quality is almost a year away in my opinion. --- E14 - via iPhone On Jan 14, 2011

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-14 Thread Eric Baldeschwieler
Hi Ian, Thanks for holding off on that last .5. I've been working on a big email giving more context on this. Let me preview some issues. Our goal with this branch is twofold: 1) get the code out in a branch quickly so we can collaborate on it with the community. 2) not change the character of

Re: triggering automated precommit testing

2011-01-14 Thread Suresh Srinivas
It may not be as simple as triggering a retest, as some of the patches could be old and may not apply. On 1/14/11 8:44 AM, "Nigel Daley" wrote: Todd, please file a Jira in Common against test component. FWIW, I fear the precommit integration with Jira will need some amount of work as Apache mo

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-14 Thread Jakob Homan
> On another thread discussing hadoop-0.20-append as a separate branch, most > people agreed that new features shouldn't be added to 0.20, now we have a > major feature and we are all gung ho for it.. Not all are. I'm against it for all the same reasons I was against 20 append. This is als

Re: [DISCUSS] Move project split down a level

2011-01-14 Thread Nigel Daley
Thanks for the offer Eric! I agree it's the right time to mavenize, but I think we should separate, but order, these two discussions/events. This first, then mavenization. Cheers, Nige On Jan 14, 2011, at 9:32 AM, Eric Baldeschwieler wrote: > I'm a huge supporter of the idea. On a related no

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-14 Thread Ian Holsman
On Jan 14, 2011, at 12:32 PM, Nigel Daley wrote: > Yup, I'll say it again. The process ain't perfect but it's good enough IMO. > Thank you Yahoo! for your contribution. agree 100%.

Re: [DISCUSS] Move project split down a level

2011-01-14 Thread Nigel Daley
On Jan 14, 2011, at 8:51 AM, Owen O'Malley wrote: > > On Jan 13, 2011, at 10:18 PM, Nigel Daley wrote: > >> Folks, >> >> As I look more at the impact of the common/MR/HDFS project split on what and >> how we release Hadoop, I feel like the split needs an adjustment. Many >> folks I've talke

Re: [DISCUSS] Move project split down a level

2011-01-14 Thread Konstantin Boudnik
On Fri, Jan 14, 2011 at 09:32, Eric Baldeschwieler wrote: > I'm a huge supporter of the idea. On a related note, we've been looking for > the right time to mavenize. Maybe we can do both together. We could pitch in > a bunch of work on both if we could get the timing right. Adding mavenization

Re: [DISCUSS] Move project split down a level

2011-01-14 Thread Tom White
+1 to Owen's modified layout. The current layout means that every svn operation made during a release has to be carried out three times which increases the amount of work and the chances of a mistake. Cheers Tom On Fri, Jan 14, 2011 at 9:17 AM, Todd Lipcon wrote: > On Fri, Jan 14, 2011 at 8:51

Re: [DISCUSS] Move project split down a level

2011-01-14 Thread Eric Baldeschwieler
Cool! --- E14 - via iPhone On Jan 14, 2011, at 9:18 AM, "Todd Lipcon" wrote: > On Fri, Jan 14, 2011 at 8:51 AM, Owen O'Malley wrote: > >> >> On Jan 13, 2011, at 10:18 PM, Nigel Daley wrote: >> >> Folks, >>> >>> As I look more at the impact of the common/MR/HDFS project split on what >>> an

Re: [DISCUSS] Move project split down a level

2011-01-14 Thread Eric Baldeschwieler
I'm a huge supporter of the idea. On a related note, we've been looking for the right time to mavenize. Maybe we can do both together. We could pitch in a bunch of work on both if we could get the timing right. We've got a huge batch of commits in flight now, but if we can find something that

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-14 Thread Nigel Daley
Yup, I'll say it again. The process ain't perfect but it's good enough IMO. Thank you Yahoo! for your contribution. Clearly these patches will need review before commit when going into trunk. Let's move on to 0.22. Nige On Jan 14, 2011, at 9:20 AM, Konstantin Boudnik wrote: > I tend to second

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-14 Thread Konstantin Boudnik
I tend to second most of Ian's points here. On Fri, Jan 14, 2011 at 06:14, Ian Holsman wrote: > (with my Apache hat on) > I'm -0.5 on doing this as one big mega-patch and not including append (as > opposed to a series of smaller patches). #1: we are creating a precedent of a "brain-dump" here.

Re: [DISCUSS] Move project split down a level

2011-01-14 Thread Todd Lipcon
On Fri, Jan 14, 2011 at 8:51 AM, Owen O'Malley wrote: > > On Jan 13, 2011, at 10:18 PM, Nigel Daley wrote: > > Folks, >> >> As I look more at the impact of the common/MR/HDFS project split on what >> and how we release Hadoop, I feel like the split needs an adjustment. Many >> folks I've talked

Re: Ease of development in Hadoop

2011-01-14 Thread Owen O'Malley
Redirecting to mapreduce-dev. As described in http://hadoop.apache.org/mailing_lists.html, general isn't for user/dev questions. On Fri, Jan 14, 2011 at 3:58 AM, Grandl Robert wrote: > I want to make some minor modifications in FairShare scheduler and > recompile it such that to use it as before

Re: Restricting number of records from map output

2011-01-14 Thread Niels Basjes
Hi, > I have a sort job consisting of only the Mapper (no Reducer) task. I want my > results to contain only the top n records. Is there any way of restricting > the number of records that are emitted by the Mappers? > > Basically I am looking to see if there is an equivalent of achieving > the be
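The question quoted above (keeping only the top n records in a map-only job) is commonly handled with a bounded min-heap. The class below is a hypothetical, Hadoop-free sketch of that idea, not code from this thread: in a real job one would presumably keep such a heap as a field of an `org.apache.hadoop.mapreduce.Mapper` subclass, call `offer()` from `map()` instead of writing to the context, and write out `result()` in `cleanup()`. Note that with no reducer each map task emits its own local top n, so the job output holds up to n records per map task; a global top n still needs a final merge, for example a single reducer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical helper (not part of Hadoop): keeps the n largest values seen so far
 * in a bounded min-heap, the way a mapper could buffer records and emit them only
 * at the end of its input split.
 */
public class TopN {
    private final int n;
    // Min-heap: the smallest of the currently kept values sits at the head.
    private final PriorityQueue<Long> heap = new PriorityQueue<>();

    public TopN(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
    }

    /** Called once per record, as map() would be. */
    public void offer(long value) {
        if (heap.size() < n) {
            heap.add(value);
        } else if (value > heap.peek()) {
            heap.poll();      // evict the smallest of the current top n
            heap.add(value);
        }
    }

    /** Called once at the end, as cleanup() would be: kept values, largest first. */
    public List<Long> result() {
        List<Long> out = new ArrayList<>(heap);
        out.sort(Collections.reverseOrder());
        return out;
    }

    public static void main(String[] args) {
        TopN top = new TopN(3);
        for (long v : new long[] {5, 1, 9, 7, 3, 9, 2}) {
            top.offer(v);
        }
        System.out.println(top.result()); // prints [9, 9, 7]
    }
}
```

Each record costs O(log n) and memory stays at n entries regardless of split size, which is why the heap is kept in the mapper rather than collecting every record first.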

Re: [DISCUSS] Move project split down a level

2011-01-14 Thread Owen O'Malley
On Jan 13, 2011, at 10:18 PM, Nigel Daley wrote: Folks, As I look more at the impact of the common/MR/HDFS project split on what and how we release Hadoop, I feel like the split needs an adjustment. Many folks I've talked to agree that the project split has caused us a splitting headach

Re: triggering automated precommit testing

2011-01-14 Thread Nigel Daley
Todd, please file a Jira in Common against test component. FWIW, I fear the precommit integration with Jira will need some amount of work as Apache moves to Jira 4.2 in the coming weeks. Nige On Jan 14, 2011, at 12:57 AM, Todd Lipcon wrote: > Hey Nigel, > > Would there be any way to add a fe

Re: triggering automated precommit testing

2011-01-14 Thread Nigel Daley
Ian, Todd, Cos, Konst, Jakob, Anyone else given this access: PLEASE do NOT kill any builds from Hudson (little red x box) even if they appear hung. Hudson does not kill the underlying processes properly so they get left behind and can silently affect subsequent jobs. Instead, email this list

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-14 Thread Ian Holsman
(with my Apache hat on) I'm -0.5 on doing this as one big mega-patch and not including append (as opposed to a series of smaller patches). for the following reasons: 1. It encourages bad behavior. We want discussion (and development) to happen on the lists, not in some office. By allowing these

Re: [DISCUSS] Move project split down a level

2011-01-14 Thread Ian Holsman
on that note... I propose we discuss un-splitting the project altogether. On Jan 14, 2011, at 3:39 AM, Jakob Homan wrote: > +1. The project split is a lie. > > On Fri, Jan 14, 2011 at 12:32 AM, Ian Holsman wrote: >> +1 full agreement. >> >> I think it will be a pita admin wise (due to how svn

Re: triggering automated precommit testing

2011-01-14 Thread Ian Holsman
added.. (and Todd as well). On Jan 13, 2011, at 3:21 PM, Suresh Srinivas wrote: > Please add me... > > > On 1/12/11 5:18 PM, "Ian Holsman" wrote: > > and done. > anybody else want access? > > On Jan 12, 2011, at 8:12 PM, Nigel Daley wrote: > >> I believe committers can gain access following

Ease of development in Hadoop

2011-01-14 Thread Grandl Robert
Hi all, I want to make some minor modifications in FairShare scheduler and recompile it so that I can use it as before. Is there an easy way to do this without spending much time recompiling everything, and without errors? I read http://wiki.apache.org/hadoop/EclipseEnvironment but I did no

Re: triggering automated precommit testing

2011-01-14 Thread Todd Lipcon
Hey Nigel, Would there be any way to add a feature where we can make some special comment on the JIRA that would trigger a hudson retest? There are a lot of really old patches out on the JIRA that would be worth re-testing against trunk, and it's a pain to download and re-attach. I'm thinking a c

Re: [DISCUSS] Move project split down a level

2011-01-14 Thread Jakob Homan
+1. The project split is a lie. On Fri, Jan 14, 2011 at 12:32 AM, Ian Holsman wrote: > +1 full agreement. > > I think it will be a pita admin wise (due to how svn authorization is set > up), so it might slow down creation of a new branch, but its worth it. > > --- > Ian Holsman > AOL Inc > ian.h

Re: [DISCUSS] Move project split down a level

2011-01-14 Thread Ian Holsman
+1 full agreement. I think it will be a pita admin-wise (due to how svn authorization is set up), so it might slow down creation of a new branch, but it's worth it. --- Ian Holsman AOL Inc ian.hols...@teamaol.com (703) 879-3128 / AIM:ianholsman it's just a technicality On Jan 14, 2011, at 2: