Re: email lists: too many or not enough?

2011-01-13 Thread Eric Baldeschwieler
Much improved. Thanks!

PS please keep branch discussions here. 

---
E14 - via iPhone

On Jan 12, 2011, at 5:44 PM, "Nigel Daley" wrote:

> Updated w/ Owen's suggestions.  Feel free to tweak further.
> 
> Cheers,
> Nige
> 
> On Jan 12, 2011, at 10:08 AM, Owen O'Malley wrote:
> 
>> 
>> On Jan 12, 2011, at 9:47 AM, Nigel Daley wrote:
>> 
>>> 
>>> On Jan 11, 2011, at 8:35 PM, Nigel Daley wrote:
>>>>
>>>> On Jan 11, 2011, at 9:38 AM, Owen O'Malley wrote:
>>>>> We do try to move the questions off of general. It would be great if
>>>>> someone updated the website with the intended usage of each of the lists.
>>>>
>>>> Ok, I'll work on a page.
>>> 
>>> Done. See the revamped http://hadoop.apache.org/mailing_lists.html
>>> Corrections welcome.  You must now click on general@ to see the 
>>> subscription instructions (it's a new page).
>> 
>> I'd suggest segregating the list into 4 tables based on audience:
>> 
>> user questions:
>> common-user
>> hdfs-user
>> mapreduce-user
>> security - only for notifying the project of security vulnerabilities
>> 
>> project level announcements and discussions:
>> general
>> 
>> developer questions:
>> common-dev
>> hdfs-dev
>> mapreduce-dev
>> 
>> jira and subversion tracking for developers:
>> common-issues
>> common-commits
>> hdfs-issues
>> hdfs-commits
>> mapreduce-issues
>> mapreduce-commits
>> 
>> -- Owen
> 


Re: triggering automated precommit testing

2011-01-13 Thread Suresh Srinivas
Please add me...


On 1/12/11 5:18 PM, "Ian Holsman" wrote:

and done.
anybody else want access?

On Jan 12, 2011, at 8:12 PM, Nigel Daley wrote:

> I believe committers can gain access by following this:
> http://wiki.apache.org/general/Hudson#How_do_I_get_an_account
> Ian would have to run a command on people.apache.org.
>
> Nige
>
> On Jan 12, 2011, at 4:41 PM, Konstantin Shvachko wrote:
>
>> So, who has the permissions? I don't. Should I?
>> --Konstantin
>>
>> On Wed, Jan 12, 2011 at 4:33 PM, Nigel Daley wrote:
>>
>>>> Jakob Homan commented on HDFS-884:
>>>> --
>>>>
>>>>> Konstantin, if you're trying to kick a new patch build for this you no
>>>>> longer move it to "Open" and back to "Patch Available". Instead, you must
>>>>> upload a new patch. Or, if you have permission, you can kick
>>>>> https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/ and enter the issue
>>>>> number.
>>>>
>>>> That makes me sad.  Is this a new feature or regression?
>>>
>>> [For everyone's benefit, moving this to general@]
>>>
>>> Jakob, I referenced the change here: http://tinyurl.com/4crxlvy
>>> The new system is much more robust, partly because it no longer relies on
>>> watching Jira-generated emails to determine when issues move into Patch
>>> Available state. There is limited info I can get from the Jira API, thus the
>>> triggering mechanism had to change.
>>>
>>> Cheers,
>>> Nige
>>>
>




Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-13 Thread Todd Lipcon
Hi Arun, all,

When we merged YDH and CDH for CDH3b3, we went through the effort of
"linearizing" all of the YDH patches and squashing multiple commits into
single ones corresponding to a single JIRA where possible. So, we have a
100% linear set of patches that applies on top of the 0.20.2 source tree and
includes Yahoo 0.20.100.3 as well as almost all the patches from 0.20-append
and a number of other backports.

Since this could be applied as a linear set of patches instead of a big
lump, would there be interest in using this as the 0.20.>100 Apache release?
I can take the time to remove any patches that are cloudera specific or not
yet applied upstream.

Thanks
-Todd


On Wed, Jan 12, 2011 at 11:07 PM, Arun C Murthy wrote:

>
> On Jan 12, 2011, at 2:56 PM, Nigel Daley wrote:
>
>  +1 for 0.20.x, where x >= 100.  I agree that the 1.0 moniker would involve
>> more discussion.
>>
>
> Ok, seems like we are converging; we can continue talking. I've created the
> branch to get the ball rolling.
>
>
>  Will this be a jumbo patch attached to a Jira and then committed to the
>> branch?  Just curious.
>>
>
> I'm afraid that the svn log of the branch from the github Y! branch is fairly
> useless since a single JIRA might have multiple commits in the Y! branch
> (bugfix on top of a bugfix). We have done that in several cases (but the
> patches committed to trunk have a single patch which is the result of
> forward porting a complete feature/bugfix). IAC, this branch and 0.22
> have diverged so much that almost no non-trivial patch would apply without a
> significant amount of work.
>
> Thus, I think a jumbo patch should suffice. It will also ensure this can be
> done quickly so that the community can then concentrate on 0.22 and beyond.
>
> However, I will (manually) ensure all relevant jiras are referenced in the
> CHANGES.txt and Release Notes for folks to see the contents of the release.
> This is the hardest part of the exercise. Also, this ensures that we can
> track these jiras for 0.22 as Eli suggested.
>
> Does that seem like a reasonable way forward? I'm happy to brainstorm.
>
> thanks,
> Arun
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-13 Thread Arun C Murthy

Todd,


On Jan 13, 2011, at 2:04 PM, Todd Lipcon wrote:


> Hi Arun, all,
>
> When we merged YDH and CDH for CDH3b3, we went through the effort of
> "linearizing" all of the YDH patches and squashing multiple commits into
> single ones corresponding to a single JIRA where possible. So, we have a
> 100% linear set of patches that applies on top of the 0.20.2 source tree and
> includes Yahoo 0.20.100.3 as well as almost all the patches from 0.20-append
> and a number of other backports.
>
> Since this could be applied as a linear set of patches instead of a big
> lump, would there be interest in using this as the 0.20.>100 Apache release?
> I can take the time to remove any patches that are cloudera specific or not
> yet applied upstream.



Interesting discussion, thanks.

I'm sure it took you a fair amount of work to squash patches (which I
tried too, btw). That, plus the fact that we would need to do a
similar amount of work for the 10 or so releases we have done after
0.20.100.3 scares me.

As Nigel and I discussed here, the jumbo patch and an up-to-date
CHANGES.txt provide almost all of the benefits we seek and allow all
of us to get this done very quickly to focus on hadoop-0.22 and beyond.


What do you think?

OTOH, I could get this release done and start squashing patches for  
the sake of completeness as a background activity.


Thoughts?

thanks,
Arun


Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-13 Thread Todd Lipcon
On Thu, Jan 13, 2011 at 3:05 PM, Arun C Murthy wrote:

> Since this could be applied as a linear set of patches instead of a big
>> lump, would there be interest in using this as the 0.20.>100 Apache
>> release?
>> I can take the time to remove any patches that are cloudera specific or
>> not
>> yet applied upstream.
>>
>>
> Interesting discussion, thanks.
>
> I'm sure it took you a fair amount of work to squash patches (which I tried
> too, btw).


Yep, I had a great summer ;-)


> That, plus the fact that we would need to do a similar amount of work for
> the 10 or so releases we have done after 0.20.100.3 scares me.
>

Sorry, I actually meant 0.20.104.3. Have there been many releases since
then? That's the last version available on the Yahoo github, and that's the
version we incorporated/linearized.

If there is a large sequence of patches after this that you're planning on
including, it would be good to see them in your git repo.



> As Nigel and I discussed here, the jumbo patch and an up-to-date
> CHANGES.txt provide almost all of the benefits we seek and allow all of us
> to get this done very quickly to focus on hadoop-0.22 and beyond.
>
>
In my opinion here are the downsides to this plan:

- a mondo "merge" patch is a big pain when trying to do debugging. It may be
sufficient for a user to look at CHANGES.txt, but I find myself using
blame/log/etc on individual files to understand code lineage on a daily
basis. If all of the merge shows up as a big patch it will be very difficult
(at least the way I work with code) to help users debug issues or understand
which JIRA a certain regression may have come from.

- CHANGES.txt traditionally doesn't reference which patch file from a JIRA
was checked in. So we may know that a given JIRA has been included, but
often there are several revisions of patches on the JIRA and it's difficult
to be sure that we have the most up-to-date version. By looking at change
history it's usually easy to pick this out, but if everything is applied as
one giant patch, this isn't possible.

- the proposal to use the YDH distro certainly solves the Security issue,
but doesn't help out HBase at all. Given HBase has been asking for a long
time to get a real release of the append branch, I think it would be better
to have one 20-based release which has both of these features, rather than
further fragmenting the community into 0.20.2, 0.20.2+security,
0.20.2+append.

I think the first two points could be addressed if you push your git tree
either to github or an apache-hosted git, and then include in SVN as a mondo
patch. It's not ideal, but at least when trying to debug issues and
understand the history of this branch there will be a publicly available
change history to reference.

To clarify my position a bit here - I definitely appreciate your
volunteering to do the work, and wouldn't *block* the proposal as you've put
it forth. I just think it will have limited utility for the community by
being opaque (if contributed as a giant patch) and by not including the sync
feature which is critical for a large segment of users. Given those
downsides I'd rather see the effort diverted towards making a killer 0.22
release that we can all jump on.

Thanks
-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Keith is on vacation

2011-01-13 Thread Qi BJ Chen

I will be out of the office starting 2011-01-14 and will not return until
2011-01-22.

During this time, I won't access my email.

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-13 Thread Arun C Murthy


On Jan 13, 2011, at 3:34 PM, Todd Lipcon wrote:

> On Thu, Jan 13, 2011 at 3:05 PM, Arun C Murthy wrote:
>
>>> Since this could be applied as a linear set of patches instead of a big
>>> lump, would there be interest in using this as the 0.20.>100 Apache
>>> release?
>>> I can take the time to remove any patches that are cloudera specific or
>>> not yet applied upstream.
>>
>> Interesting discussion, thanks.
>>
>> I'm sure it took you a fair amount of work to squash patches (which I tried
>> too, btw).
>
> Yep, I had a great summer ;-)
>
>> That, plus the fact that we would need to do a similar amount of work for
>> the 10 or so releases we have done after 0.20.100.3 scares me.
>
> Sorry, I actually meant 0.20.104.3. Have there been many releases since
> then? That's the last version available on the Yahoo github, and that's the
> version we incorporated/linearized.

Yep. I had a great summer! ;-)

>> As Nigel and I discussed here, the jumbo patch and an up-to-date
>> CHANGES.txt provide almost all of the benefits we seek and allow all of us
>> to get this done very quickly to focus on hadoop-0.22 and beyond.
>
> In my opinion here are the downsides to this plan:

I agree there are downsides, I think I did point them out at the
outset! :)

> - a mondo "merge" patch is a big pain when trying to do debugging. It may be
> sufficient for a user to look at CHANGES.txt, but I find myself using
> blame/log/etc on individual files to understand code lineage on a daily
> basis. If all of the merge shows up as a big patch it will be very difficult
> (at least the way I work with code) to help users debug issues or understand
> which JIRA a certain regression may have come from.

Right, no question. Which is why I offered to do this as a background
activity right after... this ensures that the source of truth is
*always* a branch in Apache subversion.

I feel that we could get a usable release out the door quickly for our
users. Also, please remember that almost every patch we have committed
is available on relevant jiras. I understand the devs have a problem
and I feel we can bear with it for a little while. Again, I agree this
isn't an ideal solution, I'm just trying to expedite the release for
the users.

> To clarify my position a bit here - I definitely appreciate your
> volunteering to do the work, and wouldn't *block* the proposal as you've put
> it forth. I just think it will have limited utility for the community by
> being opaque (if contributed as a giant patch) and by not including the sync
> feature which is critical for a large segment of users. Given those
> downsides I'd rather see the effort diverted towards making a killer 0.22
> release that we can all jump on.

Thanks for understanding.

Again, I completely agree this isn't an ideal situation, but I do hope
it has a bit more than *limited utility* for our end-users. Who knows,
I may be hopelessly deluded! *smile*

Also, I'm trying to do exactly what you suggested - spend very little
time on this so that everyone, including me, can focus on the future.

thanks,
Arun




Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-13 Thread Eli Collins
On Thursday, January 13, 2011, Arun C Murthy wrote:
>
> On Jan 13, 2011, at 3:34 PM, Todd Lipcon wrote:
>
>
> On Thu, Jan 13, 2011 at 3:05 PM, Arun C Murthy  wrote:
>
>
> Since this could be applied as a linear set of patches instead of a big
>
> lump, would there be interest in using this as the 0.20.>100 Apache
> release?
> I can take the time to remove any patches that are cloudera specific or
> not
> yet applied upstream.
>
>
>
> Interesting discussion, thanks.
>
> I'm sure it took you a fair amount of work to squash patches (which I tried
> too, btw).
>
>
>
> Yep, I had a great summer ;-)
>
>
>
> That, plus the fact that we would need to do a similar amount of work for
> the 10 or so releases we have done after 0.20.100.3 scares me.
>
>
>
> Sorry, I actually meant 0.20.104.3. Have there been many releases since
> then? That's the last version available on the Yahoo github, and that's the
> version we incorporated/linearized.
>
>
> Yep. I had a great summer! ;-)
>
>
>
> As Nigel and I discussed here, the jumbo patch and an up-to-date
> CHANGES.txt provide almost all of the benefits we seek and allow all of us
> to get this done very quickly to focus on hadoop-0.22 and beyond.
>
>
>
> In my opinion here are the downsides to this plan:
>
>
>
> I agree there are downsides, I think I did point them out at the outset! :)
>
>
> - a mondo "merge" patch is a big pain when trying to do debugging. It may be
> sufficient for a user to look at CHANGES.txt, but I find myself using
> blame/log/etc on individual files to understand code lineage on a daily
> basis. If all of the merge shows up as a big patch it will be very difficult
> (at least the way I work with code) to help users debug issues or understand
> which JIRA a certain regression may have come from.
>
>
>
> Right, no question. Which is why I offered to do this as a background 
> activity right after... this ensures that the source of truth is *always* a 
> branch in Apache subversion.
>
> I feel that we could get a usable release out the door quickly for our users. 
> Also, please remember that almost every patch we have committed is available 
> on relevant jiras. I understand the devs have a problem and I feel we can 
> bear with it for a little while. Again, I agree this isn't an ideal solution, 
> I'm just trying to expedite the release for the users.
>
>
>
> To clarify my position a bit here - I definitely appreciate your
> volunteering to do the work, and wouldn't *block* the proposal as you've put
> it forth. I just think it will have limited utility for the community by
> being opaque (if contributed as a giant patch) and by not including the sync
> feature which is critical for a large segment of users. Given those
> downsides I'd rather see the effort diverted towards making a killer 0.22
> release that we can all jump on.
>
>
>
> Thanks for understanding.
>
> Again, I completely agree this isn't an ideal situation, but I do hope it has 
> a bit more than *limited utility* for our end-users. Who knows, I may be 
> hopelessly deluded! *smile*
>
> Also, I'm trying to do exactly what you suggested - spend very little time on 
> this so that everyone, including me, can focus on the future.
>
> thanks,
> Arun
>

Given that Todd has already done the work to rebase the 0.20.104.3
patch set on 0.20.2, and in a way that doesn't require one big change,
and his patch set includes branch20-append, which the HBase guys want
an Apache release of, wouldn't it make sense to go this route?  What do
others think? Seems better to have one 0.20.100 release than multiple
ones for security and append.

Thanks,
Eli


Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-13 Thread Arun C Murthy


On Jan 13, 2011, at 5:35 PM, Eli Collins wrote:

> Given that Todd has already done the work to rebase the 0.20.104.3
> patch set on 0.20.2, and in a way that doesn't require one big change,
> and his patch set includes branch20-append, which the HBase guys want
> an Apache release of, wouldn't it make sense to go this route?  What do
> others think? Seems better to have one 0.20.100 release than multiple
> ones for security and append.



My concern around 0.20.104.3 is that it has serious security holes  
including a root exploit that we have since fixed. I'm sure you guys  
are aware of them, Todd has helped to fix some.


The version I'm offering to push to the community has fixed all of  
them, *plus* the added benefit of several stability and performance  
fixes we have done since 20.104.3, almost 10 internal releases. This  
is a battle tested and hardened version which we have deployed on  
40,000+ nodes. It is a significant upgrade on 0.20.104.3 which we  
never deployed. I'm pretty sure *some* users will find that valuable. ;)


Also, I've offered to push individual patches as a background activity  
on a branch - that should suffice, no? Or, do you consider this a  
blocker?


Again, my goal in this exercise is to get a stable, improved version  
of Hadoop into the hands of our users asap, and focus on 0.22 and  
beyond.


thanks,
Arun


Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-13 Thread Eli Collins
On Thu, Jan 13, 2011 at 6:12 PM, Arun C Murthy wrote:
>
> On Jan 13, 2011, at 5:35 PM, Eli Collins wrote:
>>
>> Given that Todd has already done the work to rebase the 0.20.104.3
>> patch set on 0.20.2, and in a way that doesn't require one big change,
>> and his patch set includes branch20-append, which the HBase guys want
>> an Apache release of, wouldn't it make sense to go this route?  What do
>> others think? Seems better to have one 0.20.100 release than multiple
>> ones for security and append.
>
>
> My concern around 0.20.104.3 is that it has serious security holes including
> a root exploit that we have since fixed. I'm sure you guys are aware of
> them, Todd has helped to fix some.
>

The cdh3 patch set Todd is talking about is not vanilla 104.3, it's
104.3 re-based onto 20.2 plus patches from branch-20 and trunk (the
performance and stability fixes I think you're referring to, at least
the ones that have been posted to Apache jira).

Can you post a pointer to the version you're referring to, eg on
github?  If there isn't a big delta between it and the cdh3 patch set
(which should have the 20-based patches from jira) perhaps you and
Todd could easily merge in the delta to create 0.20.x?

> The version I'm offering to push to the community has fixed all of them,
> *plus* the added benefit of several stability and performance fixes we have
> done since 20.104.3, almost 10 internal releases. This is a battle tested
> and hardened version which we have deployed on 40,000+ nodes. It is a
> significant upgrade on 0.20.104.3 which we never deployed. I'm pretty sure
> *some* users will find that valuable. ;)

Definitely, but better to hit two birds with one stone right?  Instead
of a security + enhancements release and an append release we could
have a single security + append + enhancements release and users don't
have to choose.

> Also, I've offered to push individual patches as a background activity on a
> branch - that should suffice, no? Or, do you consider this a blocker?

Definitely not a blocker.

> Again, my goal in this exercise is to get a stable, improved version of
> Hadoop into the hands of our users asap, and focus on 0.22 and beyond.

Agree, that's everyone's goal.  My point is that a release that's
already been rebased on 20.2, doesn't require a separate HBase
release, and doesn't require you to spend time on a background task to
break up the big change into smaller ones seems like a faster way
forward.

Thanks,
Eli


Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-13 Thread Arun C Murthy


On Jan 13, 2011, at 6:50 PM, Eli Collins wrote:


> The cdh3 patch set Todd is talking about is not vanilla 104.3, it's
> 104.3 re-based onto 20.2 plus patches from branch-20 and trunk (the
> performance and stability fixes I think you're referring to, at least
> the ones that have been posted to Apache jira).
>
> Can you post a pointer to the version you're referring to, eg on
> github?  If there isn't a big delta between it and the cdh3 patch set
> (which should have the 20-based patches from jira) perhaps you and
> Todd could easily merge in the delta to create 0.20.x?

I can guarantee it will need work to merge the enhancements since
20.104.3, it's over 6 months of development. The enhancements include
work on stability such as iterative ls, limits on JT to prevent single
jobs/users from taking it down etc. and lots of bug-fixes to security.
So, unfortunately the delta is pretty large.

I'm working on a CHANGES.txt which should reflect all the changes i.e.
bug-fixes and enhancements.

>> The version I'm offering to push to the community has fixed all of them,
>> *plus* the added benefit of several stability and performance fixes we have
>> done since 20.104.3, almost 10 internal releases. This is a battle tested
>> and hardened version which we have deployed on 40,000+ nodes. It is a
>> significant upgrade on 0.20.104.3 which we never deployed. I'm pretty sure
>> *some* users will find that valuable. ;)
>
> Definitely, but better to hit two birds with one stone right?  Instead
> of a security + enhancements release and an append release we could
> have a single security + append + enhancements release and users don't
> have to choose.




We are discussing two options:
20 + security + enhancements
20 + security + append

I think the value we provide via 20+security+enhancements release is
that it's stable, tested and deployed at scale. Doing any more work
merging 6 months of work at Yahoo (again, I guarantee it's a lot of
work) will need a lot of cycles to validate, test and stabilize.


I feel the alternative is a distraction for me, I'd rather work on 0.22.

I can get 20+security+enhancements done very, very quickly precisely
because I don't have to spend cycles testing it.


Does that make sense? Thanks for being patient and bearing with me...

Arun



Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-13 Thread Tsz Wo (Nicholas), Sze
The text below is copied from http://httpd.apache.org/dev/release.html.  Not sure if it
helps.

What power does the RM yield?
Regarding what makes it into a release, the RM is the unquestioned authority.
No one can contest what makes it into the release. The community will judge the
release's quality after it has been issued, but the community can not force the
RM to include a feature that they feel uncomfortable adding. Remember that this
document is only a guideline to the community and future RMs - each RM may run a
release in a different way. If you don't like what an RM is doing, start
preparing for your own competing release.

Nicholas


Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-13 Thread Eric Baldeschwieler
Hi Eli,

Thanks for the suggestion.

+1 to Nigel and Arun's proposal.

I completely support the idea of creating a version of 20 with append for 
HBase.  However, the append issue is very complicated and there does not exist 
any version of append that is certified against a workload as diverse as what 
this branch has been tested against.  I think you are trying to cross too many 
streams here.   If you have resources to help integrate any version of Hadoop 
0.20 with append, package and test it, I fully support you doing so.  But that 
effort is not aligned with the goal of this branch, which is to share a 
substantial amount of fully integrated and tested work.  Members of the 
community have expressed interest in seeing this tested work get checked into 
Apache and I would like to share it.  Mashing it up with other patches would 
invalidate months of testing, defeating the purpose of the exercise.

If you are interested in integrating Append with this branch, why not create a 
20.200 branch and do so?

Unless you are vetoing the sharing of work as is on a branch (the purpose of 
the branch), I suggest we move on.

Thanks,

E14


On Jan 13, 2011, at 8:23 PM, Arun C Murthy wrote:

> 
> On Jan 13, 2011, at 6:50 PM, Eli Collins wrote:
> 
>> The cdh3 patch set Todd is talking about is not vanilla 104.3, it's
>> 104.3 re-based onto 20.2 plus patches from branch-20 and trunk (the
>> performance and stability fixes I think you're referring to, at least
>> the ones that have been posted to Apache jira).
>> 
>> Can you post a pointer to the version you're referring to, eg on
>> github?  If there isn't a big delta between it and the cdh3 patch set
>> (which should have the 20-based patches from jira) perhaps you and
>> Todd could easily merge in the delta to create 0.20.x?
>> 
> 
> I can guarantee it will need work to merge the enhancements since  
> 20.104.3, it's over 6 months of development. The enhancements include  
> work on stability such as iterative ls, limits on JT to prevent single  
> jobs/users from taking it down etc. and lots of bug-fixes to security.  
> So, unfortunately the delta is pretty large.
> 
> I'm working on a CHANGES.txt which should reflect all the changes i.e.  
> bug-fixes and enhancements.
> 
>>> The version I'm offering to push to the community has fixed all of them,
>>> *plus* the added benefit of several stability and performance fixes we have
>>> done since 20.104.3, almost 10 internal releases. This is a battle tested
>>> and hardened version which we have deployed on 40,000+ nodes. It is a
>>> significant upgrade on 0.20.104.3 which we never deployed. I'm pretty sure
>>> *some* users will find that valuable. ;)
>> 
>> Definitely, but better to hit two birds with one stone right?  Instead
>> of a security + enhancements release and an append release we could
>> have a single security + append + enhancements release and users don't
>> have to choose.
>> 
> 
> 
> We are discussing two options:
> 20 + security + enhancements
> 20 + security + append
> 
> I think the value we provide via 20+security+enhancements release is  
> that it's stable, tested and deployed at scale. Doing any more work  
> merging 6 months of work at Yahoo (again, I guarantee it's a lot of  
> work) will need a lot of cycles to validate, test and stabilize.
> 
> I feel the alternative is a distraction for me, I'd rather work on 0.22.
> 
> I can get 20+security+enhancements done very, very quickly precisely  
> because I don't have to spend cycles testing it.
> 
> Does that make sense? Thanks for being patient and bearing with me...
> 
> Arun
> 



Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-13 Thread Nigel Daley
I say just do it.  Eli said it wasn't a blocker. Sure it ain't perfect, but 
it's good enough.

Let's move on to 0.22 and beyond.

Nige

On Jan 13, 2011, at 8:23 PM, Arun C Murthy wrote:

> 
> On Jan 13, 2011, at 6:50 PM, Eli Collins wrote:
> 
>> The cdh3 patch set Todd is talking about is not vanilla 104.3, it's
>> 104.3 re-based onto 20.2 plus patches from branch-20 and trunk (the
>> performance and stability fixes I think you're referring to, at least
>> the ones that have been posted to Apache jira).
>> 
>> Can you post a pointer to the version you're referring to, eg on
>> github?  If there isn't a big delta between it and the cdh3 patch set
>> (which should have the 20-based patches from jira) perhaps you and
>> Todd could easily merge in the delta to create 0.20.x?
>> 
> 
> I can guarantee it will need work to merge the enhancements since 20.104.3, 
> it's over 6 months of development. The enhancements include work on 
> stability such as iterative ls, limits on JT to prevent single jobs/users 
> from taking it down etc. and lots of bug-fixes to security. So, unfortunately 
> the delta is pretty large.
> 
> I'm working on a CHANGES.txt which should reflect all the changes i.e. 
> bug-fixes and enhancements.
> 
>>> The version I'm offering to push to the community has fixed all of them,
>>> *plus* the added benefit of several stability and performance fixes we have
>>> done since 20.104.3, almost 10 internal releases. This is a battle tested
>>> and hardened version which we have deployed on 40,000+ nodes. It is a
>>> significant upgrade on 0.20.104.3 which we never deployed. I'm pretty sure
>>> *some* users will find that valuable. ;)
>> 
>> Definitely, but better to hit two birds with one stone right?  Instead
>> of a security + enhancements release and an append release we could
>> have a single security + append + enhancements release and users don't
>> have to choose.
>> 
> 
> 
> We are discussing two options:
> 20 + security + enhancements
> 20 + security + append
> 
> I think the value we provide via 20+security+enhancements release is that 
> it's stable, tested and deployed at scale. Doing any more work merging 6 
> months of work at Yahoo (again, I guarantee it's a lot of work) will need a 
> lot of cycles to validate, test and stabilize.
> 
> I feel the alternative is a distraction for me, I'd rather work on 0.22.
> 
> I can get 20+security+enhancements done very, very quickly precisely because 
> I don't have to spend cycles testing it.
> 
> Does that make sense? Thanks for being patient and bearing with me...
> 
> Arun
> 



[DISCUSS] Move project split down a level

2011-01-13 Thread Nigel Daley
Folks,

As I look more at the impact of the common/MR/HDFS project split on what and 
how we release Hadoop, I feel like the split needs an adjustment.  Many folks 
I've talked to agree that the project split has caused us a splitting headache. 
I think one relatively small change could alleviate some of that.

CURRENT SVN REPO:

hadoop / [common, mapreduce, hdfs] / trunk
hadoop / [common, mapreduce, hdfs] / branches

PROPOSAL:

hadoop / trunk / [common, mapreduce, hdfs]
hadoop / branches / [common, mapreduce, hdfs]

We're a long way from releasing these 3 projects independently.  Given that, 
they should be branched and released as a unit.  This SVN structure enforces 
that and provides a more natural place to keep top-level build and packaging
scripts that operate across all 3 projects.

Thoughts?

Cheers,
Nige


Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-13 Thread Arun C Murthy
*nod* Ok.

Arun

On Jan 13, 2011, at 10:08 PM, "Nigel Daley" wrote:

> I say just do it.  Eli said it wasn't a blocker. Sure it ain't perfect, but 
> it's good enough.
> 
> Let's move on to 0.22 and beyond.
> 
> Nige
> 
> On Jan 13, 2011, at 8:23 PM, Arun C Murthy wrote:
> 
>> 
>> On Jan 13, 2011, at 6:50 PM, Eli Collins wrote:
>> 
>>> The cdh3 patch set Todd is talking about is not vanilla 104.3, it's
>>> 104.3 re-based onto 20.2 plus patches from branch-20 and trunk (the
>>> performance and stability fixes I think you're referring to, at least
>>> the ones that have been posted to Apache jira).
>>> 
>>> Can you post a pointer to the version you're referring to, e.g. on
>>> github?  If there isn't a big delta between it and the cdh3 patch set
>>> (which should have the 20-based patches from jira) perhaps you and
>>> Todd could easily merge in the delta to create 0.20.x?
>>> 
>> 
>> I can guarantee it will need work to merge the enhancements since 20.104.3, 
>> it's over 6 months of development. The enhancements include work on
>> stability such as iterative ls, limits on JT to prevent single jobs/users 
>> from taking it down etc. and lots of bug-fixes to security. So, 
>> unfortunately the delta is pretty large.
>> 
>> I'm working on a CHANGES.txt which should reflect all the changes i.e. 
>> bug-fixes and enhancements.
>> 
>>>> The version I'm offering to push to the community has fixed all of them,
>>>> *plus* the added benefit of several stability and performance fixes we have
>>>> done since 20.104.3, almost 10 internal releases. This is a battle tested
>>>> and hardened version which we have deployed on 40,000+ nodes. It is a
>>>> significant upgrade on 0.20.104.3 which we never deployed. I'm pretty sure
>>>> *some* users will find that valuable. ;)
>>> 
>>> Definitely, but better to hit two birds with one stone right?  Instead
>>> of a security + enhancements release and an append release we could
>>> have a single security + append + enhancements release and users don't
>>> have to choose.
>>> 
>> 
>> 
>> We are discussing two options:
>> 20 + security + enhancements
>> 20 + security + append
>> 
>> I think the value we provide via 20+security+enhancements release is that 
>> it's stable, tested and deployed at scale. Doing any more work merging 6 
>> months of work at Yahoo (again, I guarantee it's a lot of work) will need
>> a lot of cycles to validate, test and stabilize.
>> 
>> I feel the alternative is a distraction for me, I'd rather work on 0.22.
>> 
>> I can get 20+security+enhancements done very, very quickly precisely
>> because I don't have to spend cycles testing it.
>> 
>> Does that make sense? Thanks for being patient and bearing with me...
>> 
>> Arun
>> 
> 


Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-13 Thread Eli Collins
Sorry for rattling you guys, definitely wasn't discussing a veto.  I'm
absolutely not opposed, just thought the alternative Todd raised was
worth a couple emails since users have requested both security and
append, and such a branch that includes both of those plus
enhancements and substantial testing exists.

Arun - I appreciate all the info, looking forward to the release.

Thanks,
Eli

On Thu, Jan 13, 2011 at 10:21 PM, Arun C Murthy  wrote:
> *nod* Ok.
>
> Arun
>
> On Jan 13, 2011, at 10:08 PM, "Nigel Daley"  wrote:
>
>> I say just do it.  Eli said it wasn't a blocker. Sure it ain't perfect, but 
>> it's good enough.
>>
>> Let's move on to 0.22 and beyond.
>>
>> Nige
>>
>> On Jan 13, 2011, at 8:23 PM, Arun C Murthy wrote:
>>
>>>
>>> On Jan 13, 2011, at 6:50 PM, Eli Collins wrote:
>>>
>>>> The cdh3 patch set Todd is talking about is not vanilla 104.3, it's
>>>> 104.3 re-based onto 20.2 plus patches from branch-20 and trunk (the
>>>> performance and stability fixes I think you're referring to, at least
>>>> the ones that have been posted to Apache jira).
>>>>
>>>> Can you post a pointer to the version you're referring to, e.g. on
>>>> github?  If there isn't a big delta between it and the cdh3 patch set
>>>> (which should have the 20-based patches from jira) perhaps you and
>>>> Todd could easily merge in the delta to create 0.20.x?
>>>>
>>>
>>> I can guarantee it will need work to merge the enhancements since 20.104.3, 
>>> it's over 6 months of development. The enhancements include work on
>>> stability such as iterative ls, limits on JT to prevent single jobs/users 
>>> from taking it down etc. and lots of bug-fixes to security. So, 
>>> unfortunately the delta is pretty large.
>>>
>>> I'm working on a CHANGES.txt which should reflect all the changes i.e. 
>>> bug-fixes and enhancements.
>>>
>>>>> The version I'm offering to push to the community has fixed all of them,
>>>>> *plus* the added benefit of several stability and performance fixes we have
>>>>> done since 20.104.3, almost 10 internal releases. This is a battle tested
>>>>> and hardened version which we have deployed on 40,000+ nodes. It is a
>>>>> significant upgrade on 0.20.104.3 which we never deployed. I'm pretty sure
>>>>> *some* users will find that valuable. ;)
>>>>
>>>> Definitely, but better to hit two birds with one stone right?  Instead
>>>> of a security + enhancements release and an append release we could
>>>> have a single security + append + enhancements release and users don't
>>>> have to choose.
>>>>
>>>
>>>
>>> We are discussing two options:
>>> 20 + security + enhancements
>>> 20 + security + append
>>>
>>> I think the value we provide via 20+security+enhancements release is that 
>>> it's stable, tested and deployed at scale. Doing any more work merging 6 
>>> months of work at Yahoo (again, I guarantee it's a lot of work) will need
>>> a lot of cycles to validate, test and stabilize.
>>>
>>> I feel the alternative is a distraction for me, I'd rather work on 0.22.
>>>
>>> I can get 20+security+enhancements done very, very quickly precisely
>>> because I don't have to spend cycles testing it.
>>>
>>> Does that make sense? Thanks for being patient and bearing with me...
>>>
>>> Arun
>>>
>>
>


Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-13 Thread Todd Lipcon
On Thu, Jan 13, 2011 at 10:29 PM, Eli Collins  wrote:

> Sorry for rattling you guys, definitely wasn't discussing a veto.  I'm
> absolutely not opposed, just thought the alternative Todd raised was
> worth a couple emails since users have requested both security and
> append, and such a branch that includes both of those plus
> enhancements and substantial testing exists.
>
> Arun - I appreciate all the info, looking forward to the release.
>
>
Same here.

Back to the patch queue for me! 0.22 here we come.

-Todd


> On Thu, Jan 13, 2011 at 10:21 PM, Arun C Murthy wrote:
> > *nod* Ok.
> >
> > Arun
> >
> > On Jan 13, 2011, at 10:08 PM, "Nigel Daley"  wrote:
> >
> >> I say just do it.  Eli said it wasn't a blocker. Sure it ain't perfect,
> >> but it's good enough.
> >>
> >> Let's move on to 0.22 and beyond.
> >>
> >> Nige
> >>
> >> On Jan 13, 2011, at 8:23 PM, Arun C Murthy wrote:
> >>
> >>>
> >>> On Jan 13, 2011, at 6:50 PM, Eli Collins wrote:
> >>>
> >>>> The cdh3 patch set Todd is talking about is not vanilla 104.3, it's
> >>>> 104.3 re-based onto 20.2 plus patches from branch-20 and trunk (the
> >>>> performance and stability fixes I think you're referring to, at least
> >>>> the ones that have been posted to Apache jira).
> >>>>
> >>>> Can you post a pointer to the version you're referring to, e.g. on
> >>>> github?  If there isn't a big delta between it and the cdh3 patch set
> >>>> (which should have the 20-based patches from jira) perhaps you and
> >>>> Todd could easily merge in the delta to create 0.20.x?
> >>>>
> >>>
> >>> I can guarantee it will need work to merge the enhancements since
> >>> 20.104.3, it's over 6 months of development. The enhancements include work
> >>> on stability such as iterative ls, limits on JT to prevent single jobs/users
> >>> from taking it down etc. and lots of bug-fixes to security. So,
> >>> unfortunately the delta is pretty large.
> >>>
> >>> I'm working on a CHANGES.txt which should reflect all the changes i.e.
> >>> bug-fixes and enhancements.
> >>>
> >>>>> The version I'm offering to push to the community has fixed all of them,
> >>>>> *plus* the added benefit of several stability and performance fixes we have
> >>>>> done since 20.104.3, almost 10 internal releases. This is a battle tested
> >>>>> and hardened version which we have deployed on 40,000+ nodes. It is a
> >>>>> significant upgrade on 0.20.104.3 which we never deployed. I'm pretty sure
> >>>>> *some* users will find that valuable. ;)
> >>>>
> >>>> Definitely, but better to hit two birds with one stone right?  Instead
> >>>> of a security + enhancements release and an append release we could
> >>>> have a single security + append + enhancements release and users don't
> >>>> have to choose.
> >>>>
> >>>
> >>>
> >>> We are discussing two options:
> >>> 20 + security + enhancements
> >>> 20 + security + append
> >>>
> >>> I think the value we provide via 20+security+enhancements release is
> >>> that it's stable, tested and deployed at scale. Doing any more work merging
> >>> 6 months of work at Yahoo (again, I guarantee it's a lot of work) will need
> >>> a lot of cycles to validate, test and stabilize.
> >>>
> >>> I feel the alternative is a distraction for me, I'd rather work on 0.22.
> >>>
> >>> I can get 20+security+enhancements done very, very quickly precisely
> >>> because I don't have to spend cycles testing it.
> >>>
> >>> Does that make sense? Thanks for being patient and bearing with me...
> >>>
> >>> Arun
> >>>
> >>
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-13 Thread Arun C Murthy
No worries. Thanks to both Eli & Todd for the discussion. 

I look forward to getting this done and moving ahead to 0.22 and beyond.

thanks,
Arun

On Jan 13, 2011, at 10:29 PM, "Eli Collins"  wrote:

> Sorry for rattling you guys, definitely wasn't discussing a veto.  I'm
> absolutely not opposed, just thought the alternative Todd raised was
> worth a couple emails since users have requested both security and
> append, and such a branch that includes both of those plus
> enhancements and substantial testing exists.
> 
> Arun - I appreciate all the info, looking forward to the release.
> 
> Thanks,
> Eli
> 
> On Thu, Jan 13, 2011 at 10:21 PM, Arun C Murthy  wrote:
>> *nod* Ok.
>> 
>> Arun
>> 
>> On Jan 13, 2011, at 10:08 PM, "Nigel Daley"  wrote:
>> 
>>> I say just do it.  Eli said it wasn't a blocker. Sure it ain't perfect, but 
>>> it's good enough.
>>> 
>>> Let's move on to 0.22 and beyond.
>>> 
>>> Nige
>>> 
>>> On Jan 13, 2011, at 8:23 PM, Arun C Murthy wrote:
>>> 
>>>>
>>>> On Jan 13, 2011, at 6:50 PM, Eli Collins wrote:
>>>>
>>>>> The cdh3 patch set Todd is talking about is not vanilla 104.3, it's
>>>>> 104.3 re-based onto 20.2 plus patches from branch-20 and trunk (the
>>>>> performance and stability fixes I think you're referring to, at least
>>>>> the ones that have been posted to Apache jira).
>>>>>
>>>>> Can you post a pointer to the version you're referring to, e.g. on
>>>>> github?  If there isn't a big delta between it and the cdh3 patch set
>>>>> (which should have the 20-based patches from jira) perhaps you and
>>>>> Todd could easily merge in the delta to create 0.20.x?
>>>>>
>>>>
>>>> I can guarantee it will need work to merge the enhancements since
>>>> 20.104.3, it's over 6 months of development. The enhancements include
>>>> work on stability such as iterative ls, limits on JT to prevent single
>>>> jobs/users from taking it down etc. and lots of bug-fixes to security. So,
>>>> unfortunately the delta is pretty large.
>>>>
>>>> I'm working on a CHANGES.txt which should reflect all the changes i.e.
>>>> bug-fixes and enhancements.
>>>>
>>>>>> The version I'm offering to push to the community has fixed all of them,
>>>>>> *plus* the added benefit of several stability and performance fixes we have
>>>>>> done since 20.104.3, almost 10 internal releases. This is a battle tested
>>>>>> and hardened version which we have deployed on 40,000+ nodes. It is a
>>>>>> significant upgrade on 0.20.104.3 which we never deployed. I'm pretty sure
>>>>>> *some* users will find that valuable. ;)
>>>>>
>>>>> Definitely, but better to hit two birds with one stone right?  Instead
>>>>> of a security + enhancements release and an append release we could
>>>>> have a single security + append + enhancements release and users don't
>>>>> have to choose.
>>>>>
>>>>
>>>>
>>>> We are discussing two options:
>>>> 20 + security + enhancements
>>>> 20 + security + append
>>>>
>>>> I think the value we provide via 20+security+enhancements release is that
>>>> it's stable, tested and deployed at scale. Doing any more work merging 6
>>>> months of work at Yahoo (again, I guarantee it's a lot of work) will need
>>>> a lot of cycles to validate, test and stabilize.
>>>>
>>>> I feel the alternative is a distraction for me, I'd rather work on 0.22.
>>>>
>>>> I can get 20+security+enhancements done very, very quickly precisely
>>>> because I don't have to spend cycles testing it.
>>>>
>>>> Does that make sense? Thanks for being patient and bearing with me...
>>>>
>>>> Arun
>>>>
>>> 
>> 


Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-13 Thread Stack
(Man, it was looking good there for a second when 0.20.100 was about
security+append!)

Good luck w/ the release Arun.

We might be following your 0.20.100 with a 0.20.200 append.

St.Ack


Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-13 Thread Eric Baldeschwieler
I'd love to see that!

On Jan 13, 2011, at 10:59 PM, Stack wrote:

> (Man, it was looking good there for a second when 0.20.100 was about
> security+append!)
> 
> Good luck w/ the release Arun.
> 
> We might be following your 0.20.100 with a 0.20.200 append.
> 
> St.Ack



Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-13 Thread Arun C Murthy


On Jan 13, 2011, at 10:59 PM, Stack wrote:

> (Man, it was looking good there for a second when 0.20.100 was about
> security+append!)
>
> Good luck w/ the release Arun.

Thanks!

> We might be following your 0.20.100 with a 0.20.200 append.

Super!

Arun


Re: [DISCUSS] Move project split down a level

2011-01-13 Thread Eric Baldeschwieler
+1

Death to the project split!  Or short of that, anything to tame it.

On Jan 13, 2011, at 10:18 PM, Nigel Daley wrote:

> Folks,
> 
> As I look more at the impact of the common/MR/HDFS project split on what and 
> how we release Hadoop, I feel like the split needs an adjustment.  Many folks 
> I've talked to agree that the project split has caused us a splitting 
> headache.  I think one relatively small change could alleviate some of that.
> 
> CURRENT SVN REPO:
> 
> hadoop / [common, mapreduce, hdfs] / trunk
> hadoop / [common, mapreduce, hdfs] / branches
> 
> PROPOSAL:
> 
> hadoop / trunk / [common, mapreduce, hdfs]
> hadoop / branches / [common, mapreduce, hdfs]
> 
> We're a long way from releasing these 3 projects independently.  Given that, 
> they should be branched and released as a unit.  This SVN structure enforces 
> that and provides a more natural place to keep a top level build and pkg 
> scripts that operate across all 3 projects.  
> 
> Thoughts?
> 
> Cheers,
> Nige



Re: [DISCUSS] Move project split down a level

2011-01-13 Thread Todd Lipcon
Big +1.

Curious how this will map to git, though - do we go back to one git repo?

When we have a patch that is mainly HDFS or MR focused but will need changes
across projects, can we just put up one patch in HDFS/MR or do we still need
to open a parallel common JIRA?

On Thu, Jan 13, 2011 at 11:25 PM, Eric Baldeschwieler wrote:

> +1
>
> Death to the project split!  Or short of that, anything to tame it.
>
> On Jan 13, 2011, at 10:18 PM, Nigel Daley wrote:
>
> > Folks,
> >
> > As I look more at the impact of the common/MR/HDFS project split on what
> > and how we release Hadoop, I feel like the split needs an adjustment.  Many
> > folks I've talked to agree that the project split has caused us a splitting
> > headache.  I think one relatively small change could alleviate some of that.
> >
> > CURRENT SVN REPO:
> >
> > hadoop / [common, mapreduce, hdfs] / trunk
> > hadoop / [common, mapreduce, hdfs] / branches
> >
> > PROPOSAL:
> >
> > hadoop / trunk / [common, mapreduce, hdfs]
> > hadoop / branches / [common, mapreduce, hdfs]
> >
> > We're a long way from releasing these 3 projects independently.  Given
> > that, they should be branched and released as a unit.  This SVN structure
> > enforces that and provides a more natural place to keep a top level build
> > and pkg scripts that operate across all 3 projects.
> >
> > Thoughts?
> >
> > Cheers,
> > Nige
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera