Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Arun C Murthy

Doug,

On May 2, 2011, at 10:58 AM, Doug Cutting wrote:

> The patch selection process for this branch did not appear to be a
> community process.  A massive patch set was committed en-masse with no
> public discussion before or after about its specific composition.


Let's review:

# You proposed to release off the Yahoo security patchset first in  
April, 2010: http://s.apache.org/5Gv

# I started this discussion again in Jan, 2011: http://s.apache.org/uf
# We went through several iterations:
 - I first committed a jumbo patch upon which some reservations were  
expressed.
 - Owen went ahead and broke them up to commit individual patches to  
incorporate the provided feedback.
# Roy clearly clarified the way forward: http://s.apache.org/tD4  
(which Owen has since incorporated by breaking into individual  
patches).


Your current stance, given the history, is surprising, to say the  
least... we have already discussed this. It is clear that the  
community (including downstream Apache projects like Pig, Hive and  
HCatalog) will substantially benefit from an Apache release of this  
improved codebase.


thanks,
Arun



Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Doug Cutting
On 05/02/2011 01:07 PM, Arun C Murthy wrote:
> # Roy clearly clarified the way forward: http://s.apache.org/tD4 (which
> Owen has since incorporated by breaking into individual patches).

Roy suggested three ways forward and their possible outcomes:

Roy Fielding wrote:
>  a) break the changes down into a sequence of patches, create jira
> issues for each one (or append to the existing issue), and then
> provide the group with a list of the issue links so that people
> can quickly +1 each one.  When it seems worthwhile to you, create
> a branch off of some prior Apache release point in svn and commit
> each patch to it until the branch is identical to (or, in your own
> opinion, better than) the source code that you have tested locally.
> Then RM a tarball and start a release vote.  Since all of this is
> being done in jira and svn, others can help you do all but the
> first part (breaking down the big patch).
> 
> or
> 
>  b) create a branch off of some prior Apache release point in svn
> and replay the internal Y! commits on that branch until the branch
> source code is identical to what you have tested locally.  Then
> RM a tarball based on that branch and start a release vote.
> Since the history is now in svn, others could do the RM bit if
> you don't have time.
> 
> or
> 
>  c) create a branch off of some prior Apache release point in svn
> and apply one big ugly patch to it.  Then RM a tarball based
> on that branch and ask for a release vote.
> 
> You will note that none of the above requires a discussion on this
> list prior to the release vote, though (a) would likely result in
> more +1s than (b), and (b) would likely receive more +1s than (c).
> Regardless, the release vote is a lazy majority decision.
>
>  [ ... ]
>
> When the release vote happens, encourage folks to test and +1
> the release.  If it passes, woohoo!  If not, then listen to the
> reasons given by the other PMC members and see if you can make
> enough changes to the release to get those extra +1s.

I believe that Owen chose (b).  We're now at the release vote and I am a
PMC member giving reasons for my vote.

Also note that, on the common-dev thread, Eli & Tom have both noted a
number of inconsistencies between this set of patches and trunk, 0.22
and even prior 0.20 branches and releases.  In addition to the lack of
community involvement in patch selection, these issues concern me.

I cannot in good conscience vote for this release as a community product.

Doug


Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Arun C Murthy

Doug,

On May 2, 2011, at 1:40 PM, Doug Cutting wrote:


> Also note that, on the common-dev thread, Eli & Tom have both noted a
> number of inconsistencies between this set of patches and trunk, 0.22
> and even prior 0.20 branches and releases.  In addition to the lack of
> community involvement in patch selection, these issues concern me.
>
> I cannot in good conscience vote for this release as a community product.


As I noted before you were the first one to propose this release off  
Yahoo security patch-set in April, 2010:

http://s.apache.org/5Gv

What has changed since? Clearly, the same situation exists today.

Also, please note that of the ~450 commits in the branch, only 30 odd  
jiras are yet to be committed to trunk:
http://s.apache.org/7Pe. So it's incorrect to state 'lack of community  
involvement'.


Assuming the technical inconsistencies are sorted out, are you willing  
to withdraw your objection?


thanks,
Arun



Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Doug Cutting
On 05/02/2011 02:05 PM, Arun C Murthy wrote:
> As I noted before you were the first one to propose this release off
> Yahoo security patch-set in April, 2010:
> http://s.apache.org/5Gv
> 
> What has changed since? Clearly, the same situation exists today.

I have absolutely no objection in principle to an Apache 0.20 release
including security.  I object to the fact that this patchset started
from an arbitrary point and unilaterally applied a large set of patches
that are not well correlated with Jira, trunk or other 0.20 branches.

> Also, please note that of the ~450 commits in the branch, only 30 odd
> jiras are yet to be committed to trunk:
> http://s.apache.org/7Pe. So it's incorrect to state 'lack of community
> involvement'.

This should be easily discoverable from Jira: issues should use the
"fix-for" field to indicate which branches they've been merged to.  This
standard practice has not been observed for over 400 patches included in
this release candidate.

> Assuming the technical inconsistencies are sorted out, are you willing
> to withdraw your objection?

These are not just technical concerns.  How I vote on any future release
candidate will in part depend on how the community is involved in its
production.

Thanks,

Doug


Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Arun C Murthy


On May 2, 2011, at 2:21 PM, Doug Cutting wrote:


> On 05/02/2011 02:05 PM, Arun C Murthy wrote:
>> As I noted before you were the first one to propose this release off
>> Yahoo security patch-set in April, 2010:
>> http://s.apache.org/5Gv
>>
>> What has changed since? Clearly, the same situation exists today.
>
> I have absolutely no objection in principle to an Apache 0.20 release
> including security.  I object to the fact that this patchset started
> from an arbitrary point and unilaterally applied a large set of patches
> that are not well correlated with Jira, trunk or other 0.20 branches.


Completely untrue.

This patchset started from 0.20.1 and is a complete superset of 0.20.1.

We will work towards ensuring it is a complete superset of the last  
stable release: 0.20.2.





>> Also, please note that of the ~450 commits in the branch, only 30 odd
>> jiras are yet to be committed to trunk:
>> http://s.apache.org/7Pe. So it's incorrect to state 'lack of community
>> involvement'.
>
> This should be easily discoverable from Jira: issues should use the
> "fix-for" field to indicate which branches they've been merged to.  This
> standard practice has not been observed for over 400 patches included in
> this release candidate.



This seems like a parliamentary stalling procedure... sure, they don't  
have 'fix-for' fields, but the claim has been verified by  
external committers:


http://s.apache.org/yX

Are you simply asking for someone to go through the 450 odd jiras and  
set 'fix-for' fields?


>> Assuming the technical inconsistencies are sorted out, are you willing
>> to withdraw your objection?
>
> These are not just technical concerns.  How I vote on any future release
> candidate will in part depend on how the community is involved in its
> production.



I understand they aren't technical concerns.

I asked if you were willing to withdraw your objection if the  
technical concerns are satisfied. I think you answered my question -  
you will not withdraw your objection even if it's a technical issue.


thanks,
Arun



Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Doug Cutting
On 05/02/2011 02:33 PM, Arun C Murthy wrote:
> On May 2, 2011, at 2:21 PM, Doug Cutting wrote:
>> I have absolutely no objection in principle to an Apache 0.20 release
>> including security.  I object to the fact that this patchset started
>> from an arbitrary point and unilaterally applied a large set of patches
>> that are not well correlated with Jira, trunk or other 0.20 branches.
> 
> Completely untrue.

'Completely'?  Really?  Not a true bit in there?  Wow!

> This patchset started from 0.20.1 and is a complete superset of 0.20.1.

0.20.1 isn't a branch, it's a tag.  The 0.20 branch includes many
post-0.20.1 patches that are not in this candidate.  Releases in a
series normally share a branch.

> I asked if you were willing to withdraw your objection if the technical
> concerns are satisfied. I think you answered my question - you will not
> withdraw your objection even if it's a technical issue.

That is not what I said.  If this release does not get enough votes then
perhaps another 0.20.203 release candidate will be proposed.  Its
process and contents will be different and I will judge it on the basis
of those when I vote.

Doug


Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Ian Holsman

On May 3, 2011, at 7:33 AM, Arun C Murthy wrote:

> 
> This patchset started from 0.20.1 and is a complete superset of 0.20.1.
> 
> We will work towards ensuring it is a complete superset of the last stable 
> release: 0.20.2.

so are you intending to make it a superset for 203? or for a future release?

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Doug Cutting
On 05/02/2011 02:33 PM, Arun C Murthy wrote:
> We will work towards ensuring it is a complete superset of the last
> stable release: 0.20.2.

Great!  Who's 'we'?  Do you want any help with this?

Doug


Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Andrew Purtell
Most points in this thread are valid, having to do with the process of how the 
contribution was assembled; and specific technical aspects of it, e.g. JIRAs 
missing from branch 0.20.203 relative to branch 0.20. However,

> From: Doug Cutting 
> > Assuming the technical inconsistencies are sorted out,
> > are you willing to withdraw your objection?
> 
> These are not just technical concerns.  How I vote on any future
> release candidate will in part depend on how the community is
> involved in its production.

What strikes me, as an observer to this discussion, is that here "community" 
does not seem equated with Yahoo by implication. Perhaps I misread. 
Nevertheless, Yahoo retains a good percentage of active Core developers with 
standing as both committers and high scale users, and these people produced the 
contribution that is branch 0.20.203, and therefore by definition "the 
community" was entirely involved in its production.

Yahoo should be commended for advancing the state of branch 0.20 with an 
obvious commitment to donating the results to Apache. As a community we are 
lucky to have a strong contributor. Their security enhancements allow us and 
many others the option of strong authentication and user isolation for 
multitenant deployments. 

A commercial vendor's product already incorporates Yahoo's donated security 
enhancements. It would be regrettable if nontechnical factors ultimately 
prevent Apache from incorporating the value of these contributions into an 
official release.

Some technical concerns seem reasonable. Regarding that:

> From: Stack 
> How hard would it be to get the patches Tom lists below into
> branch-0.20-security-203?  I'd think it'd be an easier
> sell if it were a superset of all in 0.20, especially since it
> bears its name.

This suggestion makes a lot of sense. In addition, filing JIRAs for and posting 
the diffs of the remaining differences could help the process as well, and 
would be good faith actions of an active contributor.

Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
Tom White)



Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Andrew Purtell
I would like to make one minor but important clarification:

From:

> It would be regrettable if
> nontechnical factors ultimately prevents Apache from
> incorporating the value of these contributions into an
> official release.

To:

It would be regrettable if nontechnical factors ultimately prevent Apache from
incorporating the value of these contributions into an official release OF
0.20. There are some not yet ready to take the leap to 0.22, who do not
consider it proven.

So in this regard I do not wish to minimize concerns about distracting from the 
success of 0.22 or later releases. 

> From: Andrew Purtell 
> Subject: Re: [VOTE] Release candidate 0.20.203.0-rc0
> To: general@hadoop.apache.org
> Date: Monday, May 2, 2011, 3:05 PM

> Most points in this thread are valid,
> having to do with the process of how the contribution was
> assembled; and specific technical aspects of it, e.g. JIRAs
> missing from branch 0.20.203 relative to branch 0.20.
> However,
> 
> > > From: Doug Cutting 
> > > Assuming the technical inconsistencies are sorted
> > > out, are you willing to withdraw your objection?
> > 
> > These are not just technical concerns.  How I vote on
> > any future release candidate will in part depend on how
> > the community is involved in its production.
> 
> What strikes me, as an observer to this discussion, is that
> here "community" does not seem equated with Yahoo by
> implication. Perhaps I misread. Nevertheless, Yahoo retains
> a good percentage of active Core developers with standing as
> both committers and high scale users, and these people
> produced the contribution that is branch 0.20.203, and
> therefore by definition "the community" was entirely
> involved in its production.
> 
> Yahoo should be commended for advancing the state of branch
> 0.20 with an obvious commitment to donating the results to
> Apache. As a community we are lucky to have a strong
> contributor. Their security enhancements allow us and many
> others the option of strong authentication and user
> isolation for multitenant deployments. 
> 
> A commercial vendor's product already incorporates Yahoo's
> donated security enhancements. It would be regrettable if
> nontechnical factors ultimately prevents Apache from
> incorporating the value of these contributions into an
> official release.
> 
> Some technical concerns seem reasonable. Regarding that:
> 
> > From: Stack 
> > How hard would it be to get the patches Tom lists
> below into
> > branch-0.20-security-203?  I'd think it'd be an
> easier
> > sell if it were a superset of all in 0.20, especially
> since it
> > bears its name.
> 
> This suggestion makes a lot of sense. In addition, filing
> JIRAs for and posting the diffs of the remaining differences
> could help the process as well, and would be good faith
> actions of an active contributor.
> 
> Best regards,
> 
>     - Andy
> 
> Problems worthy of attack prove their worth by hitting
> back. - Piet Hein (via Tom White)
> 
>


Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Arun C Murthy


On May 2, 2011, at 3:05 PM, Andrew Purtell wrote:



> Some technical concerns seem reasonable. Regarding that:
>
>> From: Stack
>> How hard would it be to get the patches Tom lists below into
>> branch-0.20-security-203?  I'd think it'd be an easier
>> sell if it were a superset of all in 0.20, especially since it
>> bears its name.
>
> This suggestion makes a lot of sense. In addition, filing JIRAs for and
> posting the diffs of the remaining differences could help the process as
> well, and would be good faith actions of an active contributor.




Agreed, I'm starting the effort to ensure the differences from 0.20.2  
are resolved.


From my msg on this thread to common-dev@:


# Remaining for 0.20.203
 * HADOOP-5611
 * HADOOP-5612
 * HADOOP-5623
 * HDFS-596
 * HDFS-723
 * HDFS-732
 * HDFS-579
 * MAPREDUCE-1070
 * HADOOP-6315
 * MAPREDUCE-1163


Suresh has kindly agreed to help me; I'd appreciate help from others,  
particularly on the 0.20.3 changes.


thanks,
Arun




Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Eli Collins
On Mon, May 2, 2011 at 3:15 PM, Arun C Murthy  wrote:
>
> On May 2, 2011, at 3:05 PM, Andrew Purtell wrote:
>
>
>> Some technical concerns seem reasonable. Regarding that:
>>
>>> From: Stack 
>>> How hard would it be to get the patches Tom lists below into
>>> branch-0.20-security-203?  I'd think it'd be an easier
>>> sell if it were a superset of all in 0.20, especially since it
>>> bears its name.
>>
>> This suggestion makes a lot of sense. In addition, filing JIRAs for and
>> posting the diffs of the remaining differences could help the process as
>> well, and would be good faith actions of an active contributor.
>>
>
> Agreed, I'm starting the effort to ensure the differences from 0.20.2 are
> resolved.
>
> From my msg on this thread to common-dev@:
>
>> # Remaining for 0.20.203
>>  * HADOOP-5611
>>  * HADOOP-5612
>>  * HADOOP-5623
>>  * HDFS-596
>>  * HDFS-723
>>  * HDFS-732
>>  * HDFS-579
>>  * MAPREDUCE-1070
>>  * HADOOP-6315
>>  * MAPREDUCE-1163
>
> Suresh has kindly agreed to help me, appreciate help from others -
> particularly on the 0.20.3 changes.
>

I'd like to help.  Perhaps let's coordinate in a separate thread?

Thanks,
Eli


Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Arun C Murthy


On May 2, 2011, at 2:49 PM, Ian Holsman wrote:



> On May 3, 2011, at 7:33 AM, Arun C Murthy wrote:
>> This patchset started from 0.20.1 and is a complete superset of 0.20.1.
>>
>> We will work towards ensuring it is a complete superset of the last stable
>> release: 0.20.2.
>
> so are you intending to make it a superset for 203? or for a future release?


For 0.20.203, it behooves us to incorporate feedback for that.

Arun


Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Doug Cutting
On 05/02/2011 03:05 PM, Andrew Purtell wrote:
> What strikes me, as an observer to this discussion, is that here
> "community" does not seem equated with Yahoo by implication. Perhaps
> I misread. Nevertheless, Yahoo retains a good percentage of active
> Core developers with standing as both committers and high scale
> users, and these people produced the contribution that is branch
> 0.20.203, and therefore by definition "the community" was entirely
> involved in its production.

Whether or not a subset of contributors acts as the community depends on
whether others outside that subset have a reasonable opportunity to
become involved.  Until this release vote was called it wasn't entirely
clear to all what was happening with these branches.  Wider community
involvement is now starting as folks work to rationalize this 450-issue
patch with respect to past and future releases, Jira, etc.

> Yahoo should be commended for advancing the state of branch 0.20 with
> an obvious commitment to donating the results to Apache. As a
> community we are lucky to have a strong contributor. Their security
> enhancements allow us and many others the option of strong
> authentication and user isolation for multitenant deployments.

+1

> A commercial vendor's product already incorporates Yahoo's donated
> security enhancements. It would be regrettable if nontechnical
> factors ultimately prevents Apache from incorporating the value of
> these contributions into an official release.

+1

Cheers,

Doug


Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Jake Cornelius


Doug Cutting  wrote:


On 05/02/2011 03:05 PM, Andrew Purtell wrote:
> What strikes me, as an observer to this discussion, is that here
> "community" does not seem equated with Yahoo by implication. Perhaps
> I misread. Nevertheless, Yahoo retains a good percentage of active
> Core developers with standing as both committers and high scale
> users, and these people produced the contribution that is branch
> 0.20.203, and therefore by definition "the community" was entirely
> involved in its production.

Whether or not a subset of contributors acts as the community depends on
whether others outside that subset have a reasonable opportunity to
become involved.  Until this release vote was called it wasn't entirely
clear to all what was happening with these branches.  Wider community
involvement is now starting as folks work to rationalize this 450-issue
patch with respect to past and future releases, Jira, etc.

> Yahoo should be commended for advancing the state of branch 0.20 with
> an obvious commitment to donating the results to Apache. As a
> community we are lucky to have a strong contributor. Their security
> enhancements allow us and many others the option of strong
> authentication and user isolation for multitenant deployments.

+1

> A commercial vendor's product already incorporates Yahoo's donated
> security enhancements. It would be regrettable if nontechnical
> factors ultimately prevents Apache from incorporating the value of
> these contributions into an official release.

+1

Cheers,

Doug


Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-03 Thread Konstantin Shvachko
I think it's a good idea to release hadoop-0.20.203. It moves Apache Hadoop a
step forward.

Looks like the technical difficulties are resolved now with Arun's latest
commits.
Being a superset of hadoop-0.20.2 it can be considered based on one of the
official Apache releases.
I don't think there was a lack of discussions on the lists about the issues
included in the release candidate. Todd did a thorough review of the entire
security branch. Many developers participated in discussions.
Agreeing with Stack I wish HBase was considered a primary target for Hadoop
support. But it is not realistic to have it in hadoop-0.20.203.
I have some experience running a version of this release candidate on a
large cluster. It works. I would add a couple of patches that make it run
on Windows for me, such as HADOOP-7110 and HADOOP-7126. But those are not blockers.

Thanks,
--Konstantin


On Mon, May 2, 2011 at 5:12 PM, Ian Holsman  wrote:

>
> On May 3, 2011, at 9:58 AM, Arun C Murthy wrote:
>
> >>
> >> Owen, Suresh and I have committed everything on this list except
> >> HADOOP-6386 and HADOOP-6428. Not sure which of the two are relevant/
> >> necessary, I'll check with Cos.  Other than that, hadoop-0.20.203 is now a
> >> superset of hadoop-0.20.2.
> >>
> >
> > Missed adding HADOOP-5759 to that list, I'll check with Amareshwari
> before committing.
> >
> > Arun
>
> Thanks for doing this so fast Arun.
>
>


Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-03 Thread Eli Collins
I think we still need to incorporate the patches currently checked
into branch 0.20.  For example, Owen identified a major bug
(BooleanWritable's comparator is broken) and filed a jira
(HADOOP-6928) to put it in branch-0.20, where I reviewed it and
checked it in, so this bug would be fixed in the next stable release.
However this change is not in branch-0.20-security-203. Unless we put
the delta from branch-0.20 into this release, it is missing important
bug fixes that will cause it to regress against 20.3 (if it ever is
released).

I am also nervous about changes like the one identified by
HADOOP-7255. It looks like this change caused a significant regression
in TestDFSIO throughput. It changes the core Task class, the commit
log is a single line, and as far as I can tell it was not discussed or
reviewed by anyone in the community. Don't changes like this at least
deserve a jira before we release them?

Thanks,
Eli

On Tue, May 3, 2011 at 1:39 AM, Konstantin Shvachko
 wrote:
> I think its a good idea to release hadoop-0.20.203. It moves Apache Hadoop a
> step forward.
>
> Looks like the technical difficulties are resolved now with latest Arun's
> commits.
> Being a superset of hadoop-0.20.2 it can be considered based on one of the
> official Apache releases.
> I don't think there was a lack of discussions on the lists about the issues
> included in the release candidate. Todd did a thorough review of the entire
> security branch. Many developers participated in discussions.
> Agreeing with Stack I wish HBase was considered a primary target for Hadoop
> support. But it is not realistic to have it in hadoop-0.20.203.
> I have some experience running a version of this release candidate on a
> large cluster. It works. I would add a couple of patches, which make it run
> on Windows for me like HADOOP-7110, HADOOP-7126. But those are not blockers.
>
> Thanks,
> --Konstantin
>
>
> On Mon, May 2, 2011 at 5:12 PM, Ian Holsman  wrote:
>
>>
>> On May 3, 2011, at 9:58 AM, Arun C Murthy wrote:
>>
>> >>
>> >> Owen, Suresh and I have committed everything on this list except
>> >> HADOOP-6386 and HADOOP-6428. Not sure which of the two are relevant/
>> >> necessary, I'll check with Cos.  Other than that hadoop-0.20.203 now a
>> >> superset of hadoop-0.20.2.
>> >>
>> >
>> > Missed adding HADOOP-5759 to that list, I'll check with Amareshwari
>> before committing.
>> >
>> > Arun
>>
>> Thanks for doing this so fast Arun.
>>
>>
>


Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-03 Thread Todd Lipcon
Just to gauge how much is in branch-0.20-security-203, I wrote a
quick script which does a comparison based on the JIRAs mentioned in the commit
log. It output the following list of JIRAs that are in the branch but not
committed to trunk. I've marked many as N/A meaning that they don't apply to
trunk:

< HADOOP-1026
< HADOOP-6304 N/A
< HADOOP-6544
< HADOOP-6598 N/A
< HADOOP-6638
< HADOOP-6653 N/A
< HADOOP-6716 N/A
< HADOOP-6718 N/A
< HADOOP-6728
< HADOOP-6745
< HADOOP-6757
< HADOOP-6776 N/A
< HADOOP-6808
< HADOOP-6810 N/A
< HADOOP-6832
< HADOOP-6855 N/A
< HADOOP-7143 N/A
< HADOOP-7190 N/A
< HADOOP-7232 N/A
< HADOOP-7243 N/A
< HADOOP-7246 (not sure about this)
< HADOOP-7247
< HADOOP-7253
< HDFS-1020 N/A
< HDFS-1022
< HDFS-1136 N/A
< HDFS-1153
< HDFS-1158
< HDFS-1313 N/A
< HDFS-495 N/A
< HDFS-6599 <-- must be a typo in commit log
< HDFS-740  N/A
< MAPREDUCE-1088
< MAPREDUCE-1100
< MAPREDUCE-1118
< MAPREDUCE-1207
< MAPREDUCE-1361 N/A
< MAPREDUCE-1376 N/A
< MAPREDUCE-1442 N/A
< MAPREDUCE-1521
< MAPREDUCE-1526 N/A
< MAPREDUCE-1550 N/A
< MAPREDUCE-1594 N/A
< MAPREDUCE-1641
< MAPREDUCE-1671
< MAPREDUCE-1672
< MAPREDUCE-1676
< MAPREDUCE-1677
< MAPREDUCE-1682
< MAPREDUCE-1687
< MAPREDUCE-1699 Unclear
< MAPREDUCE-1711 N/A
< MAPREDUCE-1713
< MAPREDUCE-1716
< MAPREDUCE-1730
< MAPREDUCE-1741
< MAPREDUCE-1744
< MAPREDUCE-1758
< MAPREDUCE-1759 N/A
< MAPREDUCE-1778 N/A
< MAPREDUCE-1790
< MAPREDUCE-1807 N/A
< MAPREDUCE-1854
< MAPREDUCE-1871
< MAPREDUCE-1872
< MAPREDUCE-1882
< MAPREDUCE-1889
< MAPREDUCE-1890
< MAPREDUCE-1914
< MAPREDUCE-1921
< MAPREDUCE-1933
< MAPREDUCE-1934
< MAPREDUCE-1938
< MAPREDUCE-1943
< MAPREDUCE-1954
< MAPREDUCE-1955
< MAPREDUCE-1960
< MAPREDUCE-1964
< MAPREDUCE-1966
< MAPREDUCE-1971
< MAPREDUCE-2005 N/A
< MAPREDUCE-2019
< MAPREDUCE-2055
< MAPREDUCE-2316
< MAPREDUCE-2355
< MAPREDUCE-291
< MAPREDUCE-323
< MAPREDUCE-339
< MAPREDUCE-517
< MAPREDUCE-6419 <-- doesn't exist, probably typo

Certainly some of these are new test cases, benchmark improvements, or
system tests. But many others are large new features (e.g. new metrics
framework, separate JobHistory service). Others also introduce new
configurations (e.g. new JT based limits). In the list above there are 58 that
seem to be applicable, probably at least half of which are non-test code.
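
A comparison of this kind can be sketched roughly as follows. This is not the script used to produce the list above, only a minimal illustration: the function names are made up for the example, and it assumes a git clone in which both refs exist and that JIRA ids appear in commit messages in the usual PROJECT-NNNN form.

```shell
#!/bin/sh
# Sketch only: jira_ids and ids_only_in are hypothetical helper names.

# Print the sorted, de-duplicated JIRA ids found in the text on stdin.
# Ids written with a colon (e.g. HADOOP:1234) are normalised to a dash.
jira_ids() {
  grep -oE '(MAPREDUCE|HDFS|HADOOP)[-:][0-9]+' | tr ':' '-' | sort -u
}

# Print ids mentioned in the commit log of ref $1 but not in that of ref $2.
ids_only_in() {
  ids_a=$(mktemp)
  ids_b=$(mktemp)
  git log --format=%B "$1" | jira_ids > "$ids_a"
  git log --format=%B "$2" | jira_ids > "$ids_b"
  comm -23 "$ids_a" "$ids_b"   # lines unique to the first file
  rm -f "$ids_a" "$ids_b"
}

# Example, run inside a clone (ref names are assumptions):
#   ids_only_in origin/branch-0.20-security-203 origin/trunk
```

Matching on ids alone cannot tell whether a JIRA applies to trunk, so the N/A markings still have to be added by hand.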

This list above doesn't include 192 patches that were committed to the
branch without reference to any JIRA in the commit message:

todd@todd-w510:~/git/hadoop-common$ for x in $(git rev-list origin/branch-0.20..origin/branch-0.20-security-203 -- src) ; do git log -n1 $x | egrep -q '(MAPREDUCE|HDFS|HADOOP)[-:][0-9]+' || echo $x ; done | wc -l
192

Browsing through these, many have already been forward ported, or at least
had corresponding JIRAs opened. But it's very difficult to match them up and
evaluate which ones have been committed. Eli pointed out one earlier this
week that was done by a non-committer with no public review that introduced
an apparent performance regression; it's difficult to know whether there
might be others as well.

Rather than being a "maintenance release" (as is usually expected when
incrementing the third component of a version string) this is essentially a
separate trunk off of 0.20. I agree that the advancements in this branch are
many, and are a great set of contributions for the community. User limits
and security are two such that have been cited in this thread; unfortunately
the new improvements in limits haven't been committed to trunk, and the
security in trunk has a known root exploit. Do users really want to see us
putting these things in 20 without making sure they'll also show up in
future releases?

Looking at recent history of 204 it seems some more patches have gone in
there before going into trunk as well - for example MR-2429. Arun and Sid
are working on forward-porting it, and it's obviously not due to any kind of
bad intent that it was missed, but it underscores the dangers of having
essentially two trunks in ASF. I completely agree that there should be
long-term maintenance branches at the ASF, but we need to establish a clear
process to make sure that "maintenance" doesn't diverge into something else.

Here are two requests that others have made but I haven't seen an answer to
yet:

- Document the criteria by which developers can judge whether an improvement
should be included in branch-0.20-security. The inclusion criteria for the
branch as it stands are not clear -- given this branch's lineage, they clearly
used to be "things important for the Yahoo clusters", but that doesn't seem
like a reasonable community criterion. Up until now in Hadoop's history, the
criterion has always been "compatible bug fixes only", which doesn't describe
this branch either.

- Clearly establish the process that all patches must either be committed to
trunk first (and then backported), or include a comment on the JIRA
explaining why this is not necessary. Additionally we should decide whether
patches must be backported "through" 0.22 or if

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-03 Thread Doug Cutting
On 05/02/2011 02:33 PM, Arun C Murthy wrote:
> Are you simply asking for someone to go through the 450 odd jiras and
> set 'fix-for' fields?

Every other release we've made is well-correlated with Jira.  It should
not be difficult to achieve that for this one.  We could write a script
to take all 450 bug IDs from the change log and use Jira's command-line
tool to set the "fix-for" to be this 0.20+security release.  Would you
like help with that?

Doug


Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-03 Thread Arun C Murthy
On May 3, 2011, at 5:17 PM, "Doug Cutting"  wrote:

> On 05/02/2011 02:33 PM, Arun C Murthy wrote:
>> Are you simply asking for someone to go through the 450 odd jiras and
>> set 'fix-for' fields?
> 
> Every other release we've made is well-correlated with Jira.  It should
> not be difficult to achieve that for this one.  We could write a script
> to take all 450 bug IDs from the change log and use Jira's command-line
> tool to set the "fix-for" to be this 0.20+security release.  Would you
> like help with that?
> 

Yes please, that would be great. Thanks!

Arun 


Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-04 Thread Doug Cutting
On 05/03/2011 06:01 PM, Arun C Murthy wrote:
> On May 3, 2011, at 5:17 PM, "Doug Cutting"  wrote:
> 
>> On 05/02/2011 02:33 PM, Arun C Murthy wrote:
>>> Are you simply asking for someone to go through the 450 odd jiras and
>>> set 'fix-for' fields?
>>
>> Every other release we've made is well-correlated with Jira.  It should
>> not be difficult to achieve that for this one.  We could write a script
>> to take all 450 bug IDs from the change log and use Jira's command-line
>> tool to set the "fix-for" to be this 0.20+security release.  Would you
>> like help with that?
>>
> 
> Yes please, that would be great. Thanks!

Please find below a script that will add a fix-version to issues.

Doug

#!/bin/bash

# Reads bug ids from standard input, one per line,
# and adds the fixVersion named on the command line to each.

if [ $# -ne 1 ]
then
  echo "Usage: $0 fix-version < bug-ids" >&2
  exit 1
fi

fix=$1
echo "Setting fix version to $fix."

server=https://issues.apache.org/jira
jira=./jira-cli-2.0.0/jira.sh

set -e

# Prompt on the terminal, because standard input carries the bug ids.
read -r -p "Jira username: " user < /dev/tty
read -r -s -p "Jira password: " password < /dev/tty
echo

while read -r issue
do
  # first read the old fix versions
  old=$("$jira" -a getFieldValue --server "$server" \
    --password "$password" --user "$user" \
    --issue "$issue" --field fixVersions | \
    tail -n 1 | sed -e 's/([0-9]*)//g' -e "s/'//g")

  # now update, adding the new value (no leading comma if there was none)
  # jira will ignore it if this value is already present
  "$jira" -a updateIssue --server "$server" \
    --password "$password" --user "$user" \
    --issue "$issue" --fixVersions "${old:+${old},}${fix}"
done
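
The script above expects one issue id per line on standard input. A minimal
sketch of producing such a list from a change log follows. The function name
extract_issue_ids, the file names CHANGES.txt and bug-ids.txt, and the set of
project keys are assumptions for illustration; none of them come from this
thread.

```bash
#!/bin/bash
# Sketch: print the unique Jira issue ids (e.g. HADOOP-1234) found in a file.
# The project keys in the pattern are an assumption; adjust them as needed.
extract_issue_ids() {
  grep -oE '(HADOOP|HDFS|MAPREDUCE)-[0-9]+' "$1" | sort -u
}

# Hypothetical usage (file names are placeholders):
#   extract_issue_ids CHANGES.txt > bug-ids.txt
```

The resulting file can then be reviewed by hand before it is supplied to the
script on standard input.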


Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-07 Thread Konstantin Shvachko
-1 for rc1

I downloaded and ran the test target 3 times.

The first run failed because my umask defaults to 0002, which is a known
problem (HADOOP-5050) whose fix was committed to 0.21 but not 0.20.
I set umask to 0022 and re-ran the tests twice. Both runs failed. Here is
the list of failed tests:
[junit] Test org.apache.hadoop.mapred.TestJobTrackerRestart FAILED (timeout)
[junit] Test org.apache.hadoop.mapred.TestJobTrackerRestartWithLostTracker FAILED
[junit] Test org.apache.hadoop.mapred.TestJobTrackerSafeMode FAILED
[junit] Test org.apache.hadoop.mapred.TestMiniMRMapRedDebugScript FAILED
[junit] Test org.apache.hadoop.mapred.TestRecoveryManager FAILED
[junit] Test org.apache.hadoop.mapred.TestTTMemoryReporting FAILED
[junit] Test org.apache.hadoop.mapred.TestTaskTrackerLocalization FAILED
[junit] Test org.apache.hadoop.hdfsproxy.TestHdfsProxy FAILED
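
The umask difference mentioned above can be observed outside the test suite.
A minimal sketch, assuming only bash with mktemp and ls; the helper name
dir_mode_under_umask is hypothetical and is not part of the Hadoop build. It
prints the permission string a newly created directory receives under a given
umask, which is the kind of difference a permission-sensitive test could trip
over.

```bash
#!/bin/bash
# Print the permission string of a directory created under the given umask.
# The body runs in a subshell, so the caller's umask is left untouched.
dir_mode_under_umask() (
  umask "$1"
  parent=$(mktemp -d) || exit 1
  mkdir "$parent/d"
  ls -ld "$parent/d" | cut -c1-10
  rm -rf "$parent"
)

dir_mode_under_umask 0002   # group-writable directory: drwxrwxr-x
dir_mode_under_umask 0022   # no group write bit:       drwxr-xr-x
```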

I am in favor of releasing hadoop-0.20.203.
And we run a version of this release on a large cluster at eBay. I know it
works.
I understand the controversy behind it. I regret it hasn't been developed in
a true community way.
I think it nevertheless adds value to Apache Hadoop.
Let's just make sure it passes the tests.

Thanks,
--Konstantin



On Tue, May 3, 2011 at 1:39 AM, Konstantin Shvachko wrote:

> I think it's a good idea to release hadoop-0.20.203. It moves Apache Hadoop
> a step forward.
>
> Looks like the technical difficulties are resolved now with Arun's latest
> commits.
> Being a superset of hadoop-0.20.2 it can be considered based on one of the
> official Apache releases.
> I don't think there was a lack of discussions on the lists about the issues
> included in the release candidate. Todd did a thorough review of the entire
> security branch. Many developers participated in discussions.
> Agreeing with Stack I wish HBase was considered a primary target for Hadoop
> support. But it is not realistic to have it in hadoop-0.20.203.
> I have some experience running a version of this release candidate on a
> large cluster. It works. I would add a couple of patches, which make it run
> on Windows for me like HADOOP-7110, HADOOP-7126. But those are not blockers.
>
> Thanks,
> --Konstantin
>
>
> On Mon, May 2, 2011 at 5:12 PM, Ian Holsman  wrote:
>
>>
>> On May 3, 2011, at 9:58 AM, Arun C Murthy wrote:
>>
>> >>
>> >> Owen, Suresh and I have committed everything on this list except
>> >> HADOOP-6386 and HADOOP-6428. Not sure which of the two are relevant/
>> >> necessary, I'll check with Cos.  Other than that hadoop-0.20.203 now a
>> >> superset of hadoop-0.20.2.
>> >>
>> >
>> > Missed adding HADOOP-5759 to that list, I'll check with Amareshwari
>> before committing.
>> >
>> > Arun
>>
>> Thanks for doing this so fast Arun.
>>
>>
>


Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-11 Thread Owen O'Malley
The vote on 0.20.203.0-rc1 passes. We have our first stable release in a
year. The votes were:

13 +1's (10 binding: Arun, Chris, Devaraj, Dhruba, Jakob, Mahadev, Nicholas,
Owen, Sanjay, Suresh)
7 -1's (4 binding: Doug, Eli, Konstantin, Todd)

Konstantin, your issue with the test cases requiring a umask 02 is a good
point. I'll patch it and can roll a 0.20.203.1 release candidate.

-- Owen


Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-11 Thread Ian Holsman
congratulations guys!
On May 12, 2011, at 3:36 AM, Owen O'Malley wrote:

> The vote on 0.20.203.0-rc1 passes. We have our first stable release in a
> year. The votes were:
> 
> 13 +1's (10 binding: Arun, Chris, Devaraj, Dhruba, Jakob, Mahadev, Nicholas,
> Owen, Sanjay,
> Suresh)
> 7 -1's (4 binding: Doug, Eli, Konstantin, Todd)
> 
> Konstantin, your issue with the test cases requiring a umask 02 is a good
> point. I'll patch it and can roll a 0.20.203.1 release candidate.
> 
> -- Owen



Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-11 Thread Konstantin Shvachko
> Konstantin, your issue with the test cases requiring a umask 02 is a good
> point. I'll patch it and can roll a 0.20.203.1 release candidate.

umask is not a big concern. I reset it to the standard 0022.
Still, there were 8 other test failures: 7 in mapred and 1 in hdfsproxy.
A stable release should pass unit tests.
Let's make 0.20.203.1 stable.

Thanks,
--Konstantin

On Wed, May 11, 2011 at 10:36 AM, Owen O'Malley  wrote:

> The vote on 0.20.203.0-rc1 passes. We have our first stable release in a
> year. The votes were:
>
> 13 +1's (10 binding: Arun, Chris, Devaraj, Dhruba, Jakob, Mahadev,
> Nicholas,
> Owen, Sanjay,
> Suresh)
> 7 -1's (4 binding: Doug, Eli, Konstantin, Todd)
>
> Konstantin, your issue with the test cases requiring a umask 02 is a good
> point. I'll patch it and can roll a 0.20.203.1 release candidate.
>
> -- Owen
>


Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-11 Thread Raghu Angadi
Congrats and thanks to all the developers for the passion and hard work
they put into a very important project.

+1 for getting a new 0.20 release out with security. It is great news for
users that a new Apache release is finally coming out. I hope most of the
technical as well as non-technical issues will be resolved in this or the
next release.

There are many important issues raised in this thread, and these are crucial
for the future of Apache Hadoop. I wanted to echo one of them in particular:
community is the most important aspect of the project and triumphs over the
rest.
I am sure the process would be much smoother going forward as we have more
frequent releases. This is probably the first real test for the
develop-a-large-feature-on-a-branch-and-merge process. Discussion here would
certainly lead to important improvements to the process.

Looking at it from a different angle, Hadoop has a very enviable problem:
there is so much development that it is very hard to co-ordinate and scale.
It has already scaled up a few times before, and with the leaders it has, it
is doing it again.

Raghu.

On Wed, May 11, 2011 at 10:36 AM, Owen O'Malley  wrote:

> The vote on 0.20.203.0-rc1 passes. We have our first stable release in a
> year. The votes were:
>
> 13 +1's (10 binding: Arun, Chris, Devaraj, Dhruba, Jakob, Mahadev,
> Nicholas,
> Owen, Sanjay,
> Suresh)
> 7 -1's (4 binding: Doug, Eli, Konstantin, Todd)
>
> Konstantin, your issue with the test cases requiring a umask 02 is a good
> point. I'll patch it and can roll a 0.20.203.1 release candidate.
>
> -- Owen
>


Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-11 Thread Owen O'Malley

On May 11, 2011, at 11:40 AM, Konstantin Shvachko wrote:

> umask is not a big concern. I reset it to standard 0022.

It still should be fixed.

> Still there were 8 other test failures: 7 in mapred, and 1 hdfsproxy.
> Stable release should pass unit tests.

All unit tests pass for me. Others have also run the unit tests and had them 
pass.
If you can figure out what is going wrong, that would be great.

> Lets make 0.20.203.1 stable.

+1

-- Owen

Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Ian Holsman
moving this thread to general@

On May 3, 2011, at 3:58 AM, Doug Cutting wrote:

>> Should we release
>> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/?
> 
> The patch selection process for this branch did not appear to be a
> community process.  A massive patch set was committed en-masse with no
> public discussion before or after about its specific composition.

guys...
1. do we agree this is an issue
2. if it is, how do we get the communication & discussion on list?

what do people think are the major issues that are stopping people talking
about stuff on list?

Re: Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Eric Baldeschwieler

Hi folks,

This strikes me as a bit odd. I think we have already discussed this at length
and agreed that a release could proceed.

Since then, Arun and Owen have worked actively to incorporate community
feedback into this release.

All parties making Hadoop releases other than Apache have already incorporated
most of the patches in this release into their products, including Doug's
organization. I don't see how Hadoop's users benefit from Apache not
incorporating them into an Apache release.

As previously discussed, all parties are welcome to champion alternative
releases from Apache if they want to invest in making Apache Hadoop better.

Thanks!!

E14

---
E14 - typing on glass

On May 2, 2011, at 12:16 PM, "Ian Holsman"  wrote:

> moving this thread to general@
> 
> On May 3, 2011, at 3:58 AM, Doug Cutting wrote:
> 
>>> Should we release
>>> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/?
>> 
>> The patch selection process for this branch did not appear to be a
>> community process.  A massive patch set was committed en-masse with no
>> public discussion before or after about its specific composition.
> 
> guys...
> 1. do we agree this is an issue
> 2. if it is, how do we get the communication & discussion on list?
> 
> what do people think are the major issues that are stopping people talking
> about stuff on list?


Re: Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Doug Cutting
On 05/02/2011 01:05 PM, Eric Baldeschwieler wrote:
> As previously discussed, all parties are welcome to champion
> alternative releases from Apache if they want to invest in making
> Apache Hadoop better.

I do not believe that different organizations should release their own
versions of Hadoop posing as Apache releases.  If folks wish to release
their own versions, then they should call them something else and
release them themselves.  The Apache Hadoop project should create
releases collaboratively, through an open process.  The standard means
is to start a branch from trunk or a prior release and propose patches
to that branch, one-by-one.  This candidate diverged sufficiently from
this pattern that, for me, it doesn't qualify as a community release.

Cheers,

Doug


Re: Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread James Seigel
Hello!

I guess I am concerned as a user of hadoop that the only way to get an 
“endorsed” up-to-date version of hadoop one has to abandon the community and 
“trust” a commercial release with its special sauce.

I am just hoping that the community can put together a nice stable up-to-date 
patched version.  That’d be nice.  It probably won’t change my commercial 
deploy, but it would give me something to compare with :)

Just my $0.02 (CND)
Cheers
James.

On 2011-05-02, at 2:51 PM, Doug Cutting wrote:

> On 05/02/2011 01:05 PM, Eric Baldeschwieler wrote:
>> As previously discussed, all parties are welcome to champion
>> alternative releases from Apache if they want to invest in making
>> Apache Hadoop better.
> 
> I do not believe that different organizations should release their own
> versions of Hadoop posing as Apache releases.  If folks wish to release
> their own versions, then they should call them something else and
> release them themselves.  The Apache Hadoop project should create
> releases collaboratively, through an open process.  The standard means
> is to start a branch from trunk or a prior release and propose patches
> to that branch, one-by-one.  This candidate diverged sufficiently from
> this pattern that, for me, it doesn't qualify as a community release.
> 
> Cheers,
> 
> Doug



Re: Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Eli Collins
Hey Eric,

I don't have any objections to a release from
branch-0.20-security-203.  However, when I examined the specific patch
set I noticed there are important implications with respect to
compatibility (for 0.20.2 and 0.22), a question about the project model
(eg not reviewing patches on jira before committing them, not having
patches go through trunk, etc), and some open questions for users (eg
is this the next dot release of the stable branch?).

I agree this is a valuable artifact, but that doesn't mean it's OK to
ignore compatibility concerns, etc.

I've listed specific questions/comments here:
http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201105.mbox/%3CBANLkTinZ=xb6kj5pteln5kkd9b-cwam...@mail.gmail.com%3E

Thanks,
Eli

On Mon, May 2, 2011 at 1:05 PM, Eric Baldeschwieler
 wrote:
>
> Hi folks,
>
> This strikes me as a bit odd. I think we have already discussed this at
> length and agreed that a release could proceed.
>
> Since then, Arun and Owen have worked actively to incorporate community
> feedback into this release.
>
> All parties making Hadoop releases other than Apache have already
> incorporated most of the patches in this release into their products,
> including Doug's organization. I don't see how Hadoop's users benefit from
> Apache not incorporating them into an Apache release.
>
> As previously discussed, all parties are welcome to champion alternative
> releases from Apache if they want to invest in making Apache Hadoop better.
>
> Thanks!!
>
> E14
>
> ---
> E14 - typing on glass
>
> On May 2, 2011, at 12:16 PM, "Ian Holsman"  wrote:
>
>> moving this thread to general@
>>
>> On May 3, 2011, at 3:58 AM, Doug Cutting wrote:
>>
>>>> Should we release
>>>> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/?
>>>
>>> The patch selection process for this branch did not appear to be a
>>> community process.  A massive patch set was committed en-masse with no
>>> public discussion before or after about its specific composition.
>>
>> guys...
>> 1. do we agree this is an issue
>> 2. if it is, how do we get the communication & discussion on list?
>>
>> what do people think are the major issues that are stopping people talking
>> about stuff on list?
>


Re: Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Roy T. Fielding
On May 2, 2011, at 12:15 PM, Ian Holsman wrote:

> moving this thread to general@
> 
> On May 3, 2011, at 3:58 AM, Doug Cutting wrote:
> 
>>> Should we release
>>> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/?
>> 
>> The patch selection process for this branch did not appear to be a
>> community process.  A massive patch set was committed en-masse with no
>> public discussion before or after about its specific composition.
> 
> guys...
> 1. do we agree this is an issue

Of course it is an issue.  Anyone can make it an issue -- no
agreement is necessary.

> 2. if it is, how do we get the communication & discussion on list?

By communicating and discussing on list.  Like, for example,
by proposing a release vote and people objecting to it, followed
by a polite collaboration on ways to reduce objections if that
is needed to get a release out the door.

> what do people think are the major issues that are stopping people talking
> about stuff on list?

The fact that people can vote on individual issues via jira,
which means that there is effectively no discussion of the
product as a whole on list.  I am constantly amazed at how
quiet it is in this project, at least until I remember that
most of the work is done exclusively via jira, unlike any of
my other followed projects that use jira.  I'd suggest that
the right place to hold any discussion is on the dev list,
but I am not on that list because it receives way too many
automated notifications.  Maybe it would help discussion on
dev if notices were sent elsewhere and only discussions were
held on dev.

By all means, produce a tarball and let the entire PMC vote
on it as the next release.  My personal preference is to not
allow anything that deviates from the major.minor.patch release
numbering that most software projects follow, but I don't have
a vote here.

It is perfectly reasonable for Doug (or anyone else) to vote
on a release based on a lack of version history, adequate
description of the sweet meats, or anything else that others
might consider non-technical.  This is a release vote!
It does not require consensus.  It requires minimal review
(usually meaning three +1s) and a majority opinion of those
on the PMC who choose to review the proposed release and vote.
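
As a rough illustration of the rule just described, here is a minimal sketch.
The function name release_vote_passes is hypothetical, and reading "majority"
as strictly more binding +1s than binding -1s is an assumption of the sketch,
not official ASF wording. The example counts are the binding totals reported
for 0.20.203.0-rc1 elsewhere in this thread.

```bash
#!/bin/bash
# release_vote_passes PLUS MINUS
# PLUS and MINUS are the counts of binding +1 and -1 votes.
# Succeeds when there are at least three +1s and +1s outnumber -1s.
release_vote_passes() {
  local plus=$1 minus=$2
  [ "$plus" -ge 3 ] && [ "$plus" -gt "$minus" ]
}

# 10 binding +1s against 4 binding -1s
if release_vote_passes 10 4; then echo "passes"; else echo "fails"; fi
```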

Roy

Re: Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Milind Bhandarkar

>It is perfectly reasonable for Doug (or anyone else) to vote
>on a release based on a lack of version history, adequate
>description of the sweet meats, or anything else that others
>might consider non-technical.  This is a release vote!
>It does not require consensus.  It requires minimal review
>(usually meaning three +1s) and a majority opinion of those
>on the PMC who choose to review the proposed release and vote.

Roy,

Thanks for reminding everyone that a release does not require consensus.

Regarding this release, I think anyone who runs a multi-tenant Hadoop
cluster will appreciate the user-limits feature that goes a long way to
ensure that an errant job does not take the entire cluster down. Your
operations and support people will thank you for deploying this release.

Recently I was discussing with operations folks at a company that operates
a Hadoop cluster based on a commercial distribution of Hadoop, and they
were excited to hear that they will have a way of making sure that their
cluster will not be taken down by an errant user/job, because that's one
big fear that keeps them awake.

FWIW, I am +1 for this release.

Arun, can you include a document that gives more details about what the
limits are, and how to modify your jobs to stay below these limits (I know
it is a cut-paste for you :-)?

- milind



Re: Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-03 Thread Steve Loughran

On 03/05/11 01:41, Roy T. Fielding wrote:
> I am constantly amazed at how
> quiet it is in this project, at least until I remember that
> most of the work is done exclusively via jira, unlike any of
> my other followed projects that use jira.  I'd suggest that
> the right place to hold any discussion is on the dev list,
> but I am not on that list because it receives way too many
> automated notifications.  Maybe it would help discussion on
> dev if notices were sent elsewhere and only discussions were
> held on dev.


I've seen this before on the Maven lists, where there's mostly a stream 
of JIRA changes above anything else:

http://mail-archives.apache.org/mod_mbox/maven-dev/200510.mbox/browser

however, they've got no JIRA issues in their list now, which may imply
all changes aren't going to the list, or they aren't using it so much:

http://mail-archives.apache.org/mod_mbox/maven-dev/201104.mbox/browser

(pause: bisecting their list shows that in 1.mar.06 they forked JIRA to 
a separate list to hide the details of ongoing work)


In some ways it's a means of dealing with a large and fast moving 
codebase: you subscribe to the issues that matter to you, all the 
discussions on a specific feature are archived, etc.


However, it has some flaws:
 -discouragement of community: you become a group of people working on
JIRA issues, rather than on a large integrated project
 -with work spread across common, hdfs and mapreduce JIRAs and mailing
lists, it's hard to keep all the things in your head -it is pretty much
a full time job to do so. And I don't know about the others, but I don't
have the time.
 -we need a way of gently moving people from those who use hadoop to
those who develop it. To me, every end user is a warm engineering
resource we just need to point at a problem that they care about. The
scale of the project, its complexity, JIRA change rate and testing
difficulties are all barriers to entry -you end up needing a team of people:

 * someone to track all the issues and keep the design in their head
 * 1+ person to test
 * 1+ person to code
I don't know about others, but I can't do this on my own.

The attempt to split up into HDFS+MAPREDUCE was one tactic to deal with 
this, but it hasn't worked, we just have more mailing lists to track (or 
in my case, fall behind on).


votewise:

-I'm in favour of shipping an apache release of 20.x that has the patches
that Y! and others have added to deal with scale and availability -and 
which has been tested by them. This will provide an apache release for 
people to use in production systems -because the official apache 
releases have lagged the CDH and Y! releases.


-I'd like to see all the changes integrated into trunk too, as it 
doesn't make sense for a patch in this branch not to be in trunk.


Steve


Re: Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-03 Thread Roy T. Fielding
PLEASE NOTE

Voting +1 for a release means that you have downloaded the
source code package, verified its signatures, compiled it
on your platform of choice, and checked to your satisfaction
that it matches the source code we have in subversion and that
it is better (in your opinion) than the last Apache release
of the same name.
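
Two of the checks listed above might be scripted roughly as follows. This is
a sketch under assumptions: the helper names, the file names and the checkout
path are hypothetical. gpg --verify and diff -r are standard tools, but the
release manager's public key must already be imported for the signature check
to mean anything.

```bash
#!/bin/bash
# check_signature TARBALL: verify the detached signature TARBALL.asc.
check_signature() {
  gpg --verify "$1.asc" "$1"
}

# same_tree DIR1 DIR2: succeed only if the two source trees have identical
# contents, ignoring Subversion metadata directories.
same_tree() {
  diff -r -q -x .svn "$1" "$2" > /dev/null
}

# Hypothetical usage (all names are placeholders):
#   check_signature hadoop-release.tar.gz
#   mkdir unpacked && tar xzf hadoop-release.tar.gz -C unpacked
#   same_tree unpacked/hadoop-release svn-checkout && echo "matches subversion"
```

Compiling the package and judging whether it is better than the last release
remain manual steps.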

The ASF relies on that minimum amount of peer review to make
sure that we don't release trojan horses, license violations,
or other things that might get us sued as a foundation or as
individuals.  If you don't have time to do it yourself, then
vote +0 (with happy feelings) and hope that there are at least
three members of the PMC who do have that time.

DO NOT +1 a release just because it seems like progress.
Progress is in the doing, not the talking.

Roy