Re: [VOTE] Release candidate 0.20.203.0-rc1
Roy,

On Wed, May 4, 2011 at 7:22 PM, Roy T. Fielding wrote:
> The ASF is a vehicle for whomever wishes to collaborate on a
> given project. Collaboration means helping do the work. Those
> who do the work may do so for whatever reasons that they think
> are good, whether it is because they feel like being charitable
> today, they get paid a salary and the big boss said "work on
> this part", or because they just have an itch worth scratching.
>
> Apache does not care why people choose to collaborate or
> how they choose to apply their own intellectual efforts. We
> welcome all forms of contribution under the terms of our license.

I don't think I was arguing against the contribution of the code in that branch; it's very welcome. What I'm questioning (and ranting about) is the motivation for releasing a version that, even just by name, is a weird hula hoop around the usual development practices that Hadoop has had in the past (not that they're set in stone).

So I wanted to contribute my negative non-binding vote to highlight that this release is probably very confusing for the general user. This is 0.20, but it's not. Also it has more numbers, and it starts at 203. Why do this at all instead of just moving on with 0.22? Or is 0.22 bound to be like 0.21? It almost begs the question of whether this should be called 0.22.0 then.

>
> What we do require is a certain amount of civility regarding
> our voting procedures and an emphasis on individual responsibility
> for your votes. Anyone caught *voting* a particular way just
> because the boss says so will be dealt with severely. Votes
> are how we do quality control and make decisions, and no other
> company can be allowed to make decisions for our non-profit.

Yeah, I don't think that's a problem here; everyone seems to have their very own strong opinions.
RE: [VOTE] Release candidate 0.20.203.0-rc1
Agree. As a newcomer, I had trouble figuring out which version to adopt -- 0.20.2 vs. 0.21. This new release candidate seems to add more confusion for general users.

Jane

-----Original Message-----
From: Matei Zaharia [mailto:ma...@eecs.berkeley.edu]
Sent: Wednesday, May 04, 2011 11:21 PM
To: general@hadoop.apache.org
Subject: Re: [VOTE] Release candidate 0.20.203.0-rc1

I'm not going to cast a vote, but I'm concerned about this for the same reasons Eli brought up -- in particular, compatibility with 0.22. I'm an author of several patches that have gone into 0.21 and trunk, only to stay on hiatus for 2 years because the project hasn't made a stable release since 0.20. (Today, many of these patches are being used through CDH, which is great, but it would be nice to see them in an Apache release too.) This push of features into 0.20.203 makes a widely used 0.22 seem even more distant. Can we at least get a confirmation that these changes will be included in 0.22, as well as a timeline?

To support a vibrant developer community, Apache Hadoop should not just be a mechanism for Yahoo and Cloudera to publish patches. It should include a well-defined process for smaller third-party contributors to push changes that will make it into a stable release within a reasonable time horizon. The lack of such a process has been a major cause for the slowdown in the project in my perspective.

Matei

On May 4, 2011, at 10:47 PM, Eric Sammer wrote:

> (non-binding) -1 for similar reasons to what Jeff and others have laid out,
> and certainly if we're going to change the development process as a side
> effect of a release vote.
>
> On Wed, May 4, 2011 at 9:54 PM, Jeff Hammerbacher wrote:
>
>> -1.
>>
>> As Roy says, "whatever gets released will define the new norm by which
>> policies are assumed", and I certainly don't want this project to change
>> its norms to accommodate bad practices. In particular, Eli presented three
>> very reasonable technical objections to this release.
To summarize: >> >> 1) Let's get the JIRAs that are going into this release into trunk first. >> 2) Let's create a JIRA for each issue in the release. >> 3) Let's stick to the release numbering conventions established for this >> project. >> >> I know the folks at Yahoo! are all professional engineers and done >> tremendous work to help get the project to this point. There's no doubt in >> my mind they understand the validity of the above three technical >> objections. In fact, many of them helped author our "How to Contribute" >> page, which established these conventions: >> wiki.apache.org/hadoop/HowToContribute. We develop new features against >> trunk, we create JIRAs for each issue, we review code before it goes into >> trunk, and we only update old releases with bug fixes. >> >> I couldn't be more excited to have Yahoo! once again doing development in >> Apache, and I hope that we can work together to get the work that you've >> done in this branch into one of our upcoming feature releases. >> >> I hope those who voted +1 before Roy clarified what a release vote will >> mean >> for future project norms will reconsider their votes. >> >> While there may be many competing agendas in this community, we all wish to >> see Apache Hadoop releases of the highest quality. Changing our norms to >> allow huge, unreviewed patch sets introducing new features into a past >> release is a step in the wrong direction. >> >> With a little bit of elbow grease, we can get the work done in this branch >> into trunk, get 0.22 out the door, and be ready for a great 0.23 release. >> >> Later, >> Jeff >> >> On Wed, May 4, 2011 at 9:17 PM, Nigel Daley wrote: >> >>> I'm really not sure yet how to vote here. I was going to vote +1 for >> what >>> I was told by a number of Yahoo! committers would be a one time release >> as >>> Yahoo! "comes back to Apache" after a hiatus last fall/winter and ended >>> their own distribution. 
Clearly this code was not all developed as a >>> community process, but I was going to support a one time release of what >>> they had developed in exclusion. >>> >>> Then I read Roy's email, which confused me. We would he or I or anyone >>> else support this release setting precedent or policy since it would walk >>> all over our bylaws, community process, and the consensus nature of our >>> foundation? This release vote is a lazy majority of the PMC, but other >>> decisions rolled up in this are supposed to be lazy majority of active >>> committers or, in the case of code changes, a lazy consensus. Setting >>> policy by this release means any sufficiently large group of committers >>> could go off and develop on their own and then commit it to a branch and >>> call a release. >>> >>> Furthermore, it now sounds like this is possibly the first in a line of >>> feature releases off this branch. Bug fixes releases, sure. But feature >>> releases? What's wrong with trunk? >>> >>> Nige >>> >>> On May 4, 201
Re: [DISCUSSION] Release rules
On Wed, May 4, 2011 at 5:59 PM, Tom White wrote:
> One year ago (to the day!) Chris started a discussion about the
> release manager role
> (http://mail-archives.apache.org/mod_mbox/hadoop-general/201005.mbox/%3ch2q1267dd3b1005041331r7d8f696di370a279ff6058...@mail.gmail.com%3E).
> In light of today's disagreements, I think we should restart this
> discussion and incorporate these rules into the bylaws, since it
> formalizes our practices.
>
> I'm happy to drive this. We could start by discussing Chris' proposal
> (see clarifications in
> http://mail-archives.apache.org/mod_mbox/hadoop-general/201005.mbox/%3ct2y1267dd3b1005051201h7116e4caud75673ac9d512...@mail.gmail.com%3E),
> then when we get consensus we can put the document on the website.
> (BTW does anyone know if the bylaws were checked into SVN anywhere?
> These belong together.)

Sounds good to me. I like Chris' proposal. He was clear that "nothing should be in (unreleased) 0.x that isn't also in trunk", so that may need to be revisited if we want to be consistent with today's vote.

I don't think the bylaws were checked in; we should do that first. How about checking them into the site repo so they get generated as part of the docs? E.g., this is how Pig does it: http://pig.apache.org/bylaws.html

Thanks,
Eli
Re: [VOTE] Release candidate 0.20.203.0-rc1
+1

Downloaded, built, and deployed on a one-node cluster.

sanjay

On May 4, 2011, at 10:31 AM, Owen O'Malley wrote:

Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem.

The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/

Please download it, inspect it, compile it, and test it. Clearly, I'm +1.

-- Owen
Re: [DISCUSSION] development process of Hadoop
On Wed, May 4, 2011 at 7:39 PM, Eric Yang wrote:
> Let's reflect on how the Hadoop development community ended up in its
> current state. Development is happening rapidly, and being tested, in
> all kinds of organizations. However, Hadoop committers are only committing
> code that is of interest to the sponsoring companies. People are coding
> defensively to ensure that only self-serving patches get committed, and
> helping others and resolving merge problems always take second priority.
> While the world demands agility, the "review then commit" process is
> preventing progress from happening. Committers are afraid to commit
> patches because review hasn't taken place. By the time a patch is
> reviewed, it does not apply properly. People end up having to generate
> multiple versions of patches to ensure the code can be applied. The long
> lag between patch generation and review is taking a significant toll on
> the community and its progress.
>
> Yahoo has a great team of developers who improve Hadoop at a faster pace
> with their own fork of the source code. Yahoo was able to deliver features
> faster because it could use source code repository tools properly.
> Unfortunately for Yahoo, its source code repository was not Apache svn
> trunk. I applaud Owen and Arun's effort in manually backward/forward
> porting the changes between the Yahoo github and Apache svn. There might
> be some JIRAs that need to be merged into the Hadoop 0.20.203 branch to
> ensure the lineage is correct. The community should offer to help with a
> detailed list of what is missing rather than vote -1 without concise
> reasoning about what is missing.
>
> JIRA is meant as a discussion and collaboration tool, but the Hadoop
> community tends to use it as a source code version control system with a
> human-powered diff maker. While I was spending time in the incubator with
> other projects, the mentors explained that it is not the ASF's philosophy
> to use "review then commit".

ASF's policy is that projects make this decision for themselves: http://www.apache.org/dev/project-creation.html

The Hadoop bylaws specify that code changes are lazy consensus, i.e. you need a +1 from a committer. Technically the code doesn't have to be reviewed before committing it; that's just been the norm. I don't think jira is technically required either; it's just been the norm. The vote for the patch has to happen on the lists, which happens as a side effect of jira traffic going to the dev lists.

> The Hadoop community should rethink whether it is using the right tools
> for the right task.
>
> Use JIRA if there is a large feature set that requires brainstorming, and
> give developers the ability to make small incremental changes without
> RTC. This will ensure developers help each other rather than police each
> other.
>
> Any thoughts?
>

I think you can move quickly with RTC or CTR; I've worked on RTC projects that have moved quickly. It requires that people dedicate bandwidth to reviewing changes. If you do want all your code reviewed (at some point) then you're ultimately limited by review bandwidth, with either RTC or CTR.

The time it takes to file a jira is normally insignificant compared to the time to create and test a change. The idea with using jira is that you propose/discuss a change before creating code. You could do that on the lists too. I agree using just a code review tool for small stuff would be faster, e.g. things that don't require a bug #, release note, etc.

Thanks,
Eli
Re: [VOTE] Release candidate 0.20.203.0-rc1
I'm not going to cast a vote, but I'm concerned about this for the same reasons Eli brought up -- in particular, compatibility with 0.22. I'm an author of several patches that have gone into 0.21 and trunk, only to stay on hiatus for 2 years because the project hasn't made a stable release since 0.20. (Today, many of these patches are being used through CDH, which is great, but it would be nice to see them in an Apache release too.) This push of features into 0.20.203 makes a widely used 0.22 seem even more distant. Can we at least get a confirmation that these changes will be included in 0.22, as well as a timeline? To support a vibrant developer community, Apache Hadoop should not just be a mechanism for Yahoo and Cloudera to publish patches. It should include a well-defined process for smaller third-party contributors to push changes that will make it into a stable release within a reasonable time horizon. The lack of such a process has been a major cause for the slowdown in the project in my perspective. Matei On May 4, 2011, at 10:47 PM, Eric Sammer wrote: > (non-binding) -1 for similar reasons to what Jeff and others have laid out, > and certainly if we're going to change the development process as a side > effect of a release vote. > > On Wed, May 4, 2011 at 9:54 PM, Jeff Hammerbacher wrote: > >> -1. >> >> As Roy says, "whatever gets released will define the new norm by which >> policies are assumed", and I certainly don't want this project to change >> its >> norms to accommodate bad practices. In particular, Eli presented three very >> reasonable technical objections to this release. To summarize: >> >> 1) Let's get the JIRAs that are going into this release into trunk first. >> 2) Let's create a JIRA for each issue in the release. >> 3) Let's stick to the release numbering conventions established for this >> project. >> >> I know the folks at Yahoo! are all professional engineers and done >> tremendous work to help get the project to this point. 
There's no doubt in >> my mind they understand the validity of the above three technical >> objections. In fact, many of them helped author our "How to Contribute" >> page, which established these conventions: >> wiki.apache.org/hadoop/HowToContribute. We develop new features against >> trunk, we create JIRAs for each issue, we review code before it goes into >> trunk, and we only update old releases with bug fixes. >> >> I couldn't be more excited to have Yahoo! once again doing development in >> Apache, and I hope that we can work together to get the work that you've >> done in this branch into one of our upcoming feature releases. >> >> I hope those who voted +1 before Roy clarified what a release vote will >> mean >> for future project norms will reconsider their votes. >> >> While there may be many competing agendas in this community, we all wish to >> see Apache Hadoop releases of the highest quality. Changing our norms to >> allow huge, unreviewed patch sets introducing new features into a past >> release is a step in the wrong direction. >> >> With a little bit of elbow grease, we can get the work done in this branch >> into trunk, get 0.22 out the door, and be ready for a great 0.23 release. >> >> Later, >> Jeff >> >> On Wed, May 4, 2011 at 9:17 PM, Nigel Daley wrote: >> >>> I'm really not sure yet how to vote here. I was going to vote +1 for >> what >>> I was told by a number of Yahoo! committers would be a one time release >> as >>> Yahoo! "comes back to Apache" after a hiatus last fall/winter and ended >>> their own distribution. Clearly this code was not all developed as a >>> community process, but I was going to support a one time release of what >>> they had developed in exclusion. >>> >>> Then I read Roy's email, which confused me. We would he or I or anyone >>> else support this release setting precedent or policy since it would walk >>> all over our bylaws, community process, and the consensus nature of our >>> foundation? 
This release vote is a lazy majority of the PMC, but other >>> decisions rolled up in this are supposed to be lazy majority of active >>> committers or, in the case of code changes, a lazy consensus. Setting >>> policy by this release means any sufficiently large group of committers >>> could go off and develop on their own and then commit it to a branch and >>> call a release. >>> >>> Furthermore, it now sounds like this is possibly the first in a line of >>> feature releases off this branch. Bug fixes releases, sure. But feature >>> releases? What's wrong with trunk? >>> >>> Nige >>> >>> On May 4, 2011, at 6:56 PM, Roy T. Fielding wrote: >>> On May 4, 2011, at 5:39 PM, Eli Collins wrote: > The point is that these discussion should be sorted out, ie you don't > change your development and release model on a release VOTE thread, > you change it on a DISCUSSION thread. That is no different than saying you have a right to veto a release u
Re: [VOTE] Release candidate 0.20.203.0-rc1
(non-binding) -1 for similar reasons to what Jeff and others have laid out, and certainly if we're going to change the development process as a side effect of a release vote. On Wed, May 4, 2011 at 9:54 PM, Jeff Hammerbacher wrote: > -1. > > As Roy says, "whatever gets released will define the new norm by which > policies are assumed", and I certainly don't want this project to change > its > norms to accommodate bad practices. In particular, Eli presented three very > reasonable technical objections to this release. To summarize: > > 1) Let's get the JIRAs that are going into this release into trunk first. > 2) Let's create a JIRA for each issue in the release. > 3) Let's stick to the release numbering conventions established for this > project. > > I know the folks at Yahoo! are all professional engineers and done > tremendous work to help get the project to this point. There's no doubt in > my mind they understand the validity of the above three technical > objections. In fact, many of them helped author our "How to Contribute" > page, which established these conventions: > wiki.apache.org/hadoop/HowToContribute. We develop new features against > trunk, we create JIRAs for each issue, we review code before it goes into > trunk, and we only update old releases with bug fixes. > > I couldn't be more excited to have Yahoo! once again doing development in > Apache, and I hope that we can work together to get the work that you've > done in this branch into one of our upcoming feature releases. > > I hope those who voted +1 before Roy clarified what a release vote will > mean > for future project norms will reconsider their votes. > > While there may be many competing agendas in this community, we all wish to > see Apache Hadoop releases of the highest quality. Changing our norms to > allow huge, unreviewed patch sets introducing new features into a past > release is a step in the wrong direction. 
> > With a little bit of elbow grease, we can get the work done in this branch > into trunk, get 0.22 out the door, and be ready for a great 0.23 release. > > Later, > Jeff > > On Wed, May 4, 2011 at 9:17 PM, Nigel Daley wrote: > > > I'm really not sure yet how to vote here. I was going to vote +1 for > what > > I was told by a number of Yahoo! committers would be a one time release > as > > Yahoo! "comes back to Apache" after a hiatus last fall/winter and ended > > their own distribution. Clearly this code was not all developed as a > > community process, but I was going to support a one time release of what > > they had developed in exclusion. > > > > Then I read Roy's email, which confused me. We would he or I or anyone > > else support this release setting precedent or policy since it would walk > > all over our bylaws, community process, and the consensus nature of our > > foundation? This release vote is a lazy majority of the PMC, but other > > decisions rolled up in this are supposed to be lazy majority of active > > committers or, in the case of code changes, a lazy consensus. Setting > > policy by this release means any sufficiently large group of committers > > could go off and develop on their own and then commit it to a branch and > > call a release. > > > > Furthermore, it now sounds like this is possibly the first in a line of > > feature releases off this branch. Bug fixes releases, sure. But feature > > releases? What's wrong with trunk? > > > > Nige > > > > On May 4, 2011, at 6:56 PM, Roy T. Fielding wrote: > > > > > On May 4, 2011, at 5:39 PM, Eli Collins wrote: > > > > > >> The point is that these discussion should be sorted out, ie you don't > > >> change your development and release model on a release VOTE thread, > > >> you change it on a DISCUSSION thread. > > > > > > That is no different than saying you have a right to veto a > > > release until the issue is addressed, which you don't have. > > > > > > A release vote is a majority decision. 
If the majority > > > decides to release, then whatever gets released will define > > > the new norm by which policies are assumed. If not released, > > > then I suggest collaborating more on the policies before > > > trying to vote again. > > > > > > Either way, we don't hold up a vote for the sake of a > > > policy discussion because voting is a more efficient > > > means of discovering if the policy really matters. > > > > > > Roy > > > > > > > > -- Eric Sammer twitter: esammer data: www.cloudera.com
Re: [VOTE] Release candidate 0.20.203.0-rc1
-1.

As Roy says, "whatever gets released will define the new norm by which policies are assumed", and I certainly don't want this project to change its norms to accommodate bad practices. In particular, Eli presented three very reasonable technical objections to this release. To summarize:

1) Let's get the JIRAs that are going into this release into trunk first.
2) Let's create a JIRA for each issue in the release.
3) Let's stick to the release numbering conventions established for this project.

I know the folks at Yahoo! are all professional engineers and have done tremendous work to help get the project to this point. There's no doubt in my mind they understand the validity of the above three technical objections. In fact, many of them helped author our "How to Contribute" page, which established these conventions: wiki.apache.org/hadoop/HowToContribute. We develop new features against trunk, we create JIRAs for each issue, we review code before it goes into trunk, and we only update old releases with bug fixes.

I couldn't be more excited to have Yahoo! once again doing development in Apache, and I hope that we can work together to get the work that you've done in this branch into one of our upcoming feature releases.

I hope those who voted +1 before Roy clarified what a release vote will mean for future project norms will reconsider their votes.

While there may be many competing agendas in this community, we all wish to see Apache Hadoop releases of the highest quality. Changing our norms to allow huge, unreviewed patch sets introducing new features into a past release is a step in the wrong direction.

With a little bit of elbow grease, we can get the work done in this branch into trunk, get 0.22 out the door, and be ready for a great 0.23 release.

Later,
Jeff

On Wed, May 4, 2011 at 9:17 PM, Nigel Daley wrote:

> I'm really not sure yet how to vote here. I was going to vote +1 for what
> I was told by a number of Yahoo! committers would be a one time release as
> Yahoo! "comes back to Apache" after a hiatus last fall/winter and ended
> their own distribution. Clearly this code was not all developed as a
> community process, but I was going to support a one time release of what
> they had developed in exclusion.
>
> Then I read Roy's email, which confused me. Why would he or I or anyone
> else support this release setting precedent or policy since it would walk
> all over our bylaws, community process, and the consensus nature of our
> foundation? This release vote is a lazy majority of the PMC, but other
> decisions rolled up in this are supposed to be lazy majority of active
> committers or, in the case of code changes, a lazy consensus. Setting
> policy by this release means any sufficiently large group of committers
> could go off and develop on their own and then commit it to a branch and
> call a release.
>
> Furthermore, it now sounds like this is possibly the first in a line of
> feature releases off this branch. Bug fix releases, sure. But feature
> releases? What's wrong with trunk?
>
> Nige
>
> On May 4, 2011, at 6:56 PM, Roy T. Fielding wrote:
>
> > On May 4, 2011, at 5:39 PM, Eli Collins wrote:
> >
> >> The point is that these discussions should be sorted out, ie you don't
> >> change your development and release model on a release VOTE thread,
> >> you change it on a DISCUSSION thread.
> >
> > That is no different than saying you have a right to veto a
> > release until the issue is addressed, which you don't have.
> >
> > A release vote is a majority decision. If the majority
> > decides to release, then whatever gets released will define
> > the new norm by which policies are assumed. If not released,
> > then I suggest collaborating more on the policies before
> > trying to vote again.
> >
> > Either way, we don't hold up a vote for the sake of a
> > policy discussion because voting is a more efficient
> > means of discovering if the policy really matters.
> >
> > Roy
> >
> >
Re: [VOTE] Release candidate 0.20.203.0-rc1
I'm really not sure yet how to vote here. I was going to vote +1 for what I was told by a number of Yahoo! committers would be a one time release as Yahoo! "comes back to Apache" after a hiatus last fall/winter and ended their own distribution. Clearly this code was not all developed as a community process, but I was going to support a one time release of what they had developed in exclusion.

Then I read Roy's email, which confused me. Why would he or I or anyone else support this release setting precedent or policy since it would walk all over our bylaws, community process, and the consensus nature of our foundation? This release vote is a lazy majority of the PMC, but other decisions rolled up in this are supposed to be lazy majority of active committers or, in the case of code changes, a lazy consensus. Setting policy by this release means any sufficiently large group of committers could go off and develop on their own and then commit it to a branch and call a release.

Furthermore, it now sounds like this is possibly the first in a line of feature releases off this branch. Bug fix releases, sure. But feature releases? What's wrong with trunk?

Nige

On May 4, 2011, at 6:56 PM, Roy T. Fielding wrote:

> On May 4, 2011, at 5:39 PM, Eli Collins wrote:
>
>> The point is that these discussions should be sorted out, ie you don't
>> change your development and release model on a release VOTE thread,
>> you change it on a DISCUSSION thread.
>
> That is no different than saying you have a right to veto a
> release until the issue is addressed, which you don't have.
>
> A release vote is a majority decision. If the majority
> decides to release, then whatever gets released will define
> the new norm by which policies are assumed. If not released,
> then I suggest collaborating more on the policies before
> trying to vote again.
>
> Either way, we don't hold up a vote for the sake of a
> policy discussion because voting is a more efficient
> means of discovering if the policy really matters.
>
> Roy
>
Re: [VOTE] Release candidate 0.20.203.0-rc1
Just as a tally, we have 6 +1's (Andy, is yours binding? If so, 7) and 3 -1's. So according to the votes so far we are releasing, but according to our bylaws we need to wait 7 days for everyone to chime in.

--I

On May 5, 2011, at 12:22 PM, Roy T. Fielding wrote:

> On May 4, 2011, at 6:24 PM, Jean-Daniel Cryans wrote:
>
>> Non-binding -1.
>>
>> I did download it and checked it out, but when I look at the
>> documentation I see it says "Hadoop 0.20 documentation" in the tab on
>> top. From what I can tell this isn't the branch 0.20 so I think it's
>> an error and from a user point of view this looks more like something
>> I would call 0.22 (although yes I understand this is 0.20 +security
>> +whatever).
>>
>> Why would a single company push so hard to go against the "normal"
>> release process just for "the benefit of putting our work in the hands
>> of all hadoop users" is beyond me. It's not like people were begging
>> on the mailing lists to be able to get their hands on such a release
>> to the point where an emergency point release including tons of new
>> features is needed.
>>
>> So to me the more logical reason would be monetary gains, that I would
>> understand better from a for-profit company. But then why go through
>> the hurdles of having such an ASF release when Y! isn't even selling
>> anything remotely related to Hadoop services? And why now?
>>
>> But then there's this spinoff thing and it suddenly makes a lot more sense.
>>
>> E14 said earlier that "That is how apache works."
>>
>> I would say yes, maybe this is how it works, but I'm not sure I want
>> to see it working like _that_. The ASF shouldn't be the vehicle for a
>> single (future) company's wishes.
>
> The ASF is a vehicle for whomever wishes to collaborate on a
> given project. Collaboration means helping do the work. Those
> who do the work may do so for whatever reasons that they think
> are good, whether it is because they feel like being charitable
> today, they get paid a salary and the big boss said "work on
> this part", or because they just have an itch worth scratching.
>
> Apache does not care why people choose to collaborate or
> how they choose to apply their own intellectual efforts. We
> welcome all forms of contribution under the terms of our license.
>
> What we do require is a certain amount of civility regarding
> our voting procedures and an emphasis on individual responsibility
> for your votes. Anyone caught *voting* a particular way just
> because the boss says so will be dealt with severely. Votes
> are how we do quality control and make decisions, and no other
> company can be allowed to make decisions for our non-profit.
>
> Roy
[DISCUSSION] development process of Hadoop
If we reflect back and see how the development community end up in its current state for Hadoop. There are development rapidly happening and tested in all kind of organizations. However, Hadoop committers are only committing code that are interested by the sponsored companies. People are coding defensively to ensuring only self serving patches would be committed, and helping others and merging problem are always prioritized secondary. While the world demand agility, the "review then commit" process is preventing progress from happening. Committers are afraid to commit patches because review hasn't took place. By the time patch is reviewed, it does not apply properly. People end up having to generate multiple version of patches to ensure the code can be applied. The large lag time between patch generation and reviewed is taking significant toll on the community and progress. Yahoo have a great team of developers who improves Hadoop at faster pace with its own fork of the source code. The reason that Yahoo was able to achieve faster improvement with features was due to the ability to use source code repository tools properly. Unfortunate for Yahoo, their source code repository was not Apache svn trunk. I applause Owen and Arun's effort for men powering and backward/forward porting the changes between yahoo github and Apache svn. There might be some jiras that needs to be merged into Hadoop 0.20.203 branch to ensure the linage is correct. The community should offer to help with detail listing of what is missing rather than vote -1 without concise reasoning of what is missing. JIRA is meant as a discussion and collaboration tool, but hadoop community intends to use it as the source code version control system with men powered diff maker. While spending time in the incubator with other project, the mentors have explained that it is not ASF's philosophy to use "review then commit". 
The Hadoop community should rethink whether it is using the right tools for the right task. Use JIRA when there is a large feature set that requires brainstorming, and give developers the ability to make small incremental changes without RTC. This will ensure developers help each other rather than police each other. Any thoughts? Regards, Eric
Re: [VOTE] Release candidate 0.20.203.0-rc1
On May 4, 2011, at 6:24 PM, Jean-Daniel Cryans wrote: > Non-biding -1. > > I did download it and checked it out, but when I look at the > documentation I see it says "Hadoop 0.20 documentation" in the tab on > top. From what I can tell this isn't the branch 0.20 so I think it's > an error and from a user point of view this looks more like something > I would call 0.22 (although yes I understand this is 0.20 +security > +whatever). > > Why would a single company push so hard to go against the "normal" > release process just for "the benefit of putting our work in the hands > of all hadoop users" is beyond me. It's not like people were begging > on the mailing lists to be able to get their hands on such a release > to the point where an emergency point release including tons of new > features is needed. > > So to me the more logical reason would be monetary gains, that I would > understand better from a for-profit company. But then why go through > the hurdles of having such an ASF release when Y! isn't even selling > anything remotely related to Hadoop services? And why now? > > But then there's this spinoff thing and it suddenly makes a lot more sense. > > E14 said earlier that "That is how apache works." > > I would say yes, maybe this is how it works, but I'm not sure I want > to see it working like _that_. The ASF shouldn't be the vehicle for a > single (future) company's wishes. The ASF is a vehicle for whomever wishes to collaborate on a given project. Collaboration means helping do the work. Those who do the work may do so for whatever reasons that they think are good, whether it is because they feel like being charitable today, they get paid a salary and the big boss said "work on this part", or because they just have an itch worth scratching. Apache does not care why people choose to collaborate or how they choose to apply their own intellectual efforts. We welcome all forms of contribution under the terms of our license. 
What we do require is a certain amount of civility regarding our voting procedures and an emphasis on individual responsibility for your votes. Anyone caught *voting* a particular way just because the boss says so will be dealt with severely. Votes are how we do quality control and make decisions, and no other company can be allowed to make decisions for our non-profit. Roy
Re: [VOTE] Release candidate 0.20.203.0-rc1
+1. I downloaded the bits, compiled and ran unit tests. Also, looked at the source code to some extent. Looks good. -dhruba On Wed, May 4, 2011 at 6:56 PM, Roy T. Fielding wrote: > On May 4, 2011, at 5:39 PM, Eli Collins wrote: > > > The point is that these discussion should be sorted out, ie you don't > > change your development and release model on a release VOTE thread, > > you change it on a DISCUSSION thread. > > That is no different than saying you have a right to veto a > release until the issue is addressed, which you don't have. > > A release vote is a majority decision. If the majority > decides to release, then whatever gets released will define > the new norm by which policies are assumed. If not released, > then I suggest collaborating more on the policies before > trying to vote again. > > Either way, we don't hold up a vote for the sake of a > policy discussion because voting is a more efficient > means of discovering if the policy really matters. > > Roy > > -- Connect to me at http://www.facebook.com/dhruba
Re: [VOTE] Release candidate 0.20.203.0-rc1
Speculation either on the motives of those objecting to a release or of those making contributions or proposing a release does not advance progress. The accusations and counter-accusations seen on this thread are regrettable and I feel less and less confident in the future of Apache Hadoop as time goes on. As a strong believer in and advocate of open source as an answer to technical and architectural challenges, I am pained to see the members of what should be a vibrant community litigating in an ultimately self-defeating way. If only this energy put into argument could be channeled into code or patches... In open source, if opinions were code we would rule the world. So what of this candidate? Artifact looks good, DFS tests are good, MR tests are good. Looked over some of the documentation and found no errors. To my knowledge this is now a superset of branch-0.20, addressing the reasonably determined deficit of rc0. There seems no reason other issues cannot be addressed subsequently. There has not been a release of Apache Hadoop 0.20 since at least Feb 6 2010 yet since this time important security enhancements have been contributed, but in the form of an Apache product these are only available as patches on a non-release branch. Forward progress of the Apache product seems more important than achieving the perfect release in all eyes. For example, append features remain on a non-release branch. I would really have liked to see the append changes included in this candidate, but this is not grounds for objection merely regret, and I hope this can be covered by a subsequent release, perhaps soon. After security and append features are in 0.20, in my personal humble opinion the 0.20 release in total is sufficient and all attention should be paid to the next release (0.22 or whatever), except for critical bug fixes. +1 Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: [VOTE] Release candidate 0.20.203.0-rc1
On May 4, 2011, at 5:39 PM, Eli Collins wrote: > The point is that these discussion should be sorted out, ie you don't > change your development and release model on a release VOTE thread, > you change it on a DISCUSSION thread. That is no different than saying you have a right to veto a release until the issue is addressed, which you don't have. A release vote is a majority decision. If the majority decides to release, then whatever gets released will define the new norm by which policies are assumed. If not released, then I suggest collaborating more on the policies before trying to vote again. Either way, we don't hold up a vote for the sake of a policy discussion because voting is a more efficient means of discovering if the policy really matters. Roy
Re: [VOTE] Release candidate 0.20.203.0-rc1
My (non-binding) vote for 0.20.203.0-rc1 is +1. I downloaded, compiled, ran tests, ran my bigrams example, all ran perfectly. (I did a single node test without security on.) The voting criteria I used are: 1. Is this a working release? : Yes 2. Does it take the codebase forward? : Yes 3. Does it have features that the user community might find valuable? : Yes - milind -- Milind Bhandarkar mbhandar...@linkedin.com +1-650-776-3167 On 5/4/11 6:10 PM, "Devaraj Das" wrote: >+1 based on some single node tests I did (with security ON). > > >On 5/4/11 10:31 AM, "Owen O'Malley" wrote: > >Here's an updated release candidate for 0.20.203.0. I've incorporated the >feedback and included all of the patches from 0.20.2, which is the last >stable release. I also fixed the eclipse-plugin problem. > >The candidate is at: >http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/ > >Please download it, inspect it, compile it, and test it. Clearly, I'm +1. > >-- Owen >
Re: [VOTE] Release candidate 0.20.203.0-rc1
Non-binding -1. I did download it and checked it out, but when I look at the documentation I see it says "Hadoop 0.20 documentation" in the tab on top. From what I can tell this isn't the branch 0.20 so I think it's an error and from a user point of view this looks more like something I would call 0.22 (although yes I understand this is 0.20 +security +whatever). Why a single company would push so hard to go against the "normal" release process just for "the benefit of putting our work in the hands of all hadoop users" is beyond me. It's not like people were begging on the mailing lists to be able to get their hands on such a release to the point where an emergency point release including tons of new features is needed. So to me the more logical reason would be monetary gains, that I would understand better from a for-profit company. But then why go through the hurdles of having such an ASF release when Y! isn't even selling anything remotely related to Hadoop services? And why now? But then there's this spinoff thing and it suddenly makes a lot more sense. E14 said earlier that "That is how apache works." I would say yes, maybe this is how it works, but I'm not sure I want to see it working like _that_. The ASF shouldn't be the vehicle for a single (future) company's wishes. J-D On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley wrote: > Here's an updated release candidate for 0.20.203.0. I've incorporated the > feedback and included all of the patches from 0.20.2, which is the last > stable release. I also fixed the eclipse-plugin problem. > > The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/ > > Please download it, inspect it, compile it, and test it. Clearly, I'm +1. > > -- Owen
Re: [VOTE] Release candidate 0.20.203.0-rc1
On Wed, May 4, 2011 at 6:18 PM, Eric Baldeschwieler wrote: > Ok. I'll bite. > > The point of a vote is to learn what everyone thinks. So far we have learned: > > 1 - the team that is trying to contribute code and do a release thinks it is > ready. > > 2 - Cloudera does not think the release is a good idea. > I don't think that's true. There's a difference between not supporting a given rc and not supporting a release from this branch in general. With both of my hats on, I want code to be reviewed before being released, I want releases to not regress against previous releases, I don't want the next major release to regress against a stable release, and I want the community to discuss new version schemes and development models rather than adopting them by accident just because we voted on a particular release. Thanks, Eli
Re: [VOTE] Release candidate 0.20.203.0-rc1
> Entertaining concerns like a one-to-one > correspondence between commits and JIRA issues is bizarre in this > context. It's not about whether there's a jira, it's about whether the code was reviewed. We think code should be reviewed and voted on by the community before releasing it. That's how we've always rolled. Everyone agrees releases are too infrequent, but that's not an excuse for steamrolling the community. Thanks, Eli
Re: [VOTE] Release candidate 0.20.203.0-rc1
Ok. I'll bite. The point of a vote is to learn what everyone thinks. So far we have learned: 1 - the team that is trying to contribute code and do a release thinks it is ready. 2 - Cloudera does not think the release is a good idea. No more talk between the team contributing code and Cloudera will educate us further. Let's hear from the rest of the community. In parallel on other threads, let's work out how to address concerns. That will be useful however the vote goes. I promise to continue to work with everyone to help drive releases. We've called a vote, so let it proceed. That is how apache works. Thanks! --- E14 - typing on glass PS this is my last comment on this thread. Start new ones if you are not casting a vote. On May 4, 2011, at 5:45 PM, "Konstantin Boudnik" wrote: > I tend to agree. Changing release model of Apache Hadoop train isn't > something that should be done in a hassle or as a part of release > voting. > > If these questions aren't addressed - let's postpone the vote and > discuss all the complications or implications until they sorted out or > the consensus/compromise is reached. > > Cos > > On Wed, May 4, 2011 at 17:39, Eli Collins wrote: >> The point is that these discussion should be sorted out, ie you don't >> change your development and release model on a release VOTE thread, >> you change it on a DISCUSSION thread. >> >> Ie before we release this we should understand what that means. What >> is being proposed is not just another release from branch-0.20 or >> branch-0.22. >> >> Thanks, >> Eli >> >> On Wed, May 4, 2011 at 5:30 PM, Mahadev Konar wrote: >>> Eli, >>> I think the intent from the email was to just vote on this thread, >>> which I agree with. >>> Discussions should be done in a separate threads. Hopefully we can >>> all stick to just voting! 
Re: [VOTE] Release candidate 0.20.203.0-rc1
I'm +1 on releasing rc1. The signature and hashes match on the artifact, ran some of the more aggressive MR tests. Reviewed changes from rc0. It looks like we need a FAQ for this release, if only to prevent the same questions from being asked and answered across different threads and lists. Reservations, regressions, and pending work can also be documented there. Right now, Apache Hadoop releases are not recommended by its community. Instead, not only our end users, but other Apache projects run Cloudera's distribution. From all those wearing their Apache hat, I would like to see more effort directed toward a release that we can recommend soon and less time spent compiling tasks to delay it. Releasing this will complicate the documented process. However, that process *has not produced a usable release* for the last two out of six years. This is failure. Entertaining concerns like a one-to-one correspondence between commits and JIRA issues is bizarre in this context. Let's find a way to make progress instead of tossing pharisaic accusations of illegitimacy. -C On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley wrote: > Here's an updated release candidate for 0.20.203.0. I've incorporated the > feedback and included all of the patches from 0.20.2, which is the last > stable release. I also fixed the eclipse-plugin problem. > > The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/ > > Please download it, inspect it, compile it, and test it. Clearly, I'm +1. > > -- Owen
Re: [VOTE] Release candidate 0.20.203.0-rc1
+1 based on some single node tests I did (with security ON). On 5/4/11 10:31 AM, "Owen O'Malley" wrote: Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem. The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/ Please download it, inspect it, compile it, and test it. Clearly, I'm +1. -- Owen
[DISCUSSION] Release rules
One year ago (to the day!) Chris started a discussion about the release manager role (http://mail-archives.apache.org/mod_mbox/hadoop-general/201005.mbox/%3ch2q1267dd3b1005041331r7d8f696di370a279ff6058...@mail.gmail.com%3E). In light of today's disagreements, I think we should restart this discussion and incorporate these rules into the bylaws, since it formalizes our practices. I'm happy to drive this. We could start by discussing Chris' proposal (see clarifications in http://mail-archives.apache.org/mod_mbox/hadoop-general/201005.mbox/%3ct2y1267dd3b1005051201h7116e4caud75673ac9d512...@mail.gmail.com%3E), then when we get consensus we can put the document on the website. (BTW does anyone know if the bylaws were checked into SVN anywhere? These belong together.) Cheers, Tom
Re: [VOTE] Release candidate 0.20.203.0-rc1
I tend to agree. Changing release model of Apache Hadoop train isn't something that should be done in a hassle or as a part of release voting. If these questions aren't addressed - let's postpone the vote and discuss all the complications or implications until they sorted out or the consensus/compromise is reached. Cos On Wed, May 4, 2011 at 17:39, Eli Collins wrote: > The point is that these discussion should be sorted out, ie you don't > change your development and release model on a release VOTE thread, > you change it on a DISCUSSION thread. > > Ie before we release this we should understand what that means. What > is being proposed is not just another release from branch-0.20 or > branch-0.22. > > Thanks, > Eli > > On Wed, May 4, 2011 at 5:30 PM, Mahadev Konar wrote: >> Eli, >> I think the intent from the email was to just vote on this thread, >> which I agree with. >> Discussions should be done in a separate threads. Hopefully we can >> all stick to just voting! >> >> thanks >> mahadev >> >> On Wed, May 4, 2011 at 5:22 PM, Eli Collins wrote: >>> Good suggestion, it would be helpful to hash out the issues around >>> compatibility, feature branches, version numbers, how to contribute at >>> Apache before putting up new votes that would be helpful, ie the vote >>> would go much smoother if all the issues with the previous vote were >>> addressed before starting a new one. >>> >>> Thanks, >>> Eli >>> >>> On Wed, May 4, 2011 at 5:05 PM, Eric Baldeschwieler >>> wrote: Hi folks, Let's stay focused. Let's take the other threads onto other threads. This is a vote. To the extent naming is a problem, let's take that to a thread and find an acceptable proposal. To the extent folks want to collaborate on certifying the release for total lack of regression or collaborate on the cleanest possible merge, I think all interested parties should take these topics to another thread and divide up the work. 
If you've voted, you don't need to comment further on this thread, no matter what company you work for! Thanks, --- E14 - typing on glass On May 4, 2011, at 4:46 PM, "Todd Lipcon" wrote: > On Wed, May 4, 2011 at 4:11 PM, Arun C Murthy wrote: > >> On May 4, 2011, at 4:09 PM, Tsz Wo (Nicholas), Sze wrote: >> >> The list seems highly inaccurate. Checked the first few N/A items. All >>> are >>> false positives. >>> >>> >> Also, can you please provide a list on features which are not related to >> gridmix benchmarks or herriot tests? >> > > Here are a few I quickly pulled up: > MAPREDUCE-2316 (docs for improved capacity scheduler) > MAPREDUCE-2355 (adds new config for heartbeat dampening in MR) > > " BZ-4182948. Add statistics logging to Fred for better visibility into > startup time costs. (Matt Foley)" > - I believe I saw a note from Matt on the JIRA yesterday about this > feature, > where he decided that the version done in 203 wasn't a good approach, and > it's done differently in trunk (not sure if done yet). > > MAPREDUCE-2364 (important bug fix for localization) > - in fact most of localization is different in this branch compared to > trunk > due to inclusion of MAPREDUCE-2378, the trunk version of which is still on > the "yahoo-merge" branch,. > > "New cunters for FileInput/OutputFormat. New Counter > MAP_OUTPUT_MATERIALZIED_BYTES. Related bugs: 4241034, 3418543, > 4217546" > - not sure which JIRA this is, I think I've seen a JIRA for trunk, but not > committed. > > - MAPREDUCE-1904, committed without JIRA as: > " . Reducing new Path(), RawFileStatus() creation overhead in > LocalDirAllocator" > not in trunk > > + BZ4101537 . When a queue is built without any access rights we > explain > the > + problem. (dking, rvw ramach) [attachment of 2010-11-24] > seems to be on trunk as MR-2411, but not committed, best I can tell, > despite > the JIRA there being resolved (based on looking at QueueManager in trunk) > > " . 
Remove unnecessary reference to user configuration from > TaskDistributedCacheManager causing memory leaks" > Not in trunk, not sure which JIRA it might be.. probably part of 2178. > > Major new feature: MAPREDUCE-323 - very large rework of how job history > files are managed > Major change: MAPREDUCE-1100/MAPREDUCE-1176: unresolved on trunk, though > probably will be attacked by different JIRAs > Major new ops-visible feature: "metrics2" system > Major new ops-visible feature: MAPREDUCE-291 job history can be viewed > from > a separate server > Major new set of user-visible configurations: MAPREDUCE-1943 and friends > which implement new limits in MapReduce (eg MAPREDUCE-1872 as well) >
Re: [VOTE] Release candidate 0.20.203.0-rc1
The point is that these discussion should be sorted out, ie you don't change your development and release model on a release VOTE thread, you change it on a DISCUSSION thread. Ie before we release this we should understand what that means. What is being proposed is not just another release from branch-0.20 or branch-0.22. Thanks, Eli On Wed, May 4, 2011 at 5:30 PM, Mahadev Konar wrote: > Eli, > I think the intent from the email was to just vote on this thread, > which I agree with. > Discussions should be done in a separate threads. Hopefully we can > all stick to just voting! > > thanks > mahadev > > On Wed, May 4, 2011 at 5:22 PM, Eli Collins wrote: >> Good suggestion, it would be helpful to hash out the issues around >> compatibility, feature branches, version numbers, how to contribute at >> Apache before putting up new votes that would be helpful, ie the vote >> would go much smoother if all the issues with the previous vote were >> addressed before starting a new one. >> >> Thanks, >> Eli >> >> On Wed, May 4, 2011 at 5:05 PM, Eric Baldeschwieler >> wrote: >>> Hi folks, >>> >>> Let's stay focused. Let's take the other threads onto other threads. This >>> is a vote. >>> >>> To the extent naming is a problem, let's take that to a thread and find an >>> acceptable proposal. >>> >>> To the extent folks want to collaborate on certifying the release for total >>> lack of regression or collaborate on the cleanest possible merge, I think >>> all interested parties should take these topics to another thread and >>> divide up the work. >>> >>> If you've voted, you don't need to comment further on this thread, no >>> matter what company you work for! >>> >>> Thanks, >>> >>> --- >>> E14 - typing on glass >>> >>> On May 4, 2011, at 4:46 PM, "Todd Lipcon" wrote: >>> On Wed, May 4, 2011 at 4:11 PM, Arun C Murthy wrote: > On May 4, 2011, at 4:09 PM, Tsz Wo (Nicholas), Sze wrote: > > The list seems highly inaccurate. Checked the first few N/A items. 
All >> are >> false positives. >> >> > Also, can you please provide a list on features which are not related to > gridmix benchmarks or herriot tests? > Here are a few I quickly pulled up: MAPREDUCE-2316 (docs for improved capacity scheduler) MAPREDUCE-2355 (adds new config for heartbeat dampening in MR) " BZ-4182948. Add statistics logging to Fred for better visibility into startup time costs. (Matt Foley)" - I believe I saw a note from Matt on the JIRA yesterday about this feature, where he decided that the version done in 203 wasn't a good approach, and it's done differently in trunk (not sure if done yet). MAPREDUCE-2364 (important bug fix for localization) - in fact most of localization is different in this branch compared to trunk due to inclusion of MAPREDUCE-2378, the trunk version of which is still on the "yahoo-merge" branch,. "New cunters for FileInput/OutputFormat. New Counter MAP_OUTPUT_MATERIALZIED_BYTES. Related bugs: 4241034, 3418543, 4217546" - not sure which JIRA this is, I think I've seen a JIRA for trunk, but not committed. - MAPREDUCE-1904, committed without JIRA as: " . Reducing new Path(), RawFileStatus() creation overhead in LocalDirAllocator" not in trunk + BZ4101537 . When a queue is built without any access rights we explain the + problem. (dking, rvw ramach) [attachment of 2010-11-24] seems to be on trunk as MR-2411, but not committed, best I can tell, despite the JIRA there being resolved (based on looking at QueueManager in trunk) " . Remove unnecessary reference to user configuration from TaskDistributedCacheManager causing memory leaks" Not in trunk, not sure which JIRA it might be.. probably part of 2178. 
Major new feature: MAPREDUCE-323 - very large rework of how job history files are managed Major change: MAPREDUCE-1100/MAPREDUCE-1176: unresolved on trunk, though probably will be attacked by different JIRAs Major new ops-visible feature: "metrics2" system Major new ops-visible feature: MAPREDUCE-291 job history can be viewed from a separate server Major new set of user-visible configurations: MAPREDUCE-1943 and friends which implement new limits in MapReduce (eg MAPREDUCE-1872 as well) I have code to work on, so I won't keep going, but this is from looking at the last couple months of 203. -Todd -- Todd Lipcon Software Engineer, Cloudera >>> >> > > > > -- > thanks > mahadev > @mahadevkonar >
Re: [VOTE] Release candidate 0.20.203.0-rc1
Eli, I think the intent from the email was to just vote on this thread, which I agree with. Discussions should be done in a separate threads. Hopefully we can all stick to just voting! thanks mahadev On Wed, May 4, 2011 at 5:22 PM, Eli Collins wrote: > Good suggestion, it would be helpful to hash out the issues around > compatibility, feature branches, version numbers, how to contribute at > Apache before putting up new votes that would be helpful, ie the vote > would go much smoother if all the issues with the previous vote were > addressed before starting a new one. > > Thanks, > Eli > > On Wed, May 4, 2011 at 5:05 PM, Eric Baldeschwieler > wrote: >> Hi folks, >> >> Let's stay focused. Let's take the other threads onto other threads. This is >> a vote. >> >> To the extent naming is a problem, let's take that to a thread and find an >> acceptable proposal. >> >> To the extent folks want to collaborate on certifying the release for total >> lack of regression or collaborate on the cleanest possible merge, I think >> all interested parties should take these topics to another thread and divide >> up the work. >> >> If you've voted, you don't need to comment further on this thread, no matter >> what company you work for! >> >> Thanks, >> >> --- >> E14 - typing on glass >> >> On May 4, 2011, at 4:46 PM, "Todd Lipcon" wrote: >> >>> On Wed, May 4, 2011 at 4:11 PM, Arun C Murthy wrote: >>> On May 4, 2011, at 4:09 PM, Tsz Wo (Nicholas), Sze wrote: The list seems highly inaccurate. Checked the first few N/A items. All > are > false positives. > > Also, can you please provide a list on features which are not related to gridmix benchmarks or herriot tests? >>> >>> Here are a few I quickly pulled up: >>> MAPREDUCE-2316 (docs for improved capacity scheduler) >>> MAPREDUCE-2355 (adds new config for heartbeat dampening in MR) >>> >>> " BZ-4182948. Add statistics logging to Fred for better visibility into >>> startup time costs. 
(Matt Foley)" >>> - I believe I saw a note from Matt on the JIRA yesterday about this feature, >>> where he decided that the version done in 203 wasn't a good approach, and >>> it's done differently in trunk (not sure if done yet). >>> >>> MAPREDUCE-2364 (important bug fix for localization) >>> - in fact most of localization is different in this branch compared to trunk >>> due to inclusion of MAPREDUCE-2378, the trunk version of which is still on >>> the "yahoo-merge" branch,. >>> >>> "New cunters for FileInput/OutputFormat. New Counter >>> MAP_OUTPUT_MATERIALZIED_BYTES. Related bugs: 4241034, 3418543, >>> 4217546" >>> - not sure which JIRA this is, I think I've seen a JIRA for trunk, but not >>> committed. >>> >>> - MAPREDUCE-1904, committed without JIRA as: >>> " . Reducing new Path(), RawFileStatus() creation overhead in >>> LocalDirAllocator" >>> not in trunk >>> >>> + BZ4101537 . When a queue is built without any access rights we explain >>> the >>> + problem. (dking, rvw ramach) [attachment of 2010-11-24] >>> seems to be on trunk as MR-2411, but not committed, best I can tell, despite >>> the JIRA there being resolved (based on looking at QueueManager in trunk) >>> >>> " . Remove unnecessary reference to user configuration from >>> TaskDistributedCacheManager causing memory leaks" >>> Not in trunk, not sure which JIRA it might be.. probably part of 2178. 
>>> >>> Major new feature: MAPREDUCE-323 - very large rework of how job history >>> files are managed >>> Major change: MAPREDUCE-1100/MAPREDUCE-1176: unresolved on trunk, though >>> probably will be attacked by different JIRAs >>> Major new ops-visible feature: "metrics2" system >>> Major new ops-visible feature: MAPREDUCE-291 job history can be viewed from >>> a separate server >>> Major new set of user-visible configurations: MAPREDUCE-1943 and friends >>> which implement new limits in MapReduce (eg MAPREDUCE-1872 as well) >>> >>> I have code to work on, so I won't keep going, but this is from looking at >>> the last couple months of 203. >>> >>> -Todd >>> -- >>> Todd Lipcon >>> Software Engineer, Cloudera >> > -- thanks mahadev @mahadevkonar
Re: [VOTE] Release candidate 0.20.203.0-rc1
Good suggestion, it would be helpful to hash out the issues around compatibility, feature branches, version numbers, how to contribute at Apache before putting up new votes that would be helpful, ie the vote would go much smoother if all the issues with the previous vote were addressed before starting a new one. Thanks, Eli On Wed, May 4, 2011 at 5:05 PM, Eric Baldeschwieler wrote: > Hi folks, > > Let's stay focused. Let's take the other threads onto other threads. This is > a vote. > > To the extent naming is a problem, let's take that to a thread and find an > acceptable proposal. > > To the extent folks want to collaborate on certifying the release for total > lack of regression or collaborate on the cleanest possible merge, I think all > interested parties should take these topics to another thread and divide up > the work. > > If you've voted, you don't need to comment further on this thread, no matter > what company you work for! > > Thanks, > > --- > E14 - typing on glass > > On May 4, 2011, at 4:46 PM, "Todd Lipcon" wrote: > >> On Wed, May 4, 2011 at 4:11 PM, Arun C Murthy wrote: >> >>> On May 4, 2011, at 4:09 PM, Tsz Wo (Nicholas), Sze wrote: >>> >>> The list seems highly inaccurate. Checked the first few N/A items. All are false positives. >>> Also, can you please provide a list on features which are not related to >>> gridmix benchmarks or herriot tests? >>> >> >> Here are a few I quickly pulled up: >> MAPREDUCE-2316 (docs for improved capacity scheduler) >> MAPREDUCE-2355 (adds new config for heartbeat dampening in MR) >> >> " BZ-4182948. Add statistics logging to Fred for better visibility into >> startup time costs. (Matt Foley)" >> - I believe I saw a note from Matt on the JIRA yesterday about this feature, >> where he decided that the version done in 203 wasn't a good approach, and >> it's done differently in trunk (not sure if done yet). 
>> >> MAPREDUCE-2364 (important bug fix for localization) >> - in fact most of localization is different in this branch compared to trunk >> due to inclusion of MAPREDUCE-2378, the trunk version of which is still on >> the "yahoo-merge" branch,. >> >> "New cunters for FileInput/OutputFormat. New Counter >> MAP_OUTPUT_MATERIALZIED_BYTES. Related bugs: 4241034, 3418543, >> 4217546" >> - not sure which JIRA this is, I think I've seen a JIRA for trunk, but not >> committed. >> >> - MAPREDUCE-1904, committed without JIRA as: >> " . Reducing new Path(), RawFileStatus() creation overhead in >> LocalDirAllocator" >> not in trunk >> >> + BZ4101537 . When a queue is built without any access rights we explain >> the >> + problem. (dking, rvw ramach) [attachment of 2010-11-24] >> seems to be on trunk as MR-2411, but not committed, best I can tell, despite >> the JIRA there being resolved (based on looking at QueueManager in trunk) >> >> " . Remove unnecessary reference to user configuration from >> TaskDistributedCacheManager causing memory leaks" >> Not in trunk, not sure which JIRA it might be.. probably part of 2178. >> >> Major new feature: MAPREDUCE-323 - very large rework of how job history >> files are managed >> Major change: MAPREDUCE-1100/MAPREDUCE-1176: unresolved on trunk, though >> probably will be attacked by different JIRAs >> Major new ops-visible feature: "metrics2" system >> Major new ops-visible feature: MAPREDUCE-291 job history can be viewed from >> a separate server >> Major new set of user-visible configurations: MAPREDUCE-1943 and friends >> which implement new limits in MapReduce (eg MAPREDUCE-1872 as well) >> >> I have code to work on, so I won't keep going, but this is from looking at >> the last couple months of 203. >> >> -Todd >> -- >> Todd Lipcon >> Software Engineer, Cloudera >
Re: [VOTE] Release candidate 0.20.203.0-rc1
Hi folks, Let's stay focused. Let's take the other threads onto other threads. This is a vote. To the extent naming is a problem, let's take that to a thread and find an acceptable proposal. To the extent folks want to collaborate on certifying the release for total lack of regression or collaborate on the cleanest possible merge, I think all interested parties should take these topics to another thread and divide up the work. If you've voted, you don't need to comment further on this thread, no matter what company you work for! Thanks, --- E14 - typing on glass On May 4, 2011, at 4:46 PM, "Todd Lipcon" wrote: > On Wed, May 4, 2011 at 4:11 PM, Arun C Murthy wrote: > >> On May 4, 2011, at 4:09 PM, Tsz Wo (Nicholas), Sze wrote: >> >> The list seems highly inaccurate. Checked the first few N/A items. All >>> are >>> false positives. >>> >>> >> Also, can you please provide a list on features which are not related to >> gridmix benchmarks or herriot tests? >> > > Here are a few I quickly pulled up: > MAPREDUCE-2316 (docs for improved capacity scheduler) > MAPREDUCE-2355 (adds new config for heartbeat dampening in MR) > > " BZ-4182948. Add statistics logging to Fred for better visibility into > startup time costs. (Matt Foley)" > - I believe I saw a note from Matt on the JIRA yesterday about this feature, > where he decided that the version done in 203 wasn't a good approach, and > it's done differently in trunk (not sure if done yet). > > MAPREDUCE-2364 (important bug fix for localization) > - in fact most of localization is different in this branch compared to trunk > due to inclusion of MAPREDUCE-2378, the trunk version of which is still on > the "yahoo-merge" branch,. > > "New cunters for FileInput/OutputFormat. New Counter >MAP_OUTPUT_MATERIALZIED_BYTES. Related bugs: 4241034, 3418543, > 4217546" > - not sure which JIRA this is, I think I've seen a JIRA for trunk, but not > committed. > > - MAPREDUCE-1904, committed without JIRA as: > ". 
Reducing new Path(), RawFileStatus() creation overhead in > LocalDirAllocator" > not in trunk > > +BZ4101537 . When a queue is built without any access rights we explain > the > +problem. (dking, rvw ramach) [attachment of 2010-11-24] > seems to be on trunk as MR-2411, but not committed, best I can tell, despite > the JIRA there being resolved (based on looking at QueueManager in trunk) > > ". Remove unnecessary reference to user configuration from > TaskDistributedCacheManager causing memory leaks" > Not in trunk, not sure which JIRA it might be.. probably part of 2178. > > Major new feature: MAPREDUCE-323 - very large rework of how job history > files are managed > Major change: MAPREDUCE-1100/MAPREDUCE-1176: unresolved on trunk, though > probably will be attacked by different JIRAs > Major new ops-visible feature: "metrics2" system > Major new ops-visible feature: MAPREDUCE-291 job history can be viewed from > a separate server > Major new set of user-visible configurations: MAPREDUCE-1943 and friends > which implement new limits in MapReduce (eg MAPREDUCE-1872 as well) > > I have code to work on, so I won't keep going, but this is from looking at > the last couple months of 203. > > -Todd > -- > Todd Lipcon > Software Engineer, Cloudera
Re: [VOTE] Release candidate 0.20.203.0-rc1
On May 4, 2011, at 4:44 PM, Todd Lipcon wrote: On Wed, May 4, 2011 at 4:11 PM, Arun C Murthy wrote: On May 4, 2011, at 4:09 PM, Tsz Wo (Nicholas), Sze wrote: The list seems highly inaccurate. Checked the first few N/A items. All are false positives. Also, can you please provide a list on features which are not related to gridmix benchmarks or herriot tests? Here are a few I quickly pulled up: So, it's around 10? Approximately? Also, the ones you put up were reviewed via jira. Please note that several of the ones you are pointing out are already in y-merge branch which is nearly trunk. including MR-2378 as you pointed out. Thanks for the list, I'll ensure we work on forward porting them. Arun
Re: [VOTE] Release candidate 0.20.203.0-rc0
On 05/03/2011 06:01 PM, Arun C Murthy wrote: > On May 3, 2011, at 5:17 PM, "Doug Cutting" wrote: > >> On 05/02/2011 02:33 PM, Arun C Murthy wrote: >>> Are you simply asking for someone to go through the 450 odd jiras and >>> set 'fix-for' fields? >> >> Every other release we've made is well-correlated with Jira. It should >> not be difficult to achieve that for this one. We could write a script >> to take all 450 bug IDs from the change log and use Jira's command-line >> tool to set the "fix-for" to be this 0.20+security release. Would you >> like help with that? >> > > Yes please, that would be great. Thanks! Please find below a script that will add a fix-version to issues. Doug

#!/bin/bash
# Reads bug ids from standard input, one per line,
# and adds the fixVersion named on the command line.

if [ $# -eq 0 ]
then
  echo "Usage: $0 fix-version"
  exit 1
fi

fix=$1
echo "Setting fix version to $fix."

server=https://issues.apache.org/jira
jira=./jira-cli-2.0.0/jira.sh

set -e

echo -n "Jira username: "
read user
echo -n "Jira password: "
stty -echo
read password
stty echo

while read issue
do
  # first read the old fix versions
  old=`$jira -a getFieldValue --server $server \
    --password "$password" --user "$user" \
    --issue "$issue" --field fixVersions | \
    tail -n 1 | sed 's/([0-9]*)//g' | sed s/\'//g`

  # now update, adding new value
  # jira will ignore if this value is already present
  $jira -a updateIssue --server $server \
    --password "$password" --user "$user" \
    --issue "$issue" --fixVersions "${old},${fix}"
done
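The script above expects bug ids on standard input, and the thread suggests taking them from the change log. A minimal sketch of that extraction step follows. It assumes the change log is a plain-text file (the name CHANGES.txt below is an assumption) and that issue keys look like HADOOP-1234, HDFS-1234, or MAPREDUCE-1234; the function name extract_issue_ids is hypothetical and not part of Doug's script or of the Jira command-line tool.

```shell
#!/bin/bash
# Hypothetical helper: print each distinct Jira issue key found in a
# change log, one per line, sorted.
# Assumption: keys have the form HADOOP-<n>, HDFS-<n>, or MAPREDUCE-<n>.
extract_issue_ids() {
  # -o prints only the matching part of each line; -E enables alternation.
  grep -oE '(HADOOP|HDFS|MAPREDUCE)-[0-9]+' "$1" | sort -u
}
```

Something like `extract_issue_ids CHANGES.txt > issues.txt` would write the ids to a file, which leaves room to review the list by hand before any issue is updated.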
Re: [VOTE] Release candidate 0.20.203.0-rc1
On Wed, May 4, 2011 at 4:11 PM, Arun C Murthy wrote: > On May 4, 2011, at 4:09 PM, Tsz Wo (Nicholas), Sze wrote: > > The list seems highly inaccurate. Checked the first few N/A items. All >> are >> false positives. >> >> > Also, can you please provide a list on features which are not related to > gridmix benchmarks or herriot tests? > Here are a few I quickly pulled up: MAPREDUCE-2316 (docs for improved capacity scheduler) MAPREDUCE-2355 (adds new config for heartbeat dampening in MR) " BZ-4182948. Add statistics logging to Fred for better visibility into startup time costs. (Matt Foley)" - I believe I saw a note from Matt on the JIRA yesterday about this feature, where he decided that the version done in 203 wasn't a good approach, and it's done differently in trunk (not sure if done yet). MAPREDUCE-2364 (important bug fix for localization) - in fact most of localization is different in this branch compared to trunk due to inclusion of MAPREDUCE-2378, the trunk version of which is still on the "yahoo-merge" branch,. "New cunters for FileInput/OutputFormat. New Counter MAP_OUTPUT_MATERIALZIED_BYTES. Related bugs: 4241034, 3418543, 4217546" - not sure which JIRA this is, I think I've seen a JIRA for trunk, but not committed. - MAPREDUCE-1904, committed without JIRA as: ". Reducing new Path(), RawFileStatus() creation overhead in LocalDirAllocator" not in trunk +BZ4101537 . When a queue is built without any access rights we explain the +problem. (dking, rvw ramach) [attachment of 2010-11-24] seems to be on trunk as MR-2411, but not committed, best I can tell, despite the JIRA there being resolved (based on looking at QueueManager in trunk) ". Remove unnecessary reference to user configuration from TaskDistributedCacheManager causing memory leaks" Not in trunk, not sure which JIRA it might be.. probably part of 2178. 
Major new feature: MAPREDUCE-323 - very large rework of how job history files are managed Major change: MAPREDUCE-1100/MAPREDUCE-1176: unresolved on trunk, though probably will be attacked by different JIRAs Major new ops-visible feature: "metrics2" system Major new ops-visible feature: MAPREDUCE-291 job history can be viewed from a separate server Major new set of user-visible configurations: MAPREDUCE-1943 and friends which implement new limits in MapReduce (eg MAPREDUCE-1872 as well) I have code to work on, so I won't keep going, but this is from looking at the last couple months of 203. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release candidate 0.20.203.0-rc1
On Wed, May 4, 2011 at 15:06, Suresh Srinivas wrote: > Eli, > > How many of these patches that you find troublesome are in CDH already? How is that relevant to the release vote and discrepancies listed in Eli's email? > Regards, > Suresh > > > On 5/4/11 3:03 PM, "Eli Collins" wrote: > >> On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley wrote: >>> Here's an updated release candidate for 0.20.203.0. I've incorporated the >>> feedback and included all of the patches from 0.20.2, which is the last >>> stable release. I also fixed the eclipse-plugin problem. >>> >>> The candidate is at: >>> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/ >>> >>> Please download it, inspect it, compile it, and test it. Clearly, I'm +1. >>> >>> -- Owen >> >> While rc2 is an improvement on rc1, I am -1 on this particular rc. >> Rationale: >> >> This rc contains many patches not yet committed to trunk. This would >> cause the next major release (0.22) to be a feature regression against >> our latest stable release (203), were 0.22 released soon. >> >> This rc contains many patches not yet reviewed by the community via >> the normal process (jira, patch against trunk, merge to a release >> branch). I think we should respect the existing community process that >> has been used for all previous releases. >> >> This rc introduces a new development and braching model (new feature >> development outside trunk) and Hadoop versioning scheme without >> sufficient discussion or proposal of these changes with the community. >> >> We should establish new process before the release, a release is not >> the appropriate mechanism for changing our review and development >> process or versioning . >> >> I do support a release from branch-0.20-security that follows the >> existing, established community process. >> >> Thanks, >> Eli > >
Re: [VOTE] Release candidate 0.20.203.0-rc1
+1 for the release. I downloaded the release, verified checksums, built and deployed. Ran randomwriter jobs on it. Everything passes. -- thanks mahadev @mahadevkonar On Wed, May 4, 2011 at 3:05 PM, Arun C Murthy wrote: > On May 4, 2011, at 10:31 AM, Owen O'Malley wrote: > >> Here's an updated release candidate for 0.20.203.0. I've incorporated the >> feedback and included all of the patches from 0.20.2, which is the last >> stable release. I also fixed the eclipse-plugin problem. >> >> The candidate is at: >> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/ >> >> Please download it, inspect it, compile it, and test it. Clearly, I'm +1. >> > > +1 > > Downloaded release, checked checksums, built, deployed single-node cluster. > > Arun > >
Re: [VOTE] Release candidate 0.20.203.0-rc1
On May 4, 2011, at 1:17 PM, Allen Wittenauer wrote: > Am I misreading this, or are the MR protocols out of sync between > 0.20.203 and 0.21? It would also appear that this is marked stable in 0.21. > What is the user impact? The names of the protocols were changed, but the names of the protocols aren't user-facing. The protocols themselves also changed, as with all Hadoop major versions. (We need to switch to protobuf or something for RPC to provide wire compatibility.) -- Owen
Re: [VOTE] Release candidate 0.20.203.0-rc1
On May 4, 2011, at 4:09 PM, Tsz Wo (Nicholas), Sze wrote: The list seems highly inaccurate. Checked the first few N/A items. All are false positives. Also, can you please provide a list on features which are not related to gridmix benchmarks or herriot tests? Please remember, and I have said this on list and off-list, that many of the forward ports obviated the need for multiple patches which show up in the commit logs. thanks, Arun < HADOOP-6304 N/A -- fixed in trunk via HADOOP-7110 (Todd, it was fixed by you. Forgot?) < HADOOP-6598 N/A -- moved to HADOOP-6763 and committed to trunk < HADOOP-6653 N/A -- not applicable in trunk < HADOOP-6716 N/A -- as part of HADOOP-6815 which was committed to trunk < HADOOP-6718 N/A -- Incorporated in HADOOP-6706 for 0.22. < HADOOP-6776 N/A -- Tom White said "This is fixed in trunk, so can be closed." Regards, Nicholas From: Eli Collins To: general@hadoop.apache.org Sent: Wed, May 4, 2011 3:36:16 PM Subject: Re: [VOTE] Release candidate 0.20.203.0-rc1 On Wed, May 4, 2011 at 3:29 PM, Jakob Homan wrote: @Eli >> This rc contains many patches not yet committed to trunk. If you've compiled this list, can you post it? Here's the list Todd posted yesterday: http://mail-archives.apache.org/mod_mbox/hadoop-general/201105.mbox/%3CBANLkTimKKbkuPCz61TU=8-no8z6pyhf...@mail.gmail.com%3E Thanks, Eli
Re: [VOTE] Release candidate 0.20.203.0-rc1
On Wed, May 4, 2011 at 4:09 PM, Tsz Wo (Nicholas), Sze wrote: > The list seems highly inaccurate. Checked the first few N/A items. All are > false positives. Yes, that's why those are marked N/A ie "Not applicable". Check out the non N/A ones. Thanks, Eli
Re: [VOTE] Release candidate 0.20.203.0-rc1
The list seems highly inaccurate. Checked the first few N/A items. All are false positives. < HADOOP-6304 N/A -- fixed in trunk via HADOOP-7110 (Todd, it was fixed by you. Forgot?) < HADOOP-6598 N/A -- moved to HADOOP-6763 and committed to trunk < HADOOP-6653 N/A -- not applicable in trunk < HADOOP-6716 N/A -- as part of HADOOP-6815 which was committed to trunk < HADOOP-6718 N/A -- Incorporated in HADOOP-6706 for 0.22. < HADOOP-6776 N/A -- Tom White said "This is fixed in trunk, so can be closed." Regards, Nicholas From: Eli Collins To: general@hadoop.apache.org Sent: Wed, May 4, 2011 3:36:16 PM Subject: Re: [VOTE] Release candidate 0.20.203.0-rc1 On Wed, May 4, 2011 at 3:29 PM, Jakob Homan wrote: > @Eli >> This rc contains many patches not yet committed to trunk. > If you've compiled this list, can you post it? > Here's the list Todd posted yesterday: http://mail-archives.apache.org/mod_mbox/hadoop-general/201105.mbox/%3CBANLkTimKKbkuPCz61TU=8-no8z6pyhf...@mail.gmail.com%3E Thanks, Eli
Re: [VOTE] Release candidate 0.20.203.0-rc1
> Your -1 vote essentially blocks the changes that are already available in > CDH to be available from Apache open source! As Eric mentioned, this thread is about an Apache release, not CDH. My -1 vote does not block these changes from being released via Apache. You cannot veto a release. Releases are lazy majority: a release is blocked only if there are more -1 votes than +1 votes. If these changes are contributed on jira, discussed and reviewed, and committed to trunk, I'm happy to support the release. There's a big difference between asking that a release respect the Apache community process and blocking it. If you want to get the release out, how about contributing the work via the normal means so the community can review it like we review all other code changes? Thanks, Eli
Re: [VOTE] Release candidate 0.20.203.0-rc1
Here is a snippet from your blog - http://www.cloudera.com/blog/2010/10/cdh3-beta-3-now-available/ -- Security Enhancements As one of the primary contributors and largest production users of Hadoop, Yahoo! publishes the source tree for the version of Hadoop that they run on their production clusters. We are pleased to announce that we have merged Yahoo's source tree into CDH3b3. This merge brings many improvements developed at Yahoo! into CDH, including improvements for MapReduce scalability on 1000+-node clusters and several new tools for benchmarking and testing Hadoop. -- It would be great if you could list how many of the 192 changes were reviewed and became part of CDH. Your -1 vote essentially blocks changes that are already available in CDH from being available from Apache open source! On 5/4/11 3:30 PM, "Todd Lipcon" wrote: > With Cloudera hat on, I agree with Eli's assessment. > > With Apache hat on, I don't see how this is at all relevant to the task at > hand. I would make the same arguments against taking CDH3 and releasing it > as an ASF artifact -- we'd also have a certain amount of work to do to make > sure that all of the patches are in trunk, first. Additionally, I'd want to > outline what the inclusion criteria would be for that branch. > > -Todd > > On Wed, May 4, 2011 at 3:24 PM, Eli Collins wrote: > >> With my Cloudera hat on.. >> >> When we went through the 10x and 20x patches we only pulled a subset >> of them, primarily for security and the general improvements that we >> thought were good. We found both incompatible changes and some >> sketchy changes that we did not pull in from a quality perspective. >> There is a big difference between a patch set that's acceptable for >> Yahoo!'s user base and one that's a more general artifact. >> >> When we evaluated the YDH patch sets we were using that frame of mind. >> I'm now looking it in terms of an Apache release. And the place to >> review changes for an Apache release is on jira. 
>> >> CDH3 is based on the latest stable Apache release (20.2) so it doesn't >> regress against it. I'm nervous about rebasing future releases on 203 >> because of the compatibility and quality implications. >> >> Thanks, >> Eli >> >> >> On Wed, May 4, 2011 at 3:06 PM, Suresh Srinivas >> wrote: >>> Eli, >>> >>> How many of these patches that you find troublesome are in CDH already? >>> >>> Regards, >>> Suresh >>> >>> >>> On 5/4/11 3:03 PM, "Eli Collins" wrote: >>> On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley >> wrote: > Here's an updated release candidate for 0.20.203.0. I've incorporated >> the > feedback and included all of the patches from 0.20.2, which is the last > stable release. I also fixed the eclipse-plugin problem. > > The candidate is at: >> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/ > > Please download it, inspect it, compile it, and test it. Clearly, I'm >> +1. > > -- Owen While rc2 is an improvement on rc1, I am -1 on this particular rc. >> Rationale: This rc contains many patches not yet committed to trunk. This would cause the next major release (0.22) to be a feature regression against our latest stable release (203), were 0.22 released soon. This rc contains many patches not yet reviewed by the community via the normal process (jira, patch against trunk, merge to a release branch). I think we should respect the existing community process that has been used for all previous releases. This rc introduces a new development and braching model (new feature development outside trunk) and Hadoop versioning scheme without sufficient discussion or proposal of these changes with the community. We should establish new process before the release, a release is not the appropriate mechanism for changing our review and development process or versioning . I do support a release from branch-0.20-security that follows the existing, established community process. Thanks, Eli >>> >>> >> > >
Re: [VOTE] Release candidate 0.20.203.0-rc1
On Wed, May 4, 2011 at 3:29 PM, Jakob Homan wrote: > @Eli >> This rc contains many patches not yet committed to trunk. > If you've compiled this list, can you post it? > Here's the list Todd posted yesterday: http://mail-archives.apache.org/mod_mbox/hadoop-general/201105.mbox/%3CBANLkTimKKbkuPCz61TU=8-no8z6pyhf...@mail.gmail.com%3E Thanks, Eli
Re: [VOTE] Release candidate 0.20.203.0-rc1
With Cloudera hat on, I agree with Eli's assessment. With Apache hat on, I don't see how this is at all relevant to the task at hand. I would make the same arguments against taking CDH3 and releasing it as an ASF artifact -- we'd also have a certain amount of work to do to make sure that all of the patches are in trunk, first. Additionally, I'd want to outline what the inclusion criteria would be for that branch. -Todd On Wed, May 4, 2011 at 3:24 PM, Eli Collins wrote: > With my Cloudera hat on.. > > When we went through the 10x and 20x patches we only pulled a subset > of them, primarily for security and the general improvements that we > thought were good. We found both incompatible changes and some > sketchy changes that we did not pull in from a quality perspective. > There is a big difference between a patch set that's acceptable for > Yahoo!'s user base and one that's a more general artifact. > > When we evaluated the YDH patch sets we were using that frame of mind. > I'm now looking it in terms of an Apache release. And the place to > review changes for an Apache release is on jira. > > CDH3 is based on the latest stable Apache release (20.2) so it doesn't > regress against it. I'm nervous about rebasing future releases on 203 > because of the compatibility and quality implications. > > Thanks, > Eli > > > On Wed, May 4, 2011 at 3:06 PM, Suresh Srinivas > wrote: > > Eli, > > > > How many of these patches that you find troublesome are in CDH already? > > > > Regards, > > Suresh > > > > > > On 5/4/11 3:03 PM, "Eli Collins" wrote: > > > >> On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley > wrote: > >>> Here's an updated release candidate for 0.20.203.0. I've incorporated > the > >>> feedback and included all of the patches from 0.20.2, which is the last > >>> stable release. I also fixed the eclipse-plugin problem. 
> >>> > >>> The candidate is at: > http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/ > >>> > >>> Please download it, inspect it, compile it, and test it. Clearly, I'm > +1. > >>> > >>> -- Owen > >> > >> While rc2 is an improvement on rc1, I am -1 on this particular rc. > Rationale: > >> > >> This rc contains many patches not yet committed to trunk. This would > >> cause the next major release (0.22) to be a feature regression against > >> our latest stable release (203), were 0.22 released soon. > >> > >> This rc contains many patches not yet reviewed by the community via > >> the normal process (jira, patch against trunk, merge to a release > >> branch). I think we should respect the existing community process that > >> has been used for all previous releases. > >> > >> This rc introduces a new development and braching model (new feature > >> development outside trunk) and Hadoop versioning scheme without > >> sufficient discussion or proposal of these changes with the community. > >> > >> We should establish new process before the release, a release is not > >> the appropriate mechanism for changing our review and development > >> process or versioning . > >> > >> I do support a release from branch-0.20-security that follows the > >> existing, established community process. > >> > >> Thanks, > >> Eli > > > > > -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release candidate 0.20.203.0-rc1
@Eli >> This rc contains many patches not yet committed to trunk. If you've compiled this list, can you post it? On Wed, May 4, 2011 at 3:24 PM, Eli Collins wrote: > With my Cloudera hat on.. > > When we went through the 10x and 20x patches we only pulled a subset > of them, primarily for security and the general improvements that we > thought were good. We found both incompatible changes and some > sketchy changes that we did not pull in from a quality perspective. > There is a big difference between a patch set that's acceptable for > Yahoo!'s user base and one that's a more general artifact. > > When we evaluated the YDH patch sets we were using that frame of mind. > I'm now looking it in terms of an Apache release. And the place to > review changes for an Apache release is on jira. > > CDH3 is based on the latest stable Apache release (20.2) so it doesn't > regress against it. I'm nervous about rebasing future releases on 203 > because of the compatibility and quality implications. > > Thanks, > Eli > > > On Wed, May 4, 2011 at 3:06 PM, Suresh Srinivas > wrote: >> Eli, >> >> How many of these patches that you find troublesome are in CDH already? >> >> Regards, >> Suresh >> >> >> On 5/4/11 3:03 PM, "Eli Collins" wrote: >> >>> On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley wrote: Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem. The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/ Please download it, inspect it, compile it, and test it. Clearly, I'm +1. -- Owen >>> >>> While rc2 is an improvement on rc1, I am -1 on this particular rc. >>> Rationale: >>> >>> This rc contains many patches not yet committed to trunk. This would >>> cause the next major release (0.22) to be a feature regression against >>> our latest stable release (203), were 0.22 released soon. 
>>> >>> This rc contains many patches not yet reviewed by the community via >>> the normal process (jira, patch against trunk, merge to a release >>> branch). I think we should respect the existing community process that >>> has been used for all previous releases. >>> >>> This rc introduces a new development and braching model (new feature >>> development outside trunk) and Hadoop versioning scheme without >>> sufficient discussion or proposal of these changes with the community. >>> >>> We should establish new process before the release, a release is not >>> the appropriate mechanism for changing our review and development >>> process or versioning . >>> >>> I do support a release from branch-0.20-security that follows the >>> existing, established community process. >>> >>> Thanks, >>> Eli >> >> >
Re: [VOTE] Release candidate 0.20.203.0-rc1
-1 for the same reasons I outlined in my email yesterday. This is not a community artifact following the community's processes, and thus should not be an official release until those issues are addressed. On Wed, May 4, 2011 at 3:17 PM, Doug Cutting wrote: > -1 > > This candidate has lots of patches that are not in trunk, potentially > adding regressions to 0.22 and 0.23. This should be addressed before we > release from 0.20-security. We should also not move to four-component > version numbering. A release from the 0.20-security branch should > perhaps be called 0.20.100. > > Doug > > On 05/04/2011 10:31 AM, Owen O'Malley wrote: > > Here's an updated release candidate for 0.20.203.0. I've incorporated the > feedback and included all of the patches from 0.20.2, which is the last > stable release. I also fixed the eclipse-plugin problem. > > > > The candidate is at: > http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/ > > > > Please download it, inspect it, compile it, and test it. Clearly, I'm +1. > > > > -- Owen > -- Todd Lipcon Software Engineer, Cloudera
Re: [VOTE] Release candidate 0.20.203.0-rc1
With my Cloudera hat on.. When we went through the 10x and 20x patches we only pulled a subset of them, primarily for security and the general improvements that we thought were good. We found both incompatible changes and some sketchy changes that we did not pull in from a quality perspective. There is a big difference between a patch set that's acceptable for Yahoo!'s user base and one that's a more general artifact. When we evaluated the YDH patch sets we were using that frame of mind. I'm now looking at it in terms of an Apache release. And the place to review changes for an Apache release is on jira. CDH3 is based on the latest stable Apache release (20.2) so it doesn't regress against it. I'm nervous about rebasing future releases on 203 because of the compatibility and quality implications. Thanks, Eli On Wed, May 4, 2011 at 3:06 PM, Suresh Srinivas wrote: > Eli, > > How many of these patches that you find troublesome are in CDH already? > > Regards, > Suresh > > > On 5/4/11 3:03 PM, "Eli Collins" wrote: > >> On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley wrote: >>> Here's an updated release candidate for 0.20.203.0. I've incorporated the >>> feedback and included all of the patches from 0.20.2, which is the last >>> stable release. I also fixed the eclipse-plugin problem. >>> >>> The candidate is at: >>> http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/ >>> >>> Please download it, inspect it, compile it, and test it. Clearly, I'm +1. >>> >>> -- Owen >> >> While rc2 is an improvement on rc1, I am -1 on this particular rc. >> Rationale: >> >> This rc contains many patches not yet committed to trunk. This would >> cause the next major release (0.22) to be a feature regression against >> our latest stable release (203), were 0.22 released soon. >> >> This rc contains many patches not yet reviewed by the community via >> the normal process (jira, patch against trunk, merge to a release >> branch). 
I think we should respect the existing community process that >> has been used for all previous releases. >> >> This rc introduces a new development and braching model (new feature >> development outside trunk) and Hadoop versioning scheme without >> sufficient discussion or proposal of these changes with the community. >> >> We should establish new process before the release, a release is not >> the appropriate mechanism for changing our review and development >> process or versioning . >> >> I do support a release from branch-0.20-security that follows the >> existing, established community process. >> >> Thanks, >> Eli > >
Re: [VOTE] Release candidate 0.20.203.0-rc1
-1 This candidate has lots of patches that are not in trunk, potentially adding regressions to 0.22 and 0.23. This should be addressed before we release from 0.20-security. We should also not move to four-component version numbering. A release from the 0.20-security branch should perhaps be called 0.20.100. Doug On 05/04/2011 10:31 AM, Owen O'Malley wrote: > Here's an updated release candidate for 0.20.203.0. I've incorporated the > feedback and included all of the patches from 0.20.2, which is the last > stable release. I also fixed the eclipse-plugin problem. > > The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/ > > Please download it, inspect it, compile it, and test it. Clearly, I'm +1. > > -- Owen
Re: [VOTE] Release candidate 0.20.203.0-rc1
Eli, How many of these patches that you find troublesome are in CDH already? Regards, Suresh On 5/4/11 3:03 PM, "Eli Collins" wrote: > On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley wrote: >> Here's an updated release candidate for 0.20.203.0. I've incorporated the >> feedback and included all of the patches from 0.20.2, which is the last >> stable release. I also fixed the eclipse-plugin problem. >> >> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/ >> >> Please download it, inspect it, compile it, and test it. Clearly, I'm +1. >> >> -- Owen > > While rc2 is an improvement on rc1, I am -1 on this particular rc. Rationale: > > This rc contains many patches not yet committed to trunk. This would > cause the next major release (0.22) to be a feature regression against > our latest stable release (203), were 0.22 released soon. > > This rc contains many patches not yet reviewed by the community via > the normal process (jira, patch against trunk, merge to a release > branch). I think we should respect the existing community process that > has been used for all previous releases. > > This rc introduces a new development and braching model (new feature > development outside trunk) and Hadoop versioning scheme without > sufficient discussion or proposal of these changes with the community. > > We should establish new process before the release, a release is not > the appropriate mechanism for changing our review and development > process or versioning . > > I do support a release from branch-0.20-security that follows the > existing, established community process. > > Thanks, > Eli
Re: [VOTE] Release candidate 0.20.203.0-rc1
On May 4, 2011, at 10:31 AM, Owen O'Malley wrote:
> Here's an updated release candidate for 0.20.203.0. I've incorporated the
> feedback and included all of the patches from 0.20.2, which is the last
> stable release. I also fixed the eclipse-plugin problem.
>
> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>
> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.

+1

Downloaded release, checked checksums, built, deployed single-node cluster.

Arun
Re: [VOTE] Release candidate 0.20.203.0-rc1
On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley wrote:
> Here's an updated release candidate for 0.20.203.0. I've incorporated the
> feedback and included all of the patches from 0.20.2, which is the last
> stable release. I also fixed the eclipse-plugin problem.
>
> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>
> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.
>
> -- Owen

While rc2 is an improvement on rc1, I am -1 on this particular rc. Rationale:

This rc contains many patches not yet committed to trunk. This would cause the next major release (0.22) to be a feature regression against our latest stable release (203), were 0.22 released soon.

This rc contains many patches not yet reviewed by the community via the normal process (jira, patch against trunk, merge to a release branch). I think we should respect the existing community process that has been used for all previous releases.

This rc introduces a new development and branching model (new feature development outside trunk) and Hadoop versioning scheme without sufficient discussion or proposal of these changes with the community.

We should establish a new process before the release; a release is not the appropriate mechanism for changing our review and development process or versioning.

I do support a release from branch-0.20-security that follows the existing, established community process.

Thanks,
Eli
Re: [VOTE] Release candidate 0.20.203.0-rc1
Hi Folks,

This is a release vote, let's stay focused. On this thread I think appropriate responses are either +1 and some short commentary (assuming you've tried it and it works) or -1 and some short commentary. It would also be cool if you noted whether you've tried it.

In the spirit of my feedback, I'll respond to this under another subject.

Thanks,
E14

On May 4, 2011, at 12:17 PM, Eli Collins wrote:
> On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley wrote:
>> Here's an updated release candidate for 0.20.203.0. I've incorporated the
>> feedback and included all of the patches from 0.20.2, which is the last
>> stable release. I also fixed the eclipse-plugin problem.
>>
>> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>>
>> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.
>>
>> -- Owen
>
> Hey Owen,
>
> Thanks for incorporating all the feedback and additional changes. It's
> great that this release won't be a regression against our previous
> stable release.
>
> I would like to call out that we are not just voting to adopt a
> particular release, we are starting a new version scheme for the
> project, doing new feature development on maintenance release branches
> (before trunk), and we're saying it's OK to release software that
> hasn't been reviewed by the community.
>
> I'd like to hear from our development community not just that we want
> to do a release from this branch but that we want to adopt these other
> changes as well. Here's a summary of the major *remaining* issues and
> a recommendation on how to proceed:
>
> 1. There are about 50 changes that have jiras that are committed to
> the branch that are not yet in trunk. The next release (0.22) will be
> a regression against this release, with respect to these particular
> changes. Recommendation: we should get these changes in trunk before
> releasing so that new features do not show up in maintenance branches
> first.
>
> 2. There are 192 patches that were committed to the branch without
> reference to any Jira in the commit message. Some of these may have
> already been forward ported, but it is very difficult to match them up
> and evaluate which ones have been committed. Some are troublesome:
> when spot-checking the commits I found some that were done by
> non-committers with no public review and that introduced apparent
> performance regressions (e.g. see HADOOP-7255). Recommendation: we
> should update the commit log to make sure there is a jira for each
> issue, and all changes have been reviewed/committed. This is the way
> we've always done releases.
>
> 3. In the new versioning scheme, major.minor.point.X, the new "X"
> component allows for new feature development on point releases.
> Recommendation: we should discuss in a separate thread whether we want
> to do new feature development on maintenance branches and, if so,
> whether to adopt this new version scheme.
>
> Thanks,
> Eli
Re: [VOTE] Release candidate 0.20.203.0-rc1
On May 4, 2011, at 10:31 AM, Owen O'Malley wrote:
> Here's an updated release candidate for 0.20.203.0. I've incorporated the
> feedback and included all of the patches from 0.20.2, which is the last
> stable release. I also fixed the eclipse-plugin problem.
>
> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>
> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.

Am I misreading this, or are the MR protocols out of sync between 0.20.203 and 0.21? It would also appear that this is marked stable in 0.21. What is the user impact?
Re: [VOTE] Release candidate 0.20.203.0-rc1
On Wed, May 4, 2011 at 10:31 AM, Owen O'Malley wrote:
> Here's an updated release candidate for 0.20.203.0. I've incorporated the
> feedback and included all of the patches from 0.20.2, which is the last
> stable release. I also fixed the eclipse-plugin problem.
>
> The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/
>
> Please download it, inspect it, compile it, and test it. Clearly, I'm +1.
>
> -- Owen

Hey Owen,

Thanks for incorporating all the feedback and additional changes. It's great that this release won't be a regression against our previous stable release.

I would like to call out that we are not just voting to adopt a particular release, we are starting a new version scheme for the project, doing new feature development on maintenance release branches (before trunk), and we're saying it's OK to release software that hasn't been reviewed by the community.

I'd like to hear from our development community not just that we want to do a release from this branch but that we want to adopt these other changes as well. Here's a summary of the major *remaining* issues and a recommendation on how to proceed:

1. There are about 50 changes that have jiras that are committed to the branch that are not yet in trunk. The next release (0.22) will be a regression against this release, with respect to these particular changes. Recommendation: we should get these changes in trunk before releasing so that new features do not show up in maintenance branches first.

2. There are 192 patches that were committed to the branch without reference to any Jira in the commit message. Some of these may have already been forward ported, but it is very difficult to match them up and evaluate which ones have been committed. Some are troublesome: when spot-checking the commits I found some that were done by non-committers with no public review and that introduced apparent performance regressions (e.g. see HADOOP-7255). Recommendation: we should update the commit log to make sure there is a jira for each issue, and all changes have been reviewed/committed. This is the way we've always done releases.

3. In the new versioning scheme, major.minor.point.X, the new "X" component allows for new feature development on point releases. Recommendation: we should discuss in a separate thread whether we want to do new feature development on maintenance branches and, if so, whether to adopt this new version scheme.

Thanks,
Eli
[VOTE] Release candidate 0.20.203.0-rc1
Here's an updated release candidate for 0.20.203.0. I've incorporated the feedback and included all of the patches from 0.20.2, which is the last stable release. I also fixed the eclipse-plugin problem.

The candidate is at: http://people.apache.org/~omalley/hadoop-0.20.203.0-rc1/

Please download it, inspect it, compile it, and test it. Clearly, I'm +1.

-- Owen