Re: Proposal: Further Project Split(s)
+4.01. This is a terrific idea.

On Fri, Apr 1, 2011 at 1:19 AM, Aaron T. Myers <a...@cloudera.com> wrote:

> Hello Hadoop Community,
>
> Given the tremendous positive feedback we've all had regarding the HDFS, MapReduce, and Common project split, I'd like to propose we take the next step and further separate the existing projects.
>
> I propose we begin by splitting the MapReduce project into separate Map and Reduce sub-projects. This will give us the opportunity to tease out the complex interdependencies between map and reduce that exist today and encourage us to write more modular, isolated code, which should speed up releases. It will also aid our users who exclusively run map-only or reduce-only jobs. These are important use cases and so should be given high priority.
>
> Given that these two portions of the existing MapReduce project share a great deal of code, we will likely need to release the two new projects concurrently at first, but the eventual goal should certainly be to release Map and Reduce independently. This seems intuitive to me, given the remarkable recent advancements in the academic community regarding reduce, while the research coming out of the map side of academia has largely stagnated of late.
>
> If this proposal is accepted, and it has the success I think it will, then we should strongly consider splitting the other two projects as well. My gut instinct is that we should split HDFS into HD and FS sub-projects, and simply rename the Common project to C'Mon. We can work out the details of what exactly these project splits mean later.
>
> Please let me know what you think.
>
> Best,
> Aaron

--
Todd Lipcon
Software Engineer, Cloudera
Re: Proposal: Further Project Split(s)
Experience developing Hadoop has shown that we not only need to partition our projects for more active releases, but we also should explore speculative project splits.

For this, a Hadoop.next() project should track the development of a project scheduler that can partition the Hadoop subprojects, possibly running a second version of a subproject in parallel. Downstream subprojects and TLPs automatically accept whichever releases first as a dependency. Implementation should combine ant, ivy, maven, and at least one legacy Hadoop build tool (to be written).

Of course, not all of these subprojects will succeed. When one fails (or is too slow with its project reports), the project scheduler will be responsible for respawning it in the Incubator. The project scheduler will, of course, be pluggable.

-C

On Fri, Apr 1, 2011 at 1:19 AM, Aaron T. Myers <a...@cloudera.com> wrote:

> I propose we begin by splitting the MapReduce project into separate Map and Reduce sub-projects.
Re: Proposal: Further Project Split(s)
-1+2. This could potentially allow us to replace Jenkins with Hadoop for our build and test infrastructure. That would be awesome!

n.

On Apr 1, 2011, at 1:57 AM, Chris Douglas wrote:

> For this, a Hadoop.next() project should track the development of a project scheduler that can partition the Hadoop subprojects, possibly running a second version of a subproject in parallel. Downstream subprojects and TLPs automatically accept whichever releases first as a dependency.
Re: Proposal: Further Project Split(s)
+1

This will allow Hadoop to better compete with GoDaddy's Hadoop Killer skunkworks project.

On Fri, Apr 1, 2011 at 11:26 AM, Nigel Daley <nda...@mac.com> wrote:

> -1+2. This could potentially allow us to replace Jenkins with Hadoop for our build and test infrastructure. That would be awesome!
Re: Proposal: Further Project Split(s)
LOL@Chris!!!

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory, Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++
Re: Proposal: Further Project Split(s)
On Apr 1, 2011, at 12:40 PM, Allen Wittenauer wrote:

> On Apr 1, 2011, at 1:57 AM, Chris Douglas wrote:
>
>> Implementation should combine ant, ivy, maven, and at least one legacy Hadoop build tool (to be written).
>
> -1, until it supports eclipse.

-1, until it supports emacs.
Re: Proposal: Further Project Split(s)
Strong -1 from me. This is *idiotic*, since we first need to split the NN and DN into separate projects.

--
amr

On Fri, Apr 1, 2011 at 10:40 AM, Allen Wittenauer <awittena...@linkedin.com> wrote:

> -1, until it supports eclipse.
Re: Proposal: Further Project Split(s)
On Fri, Apr 1, 2011 at 08:26, Nigel Daley <nda...@mac.com> wrote:

> -1+2. This could potentially allow us to replace Jenkins with Hadoop for our build and test infrastructure. That would be awesome!

Has anyone checked a calendar lately?
Re: Proposal: Further Project Split(s)
And I tend to believe all sorts of stuff on this particular day, because it happens to be my birthday ;(

On Fri, Apr 1, 2011 at 12:35, Allen Wittenauer <a...@apache.org> wrote:

> On Apr 1, 2011, at 11:41 AM, Konstantin Boudnik wrote:
>
>> Has anyone checked a calendar lately?
>
> No. My calendar application's map tasks are stuck behind our PYMK workflow.