Re: Proposal: Further Project Split(s)

2011-04-01 Thread Todd Lipcon
+4.01. This is a terrific idea.

On Fri, Apr 1, 2011 at 1:19 AM, Aaron T. Myers a...@cloudera.com wrote:

 Hello Hadoop Community,

 Given the tremendous positive feedback we've all had regarding the HDFS,
 MapReduce, and Common project split, I'd like to propose we take the next
 step and further separate the existing projects.

 I propose we begin by splitting the MapReduce project into separate Map
 and Reduce sub-projects. This will provide us the opportunity to tease out
 the complex interdependencies between map and reduce that exist today,
 to encourage us to write more modular and isolated code, which should speed
 releases. This will also aid our users who exclusively run map-only or
 reduce-only jobs. These are important use-cases, and so should be given high
 priority.

 Given that these two portions of the existing MapReduce project share a
 great deal of code, we will likely need to release these two new projects
 concurrently at first, but the eventual goal should certainly be to be able
 to release Map and Reduce independently. This seems intuitive to me,
 given the remarkable recent advancements in the academic community regarding
 reduce, while the research coming out of the map academics has largely
 stagnated of late.

 If this proposal is accepted, and it has the success I think it will, then
 we should strongly consider splitting the other two projects as well. My gut
 instinct is that we should split HDFS into HD and FS sub-projects, and
 simply rename the Common project to C'Mon. We can think about the
 details of what exactly these project splits mean later.

 Please let me know what you think.

 Best,
 Aaron




-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Proposal: Further Project Split(s)

2011-04-01 Thread Chris Douglas
Experience developing Hadoop has shown that we not only need to
partition our projects for more active releases, but we also should
explore speculative project splits. For this, a Hadoop.next() project
should track the development of a project scheduler that can partition
the Hadoop subprojects, possibly running a second version of a
subproject in parallel. Downstream subprojects and TLPs automatically
accept whichever releases first as a dependency. Implementation should
combine ant, ivy, maven, and at least one legacy Hadoop build tool (to
be written).

Of course, not all of these subprojects will succeed. When one fails
(or is too slow with its project reports), the project scheduler will
be responsible for respawning it in the Incubator.

The project scheduler will, of course, be pluggable. -C




Re: Proposal: Further Project Split(s)

2011-04-01 Thread Nigel Daley
-1+2.  This could potentially allow us to replace Jenkins with Hadoop for our 
build and test infrastructure.  That would be awesome!

n.

 



Re: Proposal: Further Project Split(s)

2011-04-01 Thread Patrick Angeles
+1

This will allow Hadoop to better compete with GoDaddy's Hadoop Killer
skunkworks project.

On Fri, Apr 1, 2011 at 11:26 AM, Nigel Daley nda...@mac.com wrote:

 -1+2.  This could potentially allow us to replace Jenkins with Hadoop for
 our build and test infrastructure.  That would be awesome!

 n.

 




Re: Proposal: Further Project Split(s)

2011-04-01 Thread Mattmann, Chris A (388J)
LOL@Chris!!!

 


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: Proposal: Further Project Split(s)

2011-04-01 Thread Brian Bockelman

On Apr 1, 2011, at 12:40 PM, Allen Wittenauer wrote:

 
 On Apr 1, 2011, at 1:57 AM, Chris Douglas wrote:
 
 Experience developing Hadoop has shown that we not only need to
 partition our projects for more active releases, but we also should
 explore speculative project splits. For this, a Hadoop.next() project
 should track the development of a project scheduler that can partition
 the Hadoop subprojects, possibly running a second version of a
 subproject in parallel. Downstream subprojects and TLPs automatically
 accept whichever releases first as a dependency. Implementation should
 combine ant, ivy, maven, and at least one legacy Hadoop build tool (to
 be written).
 
 
 -1, until it supports eclipse.
 

-1, until it supports emacs



Re: Proposal: Further Project Split(s)

2011-04-01 Thread Amr Awadallah
Strong -1 from me, this is *idiotic* since we first need to split the NN and DN
into separate projects.

-- amr

On Fri, Apr 1, 2011 at 10:40 AM, Allen Wittenauer
awittena...@linkedin.com wrote:


 On Apr 1, 2011, at 1:57 AM, Chris Douglas wrote:

  Experience developing Hadoop has shown that we not only need to
  partition our projects for more active releases, but we also should
  explore speculative project splits. For this, a Hadoop.next() project
  should track the development of a project scheduler that can partition
  the Hadoop subprojects, possibly running a second version of a
  subproject in parallel. Downstream subprojects and TLPs automatically
  accept whichever releases first as a dependency. Implementation should
  combine ant, ivy, maven, and at least one legacy Hadoop build tool (to
  be written).


 -1, until it supports eclipse.





Re: Proposal: Further Project Split(s)

2011-04-01 Thread Konstantin Boudnik
On Fri, Apr 1, 2011 at 08:26, Nigel Daley nda...@mac.com wrote:
 -1+2.  This could potentially allow us to replace Jenkins with Hadoop for our 
 build and test infrastructure.  That would be awesome!

Has anyone checked a calendar lately?






Re: Proposal: Further Project Split(s)

2011-04-01 Thread Konstantin Boudnik
And I tend to believe all sorts of stuff on this particular day
because this happens to be my birthday ;(

On Fri, Apr 1, 2011 at 12:35, Allen Wittenauer a...@apache.org wrote:

 On Apr 1, 2011, at 11:41 AM, Konstantin Boudnik wrote:

 On Fri, Apr 1, 2011 at 08:26, Nigel Daley nda...@mac.com wrote:
 -1+2.  This could potentially allow us to replace Jenkins with Hadoop for 
 our build and test infrastructure.  That would be awesome!

 Has anyone checked a calendar lately?


        No.  My calendar application's map tasks are stuck behind our PYMK 
 workflow.