Hello Hadoop Community,

Given the tremendous positive feedback we've all had regarding the HDFS,
MapReduce, and Common project split, I'd like to propose we take the next
step and further separate the existing projects.

I propose we begin by splitting the MapReduce project into separate "Map"
and "Reduce" sub-projects. This will give us the opportunity to tease out
the complex interdependencies between "map" and "reduce" that exist today
and encourage us to write more modular and isolated code, which should speed
releases. This will also aid our users who exclusively run map-only or
reduce-only jobs. These are important use-cases, and so should be given high
priority.

Given that these two portions of the existing MapReduce project share a
great deal of code, we will likely need to release these two new projects
concurrently at first, but the eventual goal should certainly be to release
"Map" and "Reduce" independently. This seems intuitive to me,
given the remarkable recent advancements in the academic community regarding
"reduce," while the research coming out of the "map" academics has largely
stagnated of late.

If this proposal is accepted, and it has the success I think it will, then
we should strongly consider splitting the other two projects as well. My gut
instinct is that we should split "HDFS" into "HD" and "FS" sub-projects, and
simply rename the "Common" project to "C'Mon." We can think about the
details of what exactly these project splits mean later.

Please let me know what you think.

Best,
Aaron
