Another benefit is that it would increase the separation of these technologies, so that, e.g., folks could more easily run different versions of MapReduce on top of different versions of HDFS. Currently we make no such guarantees. Folks would be able to upgrade to, e.g., the next release of MapReduce on a subset of their cluster without upgrading HDFS; that's not supported today. As we move toward splitting MapReduce into a scheduler and a runtime, where folks can specify a different runtime per job, this separation will be even more critical.
Sounds like we simply need to create separate jar files for these different components. This can be done in the current project.
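To make the suggestion concrete, one way to do this in the current project is an Ant target per component, each packaging only that component's classes. This is just a sketch: the target names, property names, and jar names below are hypothetical, not the project's actual build layout; only the `org.apache.hadoop.hdfs` and `org.apache.hadoop.mapred` package prefixes are taken from the codebase.

```xml
<!-- Hypothetical Ant targets producing one jar per component.
     Property names (build.dir, build.classes, version) and jar
     names are illustrative, not the actual Hadoop build file. -->
<target name="jar-hdfs" depends="compile">
  <jar jarfile="${build.dir}/hadoop-hdfs-${version}.jar"
       basedir="${build.classes}"
       includes="org/apache/hadoop/hdfs/**"/>
</target>

<target name="jar-mapred" depends="compile">
  <jar jarfile="${build.dir}/hadoop-mapred-${version}.jar"
       basedir="${build.classes}"
       includes="org/apache/hadoop/mapred/**"/>
</target>
```

Separate jars give users separately deployable artifacts, though they alone don't give the independent release cycles and compatibility guarantees the split proposal is after.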
Wouldn't the amount of effort to make this split and get it right be better spent on getting all components of Hadoop to 1.0 (API stability)? The proposal feels like a distraction to me at this point in the project.
Nige
