> Can a map task work on more than one input split?

As far as I can tell from reading the code, no (at least, not yet). Code such as createCache() in JobInProgress implicitly assumes a one-to-one mapping between maps[] and splits[].
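
To illustrate the kind of implicit assumption I mean, here is a minimal sketch. The Split and MapTask types are hypothetical stand-ins, and this is not the actual JobInProgress code. The per-host cache is built by walking splits[] and picking maps[i] for splits[i], which only works if the two arrays line up index for index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitCacheSketch {
    // Hypothetical stand-in for an input split: only the hosts holding its data.
    static final class Split {
        final String[] locations;
        Split(String... locations) { this.locations = locations; }
    }

    // Hypothetical stand-in for a map task; it is built from exactly one split.
    static final class MapTask {
        final int index;
        final Split split;
        MapTask(int index, Split split) { this.index = index; this.split = split; }
    }

    // One map task per split, so maps[i] always corresponds to splits[i].
    static MapTask[] createMaps(Split[] splits) {
        MapTask[] maps = new MapTask[splits.length];
        for (int i = 0; i < splits.length; i++) {
            maps[i] = new MapTask(i, splits[i]);
        }
        return maps;
    }

    // Index map tasks by host. The shared index i is the implicit assumption:
    // the task for splits[i] is looked up as maps[i], with no explicit mapping.
    static Map<String, List<MapTask>> createCache(Split[] splits, MapTask[] maps) {
        Map<String, List<MapTask>> cache = new HashMap<>();
        for (int i = 0; i < splits.length; i++) {
            for (String host : splits[i].locations) {
                cache.computeIfAbsent(host, h -> new ArrayList<>()).add(maps[i]);
            }
        }
        return cache;
    }

    public static void main(String[] args) {
        Split[] splits = { new Split("h1", "h2"), new Split("h2") };
        MapTask[] maps = createMaps(splits);
        Map<String, List<MapTask>> cache = createCache(splits, maps);
        System.out.println(cache.get("h2").size()); // prints 2
    }
}
```

If a map task could cover several splits, a loop like this would need an explicit split-to-task lookup instead of the shared index.
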
MR-1220 (small-jobs "combo task" optimization) will change that in some sense, but fundamentally, the correspondence between maps and splits is pretty well baked in, I believe. (In fact, I'm pretty sure splits are created based on some goal for the number of maps--i.e., maps and splits are one-to-one almost by definition.) I might be wrong about all this, of course. :-)

Greg