Eric Baldeschwieler wrote:
> I cannot think of a case where this proposed extension complicates code or reduces compressibility. Since it is backwards compatible with your desired API, purists can simply ignore the option.
It makes the insertion of a combiner no longer transparent: the reducer would have to know whether a combiner had been used in order to know how to process the map output.
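To illustrate what I mean by transparent, here is a minimal sketch in plain Java. It does not use the real MapReduce classes; the class and method names are made up for the example. It assumes the combiner emits the same value type the map does, so the reducer gives the same answer whether or not a combiner ran.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a word-count job; not the actual MapReduce API.
public class CombinerTransparency {

    // Reducer: sums whatever counts it is handed for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Combiner: same logic and same value type as the reducer,
    // emitting a single partial sum per map task.
    static List<Integer> combine(List<Integer> values) {
        List<Integer> out = new ArrayList<>();
        out.add(reduce(values));
        return out;
    }

    public static void main(String[] args) {
        // Values emitted for one key by two separate map tasks.
        List<Integer> map1 = Arrays.asList(1, 1, 1);
        List<Integer> map2 = Arrays.asList(1, 1);

        // Without a combiner the reducer sees every raw value.
        List<Integer> raw = new ArrayList<>(map1);
        raw.addAll(map2);

        // With a combiner the reducer sees one partial sum per map task.
        List<Integer> combined = new ArrayList<>(combine(map1));
        combined.addAll(combine(map2));

        System.out.println(reduce(raw));      // prints 5
        System.out.println(reduce(combined)); // prints 5
    }
}
```

The reducer is identical in both runs. If the map output and the combiner output could differ in shape, reduce() would need to be told which one it was getting.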
In general this seems like a micro-optimization. It saves little code. Instead of writing 'collector.collect(key, new List(value))' one could write 'collector.collect(key, value)'.
Taking this to its logical extreme, in the classic word-count use of MapReduce, why should one have to emit ones for the map values? Why have a value at all? Why not add a collect(key) method, then permit reducers to be passed an iterator which returns null for every value where collect(key) was called? That would save a little code and make the intermediate data a bit smaller. So should we do it? I'd argue not.
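For concreteness, here is a rough sketch of what that key-only variant might look like. This is plain Java with a made-up Collector class, not anything in the current code, and the null-handling rule is my assumption about how such a reducer would have to behave.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a key-only collect(); not the actual MapReduce API.
public class KeyOnlyCollectSketch {

    static class Collector {
        final Map<String, List<Integer>> groups = new TreeMap<>();

        // The ordinary form: a key with an explicit value.
        void collect(String key, Integer value) {
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }

        // The hypothetical shortcut: no value, so a null placeholder is stored.
        void collect(String key) {
            collect(key, null);
        }
    }

    // The reducer now has to special-case null (meaning "one occurrence")
    // alongside real counts, which is the extra burden being argued against.
    static int reduce(Iterator<Integer> values) {
        int count = 0;
        while (values.hasNext()) {
            Integer v = values.next();
            count += (v == null) ? 1 : v;
        }
        return count;
    }

    public static void main(String[] args) {
        Collector collector = new Collector();
        for (String word : "a b a".split(" ")) {
            collector.collect(word); // no explicit 1 emitted
        }
        for (Map.Entry<String, List<Integer>> e : collector.groups.entrySet()) {
            System.out.println(e.getKey() + "=" + reduce(e.getValue().iterator()));
        }
    }
}
```

The map side gets one argument shorter, but the reducer picks up a null check it did not need before.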
Doug
