Eric has a great point.

It is pretty common to produce a set of records in the map step, group them by key in the reduce step, and store them for future use. Whenever this data is used later, it is already grouped by key and essentially ready for reduce.
Special-casing this may be a useful optimization.
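
As a rough illustration of the idea (a standalone Python sketch, not Hadoop code; the names reduce_presorted and stored are made up for this example), a reduce over input that is already grouped by key can walk the records once and hand each run of equal keys to the reducer, with no sort step in between:

```python
from itertools import groupby
from operator import itemgetter


def reduce_presorted(records, reducer):
    """Apply reducer to each run of consecutive records that share a key.

    Assumption: records are already grouped by key (for example, the stored
    output of an earlier reduce), so no sort or shuffle is done here.
    """
    for key, group in groupby(records, key=itemgetter(0)):
        # Pass only the values of this key's run to the reducer.
        yield key, reducer(key, (value for _, value in group))


# Hypothetical stored data, already grouped by key.
stored = [("a", 1), ("a", 2), ("b", 5), ("c", 3), ("c", 4)]
totals = list(reduce_presorted(stored, lambda key, values: sum(values)))
```

If the input were not actually grouped, the same key would show up in more than one run and be reduced more than once, so this shortcut only makes sense when the grouping is guaranteed.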

-- ab

On Apr 19, 2006, at 5:34 PM, Eric Baldeschwieler wrote:

Might be cool to special-case a reduce on sorted input.

On Apr 18, 2006, at 12:28 PM, Doug Cutting wrote:

Stefan Groschupf wrote:
What is the reason that each job that has no mapper defined runs the IdentityMapper? Handling different formats (as discussed) between mapping and reducing is difficult. Having one job that just maps in one format and another job that just reduces in another format would be a nice workaround for the format problem, but the IdentityMapper makes this workaround impossible.

Stefan,

I don't understand the problem here. Some map function is required for any data to make it to reduce. IdentityMapper simply copies all map input without altering it. How does this cause you problems? Would you prefer a NullMapper by default, that does nothing? That would result in no output sent to reduce.
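
To make the difference concrete, here is a small standalone Python sketch (not the Hadoop API; identity_mapper, null_mapper and run_map are illustrative names, and the null mapper is only the hypothetical default mentioned above). It shows that an identity map passes every record through unchanged, while a null map leaves nothing for reduce:

```python
def identity_mapper(key, value):
    """Emit the input record unchanged."""
    yield key, value


def null_mapper(key, value):
    """Emit nothing, whatever the input is."""
    return
    yield  # unreachable; only here so the function is a generator


def run_map(mapper, records):
    """Collect everything the mapper emits for the given records."""
    return [out for key, value in records for out in mapper(key, value)]


records = [("k1", "v1"), ("k2", "v2")]
passed_through = run_map(identity_mapper, records)
dropped = run_map(null_mapper, records)
```

Here passed_through holds the same two records that went in, and dropped is empty, so a reduce step fed by the null mapper would have no input at all.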

Thanks,

Doug
