The downside of this (which appears to be the only way) is that your
entire input data set has to pass through the identity mapper and then
go through the shuffle and sort before it reaches the reducer.
If you have a large input data set, this costs real resources: CPU,
disk, network, and wall-clock time.
What we have been doing is building MapFiles of our data sets and
running the join code on them, which gives us reduce-equivalent
capability in the mapper.
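To illustrate the idea (this is a plain-Java sketch, not the Hadoop MapFile API, and the data-set names are made up): because MapFiles keep records sorted by key and support indexed lookups, two of them can be joined inside the mapper with a single pass, avoiding the shuffle and sort entirely.

```java
import java.util.*;

// Plain-Java sketch of the idea behind a map-side join: when both data
// sets are already sorted by key (as Hadoop MapFiles are), the join can
// be done in the mapper with key lookups, no shuffle/sort needed.
// The sample data and class name here are illustrative assumptions.
public class MapSideJoinSketch {

    // Join two key-sorted data sets on their common keys.
    static List<String> mapSideJoin(SortedMap<String, String> left,
                                    SortedMap<String, String> right) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, String> e : left.entrySet()) {
            // Indexed lookup, analogous to MapFile.Reader.get()
            String match = right.get(e.getKey());
            if (match != null) {
                joined.add(e.getKey() + "\t" + e.getValue() + "\t" + match);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        SortedMap<String, String> users = new TreeMap<>(
            Map.of("u1", "alice", "u2", "bob"));
        SortedMap<String, String> orders = new TreeMap<>(
            Map.of("u2", "order-9"));
        // Only the key present in both sets ("u2") survives the join.
        System.out.println(mapSideJoin(users, orders));
    }
}
```

The key property is that no repartitioning is needed: since both inputs share the same sort order, the mapper can produce the joined output directly.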
Richard Tomsett wrote:
Leandro Alvim wrote:
How can I use only a reduce without a map?
I don't know if there's a way to run just a reduce task without a map
stage, but you could achieve the same effect by using the
IdentityMapper class for the map stage (it passes the data through to
the reducers unchanged), so you are effectively doing just a reduce.
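To make the pass-through behavior concrete, here is a plain-Java sketch (not the Hadoop API; the class and method names are illustrative) of what an identity map followed by the shuffle and sort produces: each record is emitted unchanged, and the reducer then sees the original input grouped and sorted by key.

```java
import java.util.*;
import java.util.stream.*;

// Plain-Java illustration of an identity map plus a toy shuffle/sort.
// Records pass through the "mapper" untouched, then arrive at the
// "reducer" grouped by key in sorted order -- which is why an
// IdentityMapper stage effectively gives you a reduce-only job.
public class IdentityMapSketch {

    // The identity "mapper": emits each (key, value) pair as-is.
    static Map.Entry<String, String> identityMap(String key, String value) {
        return Map.entry(key, value);
    }

    // A toy "shuffle and sort": group mapper output by key, sorted.
    static SortedMap<String, List<String>> shuffle(
            List<Map.Entry<String, String>> mapped) {
        SortedMap<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> e : mapped) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .add(e.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> input = List.of(
            Map.entry("b", "2"), Map.entry("a", "1"), Map.entry("a", "3"));
        List<Map.Entry<String, String>> mapped = input.stream()
            .map(e -> identityMap(e.getKey(), e.getValue()))
            .collect(Collectors.toList());
        // Prints {a=[1, 3], b=[2]}: the untouched input, grouped by key.
        System.out.println(shuffle(mapped));
    }
}
```

The cost Jason describes is visible here too: even though the map does nothing, every record still has to flow through it and through the grouping step before any reduce logic can run.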
--
Jason Venner
Attributor - Program the Web <http://www.attributor.com/>
Attributor is hiring Hadoop Wranglers and coding wizards, contact if
interested