On Mar 26, 2008, at 11:05 AM, Arun C Murthy wrote:
On Mar 26, 2008, at 9:39 AM, Aayush Garg wrote:
Hi,
I am developing a simple inverted index program in Hadoop. My map function has the output:
<word, doc>
and the reducer has:
<word, list(docs)>
Now I want to use one more MapReduce job to remove stop and scrub words from this output. Also, in the next stage I would like to have a short summary associated with every word. How should I design my program from this stage? I mean, how would I apply multiple MapReduce jobs to this? What would be the better way to perform this?
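The dataflow described above can be sketched in plain Java (no Hadoop dependencies; class and method names are illustrative stand-ins, not the Hadoop API) — the "map" output is a stream of (word, doc) pairs, and the "reduce" step groups them into word -> list(docs):

```java
import java.util.*;

// Illustrative stand-in for the inverted-index dataflow in the question:
// map emits (word, doc) pairs; reduce groups them by word.
public class InvertedIndexSketch {

    // Simulates the shuffle + reduce: group (word, doc) pairs by word.
    static Map<String, List<String>> reduce(List<String[]> pairs) {
        Map<String, List<String>> index = new TreeMap<String, List<String>>();
        for (String[] p : pairs) {
            List<String> docs = index.get(p[0]);
            if (docs == null) {
                docs = new ArrayList<String>();
                index.put(p[0], docs);
            }
            docs.add(p[1]);
        }
        return index;
    }

    public static void main(String[] args) {
        // Pretend map output from two documents.
        List<String[]> mapOutput = Arrays.asList(
            new String[]{"hadoop", "doc1"},
            new String[]{"index", "doc1"},
            new String[]{"hadoop", "doc2"});
        System.out.println(reduce(mapOutput));
        // {hadoop=[doc1, doc2], index=[doc1]}
    }
}
```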
In general you are better off with fewer Map-Reduce jobs ... less i/o works better.
I forgot to add that you can use the APIs in JobClient and JobControl to chain jobs together ...
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Job+Control
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#JobControl
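The chaining idea behind JobControl can be sketched as follows. This mimics only the dependency ordering (declare jobs, declare which jobs they depend on, launch each when its prerequisites finish); the class, the job names, and the method names here are hypothetical, not the Hadoop API:

```java
import java.util.*;

// Stand-in for the job-chaining idea behind JobControl: each job lists the
// jobs it depends on, and a job is "launched" only after all of its
// dependencies have completed. Illustration only, not the Hadoop API.
public class JobChainSketch {
    // Job name -> names of jobs it depends on, in declaration order.
    private final Map<String, List<String>> deps =
        new LinkedHashMap<String, List<String>>();

    void addJob(String name, String... dependsOn) {
        deps.put(name, Arrays.asList(dependsOn));
    }

    // Returns the order in which jobs would be launched
    // (assumes the dependency graph has no cycles).
    List<String> run() {
        List<String> order = new ArrayList<String>();
        while (order.size() < deps.size()) {
            for (Map.Entry<String, List<String>> e : deps.entrySet()) {
                if (!order.contains(e.getKey())
                        && order.containsAll(e.getValue())) {
                    order.add(e.getKey()); // dependencies done; launch it
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        JobChainSketch jc = new JobChainSketch();
        jc.addJob("buildIndex");
        jc.addJob("removeStopWords", "buildIndex");
        jc.addJob("addSummaries", "removeStopWords");
        System.out.println(jc.run());
        // [buildIndex, removeStopWords, addSummaries]
    }
}
```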
Arun
Use the DistributedCache if you can and fix your first Map to not emit the stop words at all. Use the combiner to crunch down the amount of intermediate map-outputs, etc.
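A minimal sketch of that map-side filtering, as plain Java: in a real Hadoop job the stop-word set would be read once per task from files shipped via the DistributedCache; here it is just passed in, and all names are illustrative:

```java
import java.util.*;

// Sketch of the suggested map-side filtering: load the stop-word set once
// per task (in Hadoop it would come from DistributedCache files), then
// never emit (word, doc) pairs for stop words at all.
public class StopWordFilterSketch {
    private final Set<String> stopWords;

    StopWordFilterSketch(Collection<String> stopWords) {
        this.stopWords = new HashSet<String>(stopWords);
    }

    // Stand-in for the map() body: emit (word, doc) only for non-stop words.
    List<String[]> map(String doc, String text) {
        List<String[]> out = new ArrayList<String[]>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.length() > 0 && !stopWords.contains(word)) {
                out.add(new String[]{word, doc});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        StopWordFilterSketch m =
            new StopWordFilterSketch(Arrays.asList("the", "a", "of"));
        for (String[] p : m.map("doc1", "The anatomy of a Hadoop index")) {
            System.out.println(p[0] + " -> " + p[1]);
        }
        // anatomy -> doc1
        // hadoop -> doc1
        // index -> doc1
    }
}
```

Filtering in the mapper (plus a combiner) shrinks the intermediate data before the shuffle, which is exactly why it beats a second clean-up job.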
Something useful to look at:
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Example%3A+WordCount+v2.0
Arun
Thanks,
Regards,
--
Aayush Garg
Phone: +41 76 482 240