Thanks for the help, Peter. It looks like the mapper in that example writes everything out under a common key and adds all the values to HDFS. The mappers will just take turns (serialize) writing to disk. I will be writing the code for this tonight.

So can you answer a tech question? Since all the values are grouped under a common key, how many reduce processes do you think will be spawned? I am thinking 1, which is bad.

Instead, I was thinking of grouping the values by generating the key with a random number generator in the collector of the mapper. The values would then be uniformly distributed over a few keys. Say the number of keys is 0.1% of the number of values, or at least 1, whichever is higher. So if there are 20000 values, that gives 20 keys with roughly 1000 values under each, and 20 reducers should spawn to do the sums in parallel. Now I can at least run 20 sums in parallel rather than having just 1 reducer do the whole work.

How does that theory seem?
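To make the idea concrete, here is a rough sketch in plain Python, not actual Hadoop or streaming code. All the function names (`num_keys`, `map_phase`, `reduce_phase`, `combine`) are made up for illustration, the matrices are plain nested lists, and the in-memory dictionary only stands in for the shuffle. It assumes every matrix has the same shape. One thing it makes visible: each key only yields a partial sum, so the partial sums still have to be added together once more to get the single global matrix.

```python
import random

def num_keys(n_values):
    """0.1% of the number of values, or at least 1, whichever is higher."""
    return max(1, n_values // 1000)

def add_matrices(a, b):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def map_phase(matrices, n_keys, rng):
    """Emit each matrix under a uniformly random key in [0, n_keys)."""
    for m in matrices:
        yield rng.randrange(n_keys), m

def reduce_phase(pairs):
    """Sum the matrices that share a key; returns one partial sum per key."""
    partial = {}
    for key, m in pairs:
        partial[key] = add_matrices(partial[key], m) if key in partial else m
    return partial

def combine(partials):
    """Final step: add the per-key partial sums into one global matrix."""
    total = None
    for m in partials.values():
        total = m if total is None else add_matrices(total, m)
    return total

# Toy data: 5000 small 2x2 matrices, so num_keys gives 5 keys.
rng = random.Random(0)
matrices = [[[i, 1], [2 * i, 0]] for i in range(5000)]
pairs = list(map_phase(matrices, num_keys(len(matrices)), rng))
partials = reduce_phase(pairs)
total = combine(partials)
```

Because addition is associative and commutative, the result does not depend on which random key each matrix landed under; the keys only decide how the work is split.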
Peter Skomoroch wrote:
>
> Check out the EM example in nltk:
>
> http://code.google.com/p/nltk/source/browse/trunk/nltk/nltk_contrib/hadoop/EM/runStreaming.py
>
> On Fri, Mar 27, 2009 at 5:19 PM, Sid123 <itis...@gmail.com> wrote:
>
>> Hi,
>> I have to design an iterative algorithm. Each iteration is an M-R cycle
>> that calculates a parameter and has to feed it back to all the maps in
>> the next iteration.
>> In the reduce procedure I just need to sum everything from the map
>> procedure (many similar-size matrices) into a single matrix (of the same
>> size as each map output), irrespective of the key. This single matrix is
>> the parameter I was talking about earlier.
>> PS: This parameter MUST BE global to all map processes.
>>
>> 1) How do I collect all the values into one single parameter? Do I need
>> to write it to the file system or can I keep it in memory? I feel that I
>> WILL have to write it to the HDFS somewhere...
>> --
>> View this message in context:
>> http://www.nabble.com/Iterative-feedback-in-map-reduce....-tp22748317p22748317.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>
>
> --
> Peter N. Skomoroch
> 617.285.8348
> http://www.datawrangling.com
> http://delicious.com/pskomoroch
> http://twitter.com/peteskomoroch
>

--
View this message in context: http://www.nabble.com/Iterative-feedback-in-map-reduce....-tp22748317p22751900.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.