Imagine I have the following problem: I want to run a standard word count program, but instead of having the reducer output each word and its count, I want it to output the word and count / (total count of words of that length).
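To make the desired output concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop code, and the function name `normalized_counts` is made up for illustration; it only shows the ratio being asked for.

```python
from collections import Counter


def normalized_counts(words):
    """Map each word to count / (total count of words of the same length)."""
    word_counts = Counter(words)

    # Total number of word occurrences seen for each word length.
    length_totals = Counter()
    for word, count in word_counts.items():
        length_totals[len(word)] += count

    return {
        word: count / length_totals[len(word)]
        for word, count in word_counts.items()
    }


if __name__ == "__main__":
    for word, ratio in sorted(normalized_counts(["a", "a", "I", "to", "to", "be", "to"]).items()):
        print(word, ratio)
```

The hard part in MapReduce is that the denominator depends on words that may be handled by other mappers and reducers, which this sketch sidesteps by keeping everything in one process.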
The total count of words of a given length (say 1..100) seen by each mapper is known at the end of the map step. In theory, each mapper could send its totals to every reducer, and before the rest of the reduce step each reducer could compute the grand total.

This requires:
1) Statistics are sent with a key that sorts ahead of all others
2) Statistics are sent as the mapper is closing
3) Somehow each mapper sends statistics with proper keys so a copy is delivered to every reducer

Is this a reasonable approach? Are there others? What do folks think?

--
Steven M. Lewis PhD
4221 105th Ave Ne
Kirkland, WA 98033
206-384-1340 (cell)
Institute for Systems Biology
Seattle WA
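The three requirements above can be sketched as a small plain-Python simulation. This is not Hadoop code and uses no Hadoop API: `run_mapper`, `run_reducer`, `run_job`, `partition`, the `STATS`/`WORD` tags, and `NUM_REDUCERS` are all hypothetical names chosen for illustration. It assumes keys are (tag, name) pairs so that statistics sort ahead of words, and that a mapper can address a specific reducer directly.

```python
import zlib
from collections import Counter, defaultdict

NUM_REDUCERS = 3       # assumed reducer count for the simulation
STATS, WORD = 0, 1     # tag 0 sorts ahead of tag 1, so statistics arrive first


def partition(word):
    # Deterministic stand-in for a hash partitioner.
    return zlib.crc32(word.encode("utf-8")) % NUM_REDUCERS


def run_mapper(words):
    """Yield (reducer_index, key, value) records for one input split."""
    length_totals = Counter()
    for word in words:
        length_totals[len(word)] += 1
        yield partition(word), (WORD, word), 1
    # "Closing" step: send this mapper's per-length totals to every reducer.
    for reducer in range(NUM_REDUCERS):
        for length, total in length_totals.items():
            yield reducer, (STATS, length), total


def run_reducer(records):
    """Sum values per key, then walk the keys in sorted order."""
    grouped = defaultdict(int)
    for key, value in records:
        grouped[key] += value

    grand_totals = {}  # word length -> total over all mappers
    output = {}
    for (tag, name), total in sorted(grouped.items()):
        if tag == STATS:
            grand_totals[name] = total
        else:
            # Every statistics key has already been seen at this point.
            output[name] = total / grand_totals[len(name)]
    return output


def run_job(splits):
    """Run one simulated mapper per split, then every reducer."""
    inboxes = defaultdict(list)
    for split in splits:
        for reducer, key, value in run_mapper(split):
            inboxes[reducer].append((key, value))
    result = {}
    for reducer in range(NUM_REDUCERS):
        result.update(run_reducer(inboxes[reducer]))
    return result


if __name__ == "__main__":
    splits = [["to", "be", "or", "not", "to", "be"], ["a", "to", "I", "a"]]
    for word, ratio in sorted(run_job(splits).items()):
        print(word, ratio)
```

In this sketch each reducer receives one statistics record per mapper per word length, so the extra traffic is roughly (mappers x reducers x distinct lengths) small records. How to get the same routing and ordering in a real job is left open here.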
