Of course you can use two rounds of MapReduce, with the first round computing the
statistics and the second round computing the percentile.
But I don't think this is better than your solution ... which is the best
one that I can think of. Here is the pseudo code of your solution ...
map(k1, doc) {
  for each word in doc {
    # marker key: "#" sorts ahead of every word of the same length
    k2 = [word.length, "#"]
    emit(k2, 1)
    k2 = [word.length, word]
    emit(k2, 1)
  }
}
# partition on word length only, so the "#" marker key and all words
# of that length land on the same reducer
partition(k2) {
  return k2[0] % NoOfReducers
}
# key = word length, value = total count of words of that length
statistics = Hash.new
reduce(k2, listOfCounts) {
  if k2[1] == "#" {
    # the "#" key sorts ahead of all words of the same length,
    # so this total is known before any word of that length arrives
    statistics[k2[0]] = sum(listOfCounts)
  } else {
    wordCount = 0
    for each count in listOfCounts {
      wordCount = wordCount + count
    }
    emit(k2[1], wordCount / statistics[k2[0]])
  }
}
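To make the flow concrete, here is a hypothetical single-process Python sketch of the same idea (not actual Hadoop code): the map phase emits a `(length, "#")` marker pair per word alongside the `(length, word)` pair, and the reduce phase processes keys in sorted order so each length's total arrives before its words. Function names and the sample sentence are illustrative only.

```python
from collections import defaultdict

def map_phase(doc):
    """Emit two pairs per word: a length-total marker and the word itself."""
    pairs = []
    for word in doc.split():
        pairs.append(((len(word), "#"), 1))   # "#" sorts before any word
        pairs.append(((len(word), word), 1))
    return pairs

def reduce_phase(pairs):
    """Group by key; iterate in sorted order so '#' is seen first."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)

    statistics = {}   # word length -> total words of that length
    results = {}
    for length, word in sorted(grouped):
        counts = grouped[(length, word)]
        if word == "#":
            statistics[length] = sum(counts)
        else:
            results[word] = sum(counts) / statistics[length]
    return results

print(reduce_phase(map_phase("the cat sat on the mat")))
```

In real Hadoop the sorted-iteration trick corresponds to the framework's key sort within a reducer partition, which is why the marker key must compare less than every word.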
Rgds,
Ricky
-----Original Message-----
From: Steve Lewis [mailto:[email protected]]
Sent: Thursday, October 28, 2010 8:53 AM
To: common-user
Subject: Statistics and Early Keys to Reducers
Imagine I have the following problem - I want to run a standard word count
program, but instead of having the reducer output the word and
its count, I want it to output the word and the count / (total count of words
of that length).
The total count of words of a given length - say 1..100 - seen by each mapper
is known at the end of the map step.
In theory each mapper could send its total to every reducer, and before the
rest of the reduce step each reducer could
compute the grand total.
This requires:
1) Statistics are sent with a key which sorts ahead of all others
2) Statistics are sent as the mapper is closing
3) Somehow each mapper sends statistics with proper keys so a copy is
delivered to every reducer
Is this a reasonable approach - are there others?
What do folks think?
--
Steven M. Lewis PhD
4221 105th Ave Ne
Kirkland, WA 98033
206-384-1340 (cell)
Institute for Systems Biology
Seattle WA