Gets sum of all integers between map tasks
I would like to compute the spam probability P(word|category) for the words in the files of a category (bad/good e-mails), as described below. To compute it in the reduce step, I need the sum of spamTotal across all map tasks. How can I get it?

Map:

    /**
     * Counts word frequency
     */
    public void map(LongWritable key, Text value,
        OutputCollector<Text, FloatWritable> output, Reporter reporter)
        throws IOException {
      String line = value.toString();
      String[] tokens = line.split(splitregex);

      // For every word token
      for (int i = 0; i < tokens.length; i++) {
        String word = tokens[i].toLowerCase();
        Matcher m = wordregex.matcher(word);
        if (m.matches()) {
          spamTotal++;
          output.collect(new Text(word), count);
        }
      }
    }

Reduce:

    /**
     * Computes bad count / total bad words
     */
    public static class Reduce extends MapReduceBase
        implements Reducer<Text, FloatWritable, Text, FloatWritable> {
      public void reduce(Text key, Iterator<FloatWritable> values,
          OutputCollector<Text, FloatWritable> output, Reporter reporter)
          throws IOException {
        int sum = 0;
        while (values.hasNext()) {
          sum += (int) values.next().get();
        }

        // This needs the sum of spamTotal over all map tasks.
        FloatWritable badProb = new FloatWritable((float) sum / spamTotal);
        output.collect(key, badProb);
      }
    }

--
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org
Re: Gets sum of all integers between map tasks
This is a well-known problem. Basically, you want to aggregate values computed at some previous step.

Emit (category, probability) pairs and have the reducer simply sum up the probabilities for a given category (it is the same task as summing up the word counts).

Miles

2008/10/7 Edward J. Yoon [EMAIL PROTECTED]:
> I would like to get the spam probability P(word|category) of the words
> from the files of a category (bad/good e-mails) as described below.
> To compute it on reduce, I need a sum of spamTotal between map tasks.
> How can I get it?

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
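One way to read the suggestion above is sketched here. This is only an illustration under stated assumptions, not code from the thread: plain Java collections stand in for the Hadoop map and reduce steps, and the class name SpamTotalSketch, the reserved key TOTAL_KEY, and the letters-only word pattern are all hypothetical. For every matching token the map step emits one extra pair under a reserved key, so the category total is summed by the same reduce logic as the word counts, and a second pass divides each word count by that total.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Plain-Java stand-in for the map and reduce steps; no Hadoop classes are used. */
class SpamTotalSketch {
    // Hypothetical reserved key; WORD matches letters only, so no word can collide with it.
    static final String TOTAL_KEY = "*TOTAL*";
    // Hypothetical word pattern standing in for wordregex.
    static final Pattern WORD = Pattern.compile("[a-z]+");

    /** "Map" step: emit (word, 1) and (TOTAL_KEY, 1) for every matching token of one line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<Map.Entry<String, Integer>>();
        for (String token : line.split("\\s+")) {
            String word = token.toLowerCase();
            if (WORD.matcher(word).matches()) {
                out.add(new AbstractMap.SimpleEntry<String, Integer>(word, 1));
                out.add(new AbstractMap.SimpleEntry<String, Integer>(TOTAL_KEY, 1));
            }
        }
        return out;
    }

    /** "Reduce" step: sum the values of each key, exactly as for a word count. */
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<String, Integer>();
        for (Map.Entry<String, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    /** Second pass: divide each word count by the total found under TOTAL_KEY. */
    static Map<String, Float> probabilities(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<Map.Entry<String, Integer>>();
        for (String line : lines) {
            emitted.addAll(map(line)); // each line could come from a different map task
        }
        Map<String, Integer> sums = reduce(emitted);
        int total = sums.getOrDefault(TOTAL_KEY, 0);
        Map<String, Float> probs = new TreeMap<String, Float>();
        for (Map.Entry<String, Integer> e : sums.entrySet()) {
            if (!e.getKey().equals(TOTAL_KEY)) {
                probs.put(e.getKey(), (float) e.getValue() / total);
            }
        }
        return probs;
    }

    public static void main(String[] args) {
        System.out.println(probabilities(Arrays.asList("Buy now buy", "now cheap")));
    }
}
```

With the two sample lines there are five matching tokens, so "buy" and "now" each get 2/5 and "cheap" gets 1/5. In a real job the division would presumably happen in a second job or a post-processing step, because a reducer handling one word does not see the value stored under the total key.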
Re: Gets sum of all integers between map tasks
Oh-ha, that's simple. :)

/Edward J. Yoon

On Tue, Oct 7, 2008 at 7:14 PM, Miles Osborne [EMAIL PROTECTED] wrote:
> this is a well known problem. basically, you want to aggregate values
> computed at some previous step.
>
> emit category,probability pairs and have the reducer simply sum-up
> the probabilities for a given category
>
> (it is the same task as summing-up the word counts)