Gets sum of all integers between map tasks

2008-10-07 Thread Edward J. Yoon
I would like to get the spam probability P(word|category) of the words
from the files of each category (bad/good e-mails), as described below.
To compute it in the reducer, I need the sum of spamTotal across all map
tasks. How can I get it?

Map:

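import java.io.IOException;
import java.util.regex.Matcher;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Inside the Mapper class; spamTotal, count, splitregex and wordregex
// are fields of that class (not shown).

/**
 * Counts word frequency
 */
public void map(LongWritable key, Text value,
    OutputCollector<Text, FloatWritable> output, Reporter reporter)
    throws IOException {
  String line = value.toString();
  String[] tokens = line.split(splitregex);

  // For every word token
  for (int i = 0; i < tokens.length; i++) {
    String word = tokens[i].toLowerCase();
    Matcher m = wordregex.matcher(word);
    if (m.matches()) {
      spamTotal++; // counted per map task; not shared with other tasks or the reducer
      output.collect(new Text(word), count);
    }
  }
}
}  // closes the Mapper class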

Reduce:

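import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

/**
 * Computes bad count / total bad words
 */
public static class Reduce extends MapReduceBase implements
    Reducer<Text, FloatWritable, Text, FloatWritable> {

  public void reduce(Text key, Iterator<FloatWritable> values,
      OutputCollector<Text, FloatWritable> output, Reporter reporter)
      throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += (int) values.next().get();
    }

    // spamTotal must be the total over ALL map tasks; this is the value
    // the reducer has no access to yet.
    FloatWritable badProb = new FloatWritable((float) sum / spamTotal);
    output.collect(key, badProb);
  }
}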
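Written out, the quantity the reducer is meant to compute is roughly the following. The symbols T (number of map tasks), t, and N_spam are notation introduced here for illustration, not from the original mail. The point is that the denominator is a sum of every map task's own spamTotal, so no single map or reduce task holds it unless it is aggregated somewhere.

```latex
P(w \mid \mathrm{spam}) = \frac{\mathrm{count}(w, \mathrm{spam})}{N_{\mathrm{spam}}},
\qquad
N_{\mathrm{spam}} = \sum_{t=1}^{T} \mathtt{spamTotal}_t
```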


-- 
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org


Re: Gets sum of all integers between map tasks

2008-10-07 Thread Miles Osborne
This is a well-known problem. Basically, you want to aggregate values
computed at some previous step.

- Emit (category, probability) pairs and have the reducer simply sum up
the probabilities for a given category.

(It is the same task as summing up the word counts.)
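One possible reading of this suggestion, shown as a plain-Java, in-memory simulation rather than real Hadoop code: every mapper emits, in addition to (word, 1), a pair under a reserved key for the category total, so the reducer sums the total exactly like any other word count. The reserved key `__TOTAL__`, the tokenizing regex, and the class and method names are hypothetical. Since the per-word reducers still cannot see the total while they run, the division happens in a separate final step; in Hadoop that would typically be a second job or a post-processing pass over the first job's output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of the suggestion above (no Hadoop involved).
// TOTAL_KEY and all names here are hypothetical, chosen for illustration.
class SpamTotalSketch {
  // Reserved key that collects the category total; assumed never to be a real word.
  static final String TOTAL_KEY = "__TOTAL__";

  // "Map": for one input line, emit (word, 1) plus (TOTAL_KEY, 1) per word.
  static List<Map.Entry<String, Integer>> map(String line) {
    List<Map.Entry<String, Integer>> out = new ArrayList<>();
    for (String token : line.toLowerCase().split("\\W+")) {
      if (token.isEmpty()) continue; // split can yield a leading empty token
      out.add(Map.entry(token, 1));
      out.add(Map.entry(TOTAL_KEY, 1));
    }
    return out;
  }

  // "Shuffle + reduce": group by key and sum, exactly like word count.
  static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
    Map<String, Integer> sums = new TreeMap<>();
    for (Map.Entry<String, Integer> e : pairs) {
      sums.merge(e.getKey(), e.getValue(), Integer::sum);
    }
    return sums;
  }

  // Final step (a second job or post-processing in Hadoop):
  // P(word|category) = count(word) / total.
  static Map<String, Double> probabilities(List<String> lines) {
    List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
    for (String line : lines) {
      pairs.addAll(map(line)); // each line stands in for a separate map task
    }
    Map<String, Integer> sums = reduce(pairs);
    int total = sums.getOrDefault(TOTAL_KEY, 0);
    Map<String, Double> probs = new TreeMap<>();
    for (Map.Entry<String, Integer> e : sums.entrySet()) {
      if (!e.getKey().equals(TOTAL_KEY)) {
        probs.put(e.getKey(), (double) e.getValue() / total);
      }
    }
    return probs;
  }

  public static void main(String[] args) {
    System.out.println(probabilities(List.of("Buy cheap pills", "cheap cheap offer")));
  }
}
```

With the two sample lines in `main`, the reserved key sums to 6 and "cheap" (3 occurrences) gets 3/6 = 0.5. Emitting one extra pair per word doubles the map output; in a real job a combiner that sums counts, as in word count, would usually shrink that.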

Miles

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


Re: Gets sum of all integers between map tasks

2008-10-07 Thread Edward J. Yoon
Aha, that's simple. :)

/Edward J. Yoon

On Tue, Oct 7, 2008 at 7:14 PM, Miles Osborne [EMAIL PROTECTED] wrote:
 This is a well-known problem. Basically, you want to aggregate values
 computed at some previous step.

 - Emit (category, probability) pairs and have the reducer simply sum up
 the probabilities for a given category.

 (It is the same task as summing up the word counts.)

 Miles
