You need access to TupleWritable::setWritten(int). If you want to use
TupleWritable outside the join package, then you need to make this
(and probably related methods, like clearWritten(int)) public and
recompile.
Please file a JIRA if you think it should be more general. -C
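The mechanism behind the empty tuples can be illustrated without Hadoop at all. The sketch below is a minimal stand-in (plain java.io, not Hadoop's actual TupleWritable code): serialization only emits slots whose "written" bit is set, and the public constructor leaves every bit clear, so a round trip through write/read yields an empty tuple unless something package-private like setWritten(int) has been called first.

```java
import java.io.*;
import java.util.BitSet;

// MiniTuple is a hypothetical, simplified analogue of TupleWritable,
// here using int slots instead of Writable[] to stay self-contained.
class MiniTuple {
    private final int[] vals;     // the slot storage
    private final BitSet written; // which slots actually hold data

    MiniTuple(int[] vals) {
        this.vals = vals;
        this.written = new BitSet(vals.length);
        // Note: no bits are set here, mirroring the public constructor's
        // "unknown whether any of them contain written values" behavior.
    }

    // Package-private in Hadoop's real class; that is the crux of the thread.
    void setWritten(int i) { written.set(i); }

    void write(DataOutput out) throws IOException {
        out.writeInt(written.cardinality()); // only flagged slots are serialized
        for (int i = 0; i < vals.length; i++)
            if (written.get(i)) out.writeInt(vals[i]);
    }

    static int[] read(DataInput in) throws IOException {
        int n = in.readInt();
        int[] result = new int[n];
        for (int i = 0; i < n; i++) result[i] = in.readInt();
        return result;
    }

    // Serialize then deserialize, as the map-to-reduce shuffle would.
    static int[] roundTrip(MiniTuple t) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            t.write(new DataOutputStream(buf));
            return read(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this model, `roundTrip(new MiniTuple(new int[]{1}))` comes back with zero slots, matching the empty `[]` tuples Michael sees in his reducer; calling `setWritten(0)` first makes the slot survive the round trip.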
On Aug 7, 2008, at 4:18 PM, Michael Andrews wrote:
Hi,
I am a new hadoop developer and am struggling to understand why I
cannot pass TupleWritable between a map and reduce function. I have
modified the wordcount example to demonstrate the issue. Also I am
using hadoop 0.17.1.
package wordcount;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.join.*;

public class WordCount {

    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, TupleWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, TupleWritable> output,
                        Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            TupleWritable tuple = new TupleWritable(new Writable[] { one });
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, tuple);
            }
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, TupleWritable, Text, TupleWritable> {
        public void reduce(Text key, Iterator<TupleWritable> values,
                           OutputCollector<Text, TupleWritable> output,
                           Reporter reporter) throws IOException {
            IntWritable i = new IntWritable();
            int sum = 0;
            while (values.hasNext()) {
                i = ((IntWritable) values.next().get(0));
                sum += i.get();
            }
            TupleWritable tuple = new TupleWritable(new Writable[] { new IntWritable(sum) });
            output.collect(key, tuple);
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(TupleWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
The output is always empty tuples ('[]'). Using the debugger, I
have determined that the line:
TupleWritable tuple = new TupleWritable(new Writable[] { one } );
is properly constructing the desired tuple. I am not sure whether it
is being output correctly by output.collect, as I cannot find the
field in the OutputCollector data structure. When I check in the
reduce method, the values are always empty tuples. I have a feeling
it has something to do with this line in the JavaDoc:
TupleWritable(Writable[] vals)
Initialize tuple with storage; unknown whether any of them
contain "written" values.
Thanks in advance for any and all help,
Michael