Particularly if you know which types to expect in your structured
data, rolling your own Writable is strongly preferred to
TupleWritable. The latter serializes to a comically verbose format and
should only be used when the types and nesting depth are unknown. -C
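P.S. For the simple fixed-type case, a hand-rolled Writable can be as
small as the sketch below (untested; the class and field names are
illustrative, not from any Hadoop release):

    import java.io.*;
    import org.apache.hadoop.io.Writable;

    // Fixed, known fields serialize directly; no per-record type
    // information or "written" bookkeeping as in TupleWritable.
    public class CountAndLength implements Writable {
        private int count;
        private long totalLength;

        public void write(DataOutput out) throws IOException {
            out.writeInt(count);
            out.writeLong(totalLength);
        }

        public void readFields(DataInput in) throws IOException {
            count = in.readInt();
            totalLength = in.readLong();
        }

        public void set(int count, long totalLength) {
            this.count = count;
            this.totalLength = totalLength;
        }

        public String toString() {
            return count + "\t" + totalLength;
        }
    }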
On Aug 7, 2008, at 5:45 PM, Michael Andrews wrote:
OK, thanks for the information. I guess it seems strange to want to
use TupleWritable in this way, but it just seemed like the right thing
to do based on the API docs. Is it more idiomatic to implement
Writable when processing structured data? Again, I am really new to
the Hadoop community, but I will try to file something in JIRA on
this. I'm not really sure how to proceed with a patch; maybe I could
just try to clarify the docs?
On 8/7/08 4:38 PM, "Chris Douglas" <[EMAIL PROTECTED]> wrote:
You need access to TupleWritable::setWritten(int). If you want to use
TupleWritable outside the join package, then you need to make this
(and probably related methods, like clearWritten(int)) public and
recompile.
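With those public, marking the slot before collecting would look
something like this in the map method below (a sketch, against a
locally rebuilt Hadoop):

    TupleWritable tuple = new TupleWritable(new Writable[] { one });
    tuple.setWritten(0);          // mark position 0 as holding a value
    output.collect(word, tuple);  // write() now serializes position 0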
Please file a JIRA if you think it should be more general. -C
On Aug 7, 2008, at 4:18 PM, Michael Andrews wrote:
Hi,
I am a new Hadoop developer and am struggling to understand why I
cannot pass a TupleWritable between the map and reduce functions. I
have modified the wordcount example to demonstrate the issue. Also, I
am using Hadoop 0.17.1.
package wordcount;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.join.*;

public class WordCount {

    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, TupleWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, TupleWritable> output,
                Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            TupleWritable tuple = new TupleWritable(new Writable[] { one });
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, tuple);
            }
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, TupleWritable, Text, TupleWritable> {
        public void reduce(Text key, Iterator<TupleWritable> values,
                OutputCollector<Text, TupleWritable> output,
                Reporter reporter) throws IOException {
            IntWritable i = new IntWritable();
            int sum = 0;
            while (values.hasNext()) {
                i = (IntWritable) values.next().get(0);
                sum += i.get();
            }
            TupleWritable tuple =
                new TupleWritable(new Writable[] { new IntWritable(sum) });
            output.collect(key, tuple);
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(TupleWritable.class);

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
The output is always empty tuples ('[]'). Using the debugger, I
have determined that the line:
TupleWritable tuple = new TupleWritable(new Writable[] { one } );
is properly constructing the desired tuple. I am not sure whether it
is being written out correctly by output.collect, as I cannot find the
field in the OutputCollector data structure. When I check in the
reduce method, the values are always empty tuples. I have a feeling it
has something to do with this line in the JavaDoc:
TupleWritable(Writable[] vals)
Initialize tuple with storage; unknown whether any of them
contain "written" values.
Thanks in advance for any and all help,
Michael