*HeadlineDocument* in the code below is equivalent to *MyObject* - I forgot
to obfuscate that one... oops...

On Tue, Sep 16, 2008 at 11:46 AM, Jason Grey <[EMAIL PROTECTED]> wrote:

> I'm trying to use JavaSerialization for a series of MapReduce jobs, and
> when it comes to reading a SequenceFile using SequenceFileInputFormat with
> JavaSerialized objects, something breaks down.
> I've added "org.apache.hadoop.io.serializer.JavaSerialization" to the
> io.serializations property in my config, and I'm using native Java types in
> my mapper and reducer implementations, like so:
> MyMapper implements Mapper<String,MyObject,String,MyObject>
> MyReducer implements Reducer<String,MyObject,String,MyObject>
> In my job configuration, I'm doing this:
> conf.setInputFormat(SequenceFileInputFormat.class);
> FileInputFormat.setInputPaths(conf, path1, path2);
> conf.setOutputFormat(SequenceFileOutputFormat.class);
> FileOutputFormat.setOutputPath(conf, path3);
> conf.setOutputKeyClass(String.class);
> conf.setOutputKeyComparatorClass(JavaSerializationComparator.class);
> conf.setOutputValueClass(MyObject.class);
> conf.setMapperClass(MyMapper.class);
> conf.setReducerClass(MyReducer.class);
> When I run the job, and output the keys & values from the mapper to
> System.out, it doesn't seem like the key & value are getting populated
> correctly - the key is NULL, and the value is a new, empty instance of
> MyObject.
> The files this job is reading were output by another job that used a custom
> InputFormat, and so it didn't have the same problem, and I have validated
> using a SequenceFile.Reader that the data is actually there, and non-null.
> One strange thing I had to do to get the reader to work is this (see the
> *BOLD* text - I had to add that in order for the values to show up - I think
> this may have something to do with why SequenceFileInputFormat is having
> trouble as well...)
> String key = new String();
> while (*(key = (String) *r.next(key)) != null) {
>      HeadlineDocument value = new HeadlineDocument();
>      *value = (HeadlineDocument) *r.getCurrentValue(value);
>      System.out.println("Key: " + key.toString());
>      System.out.println("Value: " + value.toString());
> }
> Anyone got any hints as to how one uses JavaSerialization properly in the
> INPUT phase of a MapReduce job?
> Thanks for any help
> -jg-
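
To illustrate why the bold assignments in the reader loop above seem to matter, here is a minimal sketch using only plain java.io serialization, with no Hadoop classes involved. ReuseDemo, Doc, serialize and deserialize are hypothetical names made up for this example; Doc stands in for MyObject. The assumption, which I have not confirmed against the Hadoop source, is that JavaSerialization behaves like deserialize() here: it hands back whatever ObjectInputStream.readObject() builds and never fills in the instance passed to it. If so, any caller that ignores the return value (possibly what SequenceFileInputFormat does) would keep seeing the empty key and value it started with.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ReuseDemo {

    // Hypothetical stand-in for MyObject / HeadlineDocument.
    static class Doc implements Serializable {
        private static final long serialVersionUID = 1L;
        String headline;

        Doc() { }

        Doc(String headline) { this.headline = headline; }
    }

    // Plain Java serialization of one object to a byte array.
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Mimics a "reuse"-style read method: it accepts an existing instance,
    // but readObject() always builds a new object, so 'reuse' is never touched.
    @SuppressWarnings("unchecked")
    static <T> T deserialize(T reuse, byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] keyBytes = serialize("some-key");
        byte[] valueBytes = serialize(new Doc("some headline"));

        String key = new String();
        Doc value = new Doc();

        // Return values ignored: key and value stay empty.
        deserialize(key, keyBytes);
        deserialize(value, valueBytes);
        System.out.println("ignored: key=[" + key + "] headline=" + value.headline);

        // Return values assigned (the equivalent of the bold text): data shows up.
        key = deserialize(key, keyBytes);
        value = deserialize(value, valueBytes);
        System.out.println("used:    key=[" + key + "] headline=" + value.headline);
    }
}
```

Note that a java.lang.String key could never be populated in place anyway, since String is immutable; only the returned reference can carry the data.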
