*HeadlineDocument* in the code below is equivalent to *MyObject*; I forgot to obfuscate that one... oops...
On Tue, Sep 16, 2008 at 11:46 AM, Jason Grey <[EMAIL PROTECTED]> wrote:
> I'm trying to use JavaSerialization for a series of MapReduce jobs, and
> when it comes to reading a SequenceFile using SequenceFileInputFormat with
> JavaSerialized objects, something breaks down.
>
> I've added "org.apache.hadoop.io.serializer.JavaSerialization" to the
> io.serializations property in my config, and I'm using native Java types in
> my mapper and reducer implementations, like so:
>
>   MyMapper implements Mapper<String,MyObject,String,MyObject>
>   MyReducer implements Reducer<String,MyObject,String,MyObject>
>
> In my job configuration, I'm doing this:
>
>   conf.setInputFormat(SequenceFileInputFormat.class);
>   FileInputFormat.setInputPaths(conf, path1, path2);
>   conf.setOutputFormat(SequenceFileOutputFormat.class);
>   FileOutputFormat.setOutputPath(conf, path3);
>   conf.setOutputKeyClass(String.class);
>   conf.setOutputKeyComparatorClass(JavaSerializationComparator.class);
>   conf.setOutputValueClass(MyObject.class);
>   conf.setMapperClass(MyMapper.class);
>   conf.setReducerClass(MyReducer.class);
>
> When I run the job and print the keys & values from the mapper to
> System.out, the key and value don't seem to be populated correctly: the
> key is null, and the value is a new, empty instance of MyObject.
>
> The files this job is reading were written by another job that used a
> custom InputFormat, so that job didn't have the same problem, and I have
> verified with a SequenceFile.Reader that the data is actually there and
> non-null. One strange thing I had to do to get the reader to work is this
> (see the *bold* text; I had to add it for the values to show up, and I
> think this may have something to do with why SequenceFileInputFormat is
> having trouble as well):
>
>   String key = new String();
>   while (*(key = (String)* r.next(key)) != null) {
>     HeadlineDocument value = new HeadlineDocument();
>     *value = (HeadlineDocument)* r.getCurrentValue(value);
>     System.out.println("Key: " + key.toString());
>     System.out.println("Value: " + value.toString());
>   }
>
> Does anyone have any hints on how to use JavaSerialization properly in the
> INPUT phase of a MapReduce job?
>
> Thanks for any help
>
> -jg-
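Here is a plain-Java sketch (no Hadoop classes) of what the *bold* assignments in the reader loop appear to work around. It assumes, as that loop suggests, that under JavaSerialization a reader call taking a "reuse" argument hands back a freshly deserialized object instead of filling in the instance passed to it, because ObjectInputStream.readObject always constructs a new object. MyObject, ReuseStyleReader and readAll are hypothetical names for illustration only; this is not Hadoop's API. If the record reader behind SequenceFileInputFormat relies on the passed-in key and value being filled in, a mapper would see the same kind of unpopulated key and empty MyObject described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class JavaSerializationReuseDemo {

    // Hypothetical stand-in for MyObject / HeadlineDocument.
    static class MyObject implements Serializable {
        private static final long serialVersionUID = 1L;
        String headline;

        MyObject() { }

        MyObject(String headline) { this.headline = headline; }

        @Override
        public String toString() { return "MyObject(" + headline + ")"; }
    }

    // Hypothetical reader imitating an "Object next(Object reuse)" style call.
    // The argument is offered for reuse, but Java deserialization always
    // builds a brand-new object, so the argument is never filled in.
    static class ReuseStyleReader {
        private final ObjectInputStream in;
        private int remaining;

        ReuseStyleReader(byte[] data, int count) throws IOException {
            in = new ObjectInputStream(new ByteArrayInputStream(data));
            remaining = count;
        }

        Object next(Object reuse) throws IOException, ClassNotFoundException {
            if (remaining == 0) {
                return null; // no more records
            }
            remaining--;
            return in.readObject(); // 'reuse' is ignored
        }
    }

    // Serializes two records, then reads them back either ignoring or
    // assigning the object returned by next().
    public static List<String> readAll(boolean useReturnValue) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new MyObject("first"));
                out.writeObject(new MyObject("second"));
            }

            ReuseStyleReader reader = new ReuseStyleReader(bytes.toByteArray(), 2);
            List<String> seen = new ArrayList<>();
            MyObject value = new MyObject();
            Object result;
            while ((result = reader.next(value)) != null) {
                if (useReturnValue) {
                    value = (MyObject) result; // same idea as the bold assignment
                }
                seen.add(value.toString());
            }
            return seen;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ignoring return value:  " + readAll(false));
        System.out.println("assigning return value: " + readAll(true));
    }
}
```

Running main prints two MyObject(null) entries when the return value is ignored, and MyObject(first), MyObject(second) when it is assigned, which mirrors the difference the bold text made in the SequenceFile.Reader loop.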