I read HADOOP-3413 <https://issues.apache.org/jira/browse/HADOOP-3413> a bit more closely - it updates SequenceFile.Reader, not SequenceFileInputFormat, which is what the MapReduce framework uses... looks like you have to write your own input format, or have your mappers/reducers take raw bytes and deserialize within...
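To illustrate the "raw bytes" workaround: if the mapper receives the serialized payload as bytes, it can run the java.io round-trip itself. A minimal plain-JDK sketch (no Hadoop dependencies; the MyObject class here is a hypothetical stand-in for the value type in the thread below):

```java
import java.io.*;

// Hypothetical stand-in for the MyObject value type discussed below;
// it must implement Serializable for java.io serialization to apply.
class MyObject implements Serializable {
    private static final long serialVersionUID = 1L;
    final String headline;
    MyObject(String headline) { this.headline = headline; }
}

public class RawBytesRoundTrip {

    // Serialize an object into the raw byte form the mapper would receive.
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(obj);
        }
        return buf.toByteArray();
    }

    // Deserialize inside the mapper. Note that readObject() always
    // returns a fresh instance - it cannot fill in a caller-supplied one.
    static MyObject fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (MyObject) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        MyObject copy = fromBytes(toBytes(new MyObject("hello")));
        System.out.println(copy.headline); // prints "hello"
    }
}
```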
On Wed, Sep 17, 2008 at 9:04 AM, Jason Grey <[EMAIL PROTECTED]> wrote:

> I just found this one this morning, looks like a fix should be in 0.18.0
> according to the bug tracker:
>
> https://issues.apache.org/jira/browse/HADOOP-3413
>
> I'm going to go double-check all my code, as I'm pretty sure I am on 0.18.0
> already.
>
> -jg-
>
> On Tue, Sep 16, 2008 at 9:10 PM, Alex Loddengaard <[EMAIL PROTECTED]> wrote:
>
>> Unfortunately I don't know of a solution to your problem, but I've been
>> experiencing the exact same issues while trying to implement a Protocol
>> Buffer serialization. Take a look:
>>
>> https://issues.apache.org/jira/browse/HADOOP-3788
>>
>> I hope this helps others to diagnose your problem.
>>
>> Alex
>>
>> On Wed, Sep 17, 2008 at 12:47 AM, Jason Grey <[EMAIL PROTECTED]> wrote:
>>
>>> *HeadlineDocument* in the code below is equivalent to *MyObject* - I
>>> forgot to obfuscate that one... oops...
>>>
>>> On Tue, Sep 16, 2008 at 11:46 AM, Jason Grey <[EMAIL PROTECTED]> wrote:
>>>
>>>> I'm trying to use JavaSerialization for a series of MapReduce jobs, and
>>>> when it comes to reading a SequenceFile using SequenceFileInputFormat
>>>> with JavaSerialized objects, something breaks down.
>>>>
>>>> I've added "org.apache.hadoop.io.serializer.JavaSerialization" to the
>>>> io.serializations property in my config, and am using native Java types
>>>> in my mapper and reducer implementations, like so:
>>>>
>>>> MyMapper implements Mapper<String,MyObject,String,MyObject>
>>>> MyReducer implements Reducer<String,MyObject,String,MyObject>
>>>>
>>>> In my job configuration, I'm doing this:
>>>>
>>>> conf.setInputFormat(SequenceFileInputFormat.class);
>>>> FileInputFormat.setInputPaths(conf, path1, path2);
>>>> conf.setOutputFormat(SequenceFileOutputFormat.class);
>>>> FileOutputFormat.setOutputPath(conf, path3);
>>>> conf.setOutputKeyClass(String.class);
>>>> conf.setOutputKeyComparatorClass(JavaSerializationComparator.class);
>>>> conf.setOutputValueClass(MyObject.class);
>>>> conf.setMapperClass(MyMapper.class);
>>>> conf.setReducerClass(MyReducer.class);
>>>>
>>>> When I run the job and output the keys & values from the mapper to
>>>> System.out, it doesn't seem like the key & value are getting populated
>>>> correctly - the key is null, and the value is a new, empty instance of
>>>> MyObject.
>>>>
>>>> The files this job is reading were output by another job that used a
>>>> custom InputFormat, so it didn't have the same problem, and I have
>>>> validated using a SequenceFile.Reader that the data is actually there,
>>>> and non-null. One strange thing I had to do to get the reader to work
>>>> is this (see the *BOLD* text - I had to add that in order for the
>>>> values to show up - I think this may have something to do with why
>>>> SequenceFileInputFormat is having trouble as well...)
>>>>
>>>> String key = new String();
>>>> while (*(key = (String) *r.next(key)) != null) {
>>>>     HeadlineDocument value = new HeadlineDocument();
>>>>     *value = (HeadlineDocument) *r.getCurrentValue(value);
>>>>     System.out.println("Key: " + key.toString());
>>>>     System.out.println("Value: " + value.toString());
>>>> }
>>>>
>>>> Anyone got any hints as to how one uses JavaSerialization properly in
>>>> the INPUT phase of a MapReduce job?
>>>>
>>>> Thanks for any help
>>>>
>>>> -jg-
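For what it's worth, the assignment pattern in the bold code above is consistent with how java.io deserialization behaves: ObjectInputStream.readObject() always allocates a new object, so a deserializer backed by it has to hand back the new instance rather than fill in the one you pass for reuse (Writable-style in-place population isn't possible). A minimal plain-JDK sketch of that behavior (no Hadoop; the Doc class and deserialize helper are hypothetical stand-ins, not Hadoop API):

```java
import java.io.*;

public class WhyAssignmentMatters {

    static class Doc implements Serializable {
        private static final long serialVersionUID = 1L;
        String text = "";
    }

    // Mimics a deserialize(reusableInstance) call backed by Java
    // serialization: the 'reuse' argument is necessarily ignored,
    // and a freshly allocated object is returned instead.
    static Doc deserialize(Doc reuse, byte[] bytes) throws Exception {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Doc) in.readObject(); // 'reuse' is untouched
        }
    }

    static byte[] serialize(Doc doc) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(doc);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Doc original = new Doc();
        original.text = "headline";
        byte[] bytes = serialize(original);

        Doc value = new Doc();
        deserialize(value, bytes);          // return value discarded: 'value' stays empty
        System.out.println(value.text);     // prints ""

        value = deserialize(value, bytes);  // capture the return, like the *BOLD* code
        System.out.println(value.text);     // prints "headline"
    }
}
```

This is why dropping the assignment leaves you with the empty instance you allocated yourself, which matches the null key / empty value symptom described above.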