Tim,

Do you also set your job's I/O formats explicitly to SequenceFileInputFormat and SequenceFileOutputFormat (via job.setInputFormat/setOutputFormat, I mean)?
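The reason for asking: the path helpers only register paths, and a job whose input format is left unset falls back to TextInputFormat, which would read the SequenceFile as lines of text rather than as whole records. A minimal sketch of setting the formats explicitly follows. It assumes the newer org.apache.hadoop.mapreduce API, where the setters are named setInputFormatClass/setOutputFormatClass; the class and method names are hypothetical, and the thread does not say which API is actually in use.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical helper class, for illustration only.
public class SequenceFileJobSetup {

    // First job: write records out as an uncompressed SequenceFile.
    static void configureWriter(Job job) {
        // Without this call the job keeps its default text output format.
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        SequenceFileOutputFormat.setOutputPath(job, new Path("out/data"));
        SequenceFileOutputFormat.setOutputCompressionType(
                job, SequenceFile.CompressionType.NONE);
    }

    // Second job: read the same file back as whole key/value records.
    static void configureReader(Job job) throws IOException {
        // Without this call the job keeps its default TextInputFormat.
        job.setInputFormatClass(SequenceFileInputFormat.class);
        SequenceFileInputFormat.addInputPath(job, new Path("out/data"));
    }
}
```

With the older org.apache.hadoop.mapred API, the equivalent calls are setInputFormat(SequenceFileInputFormat.class) and setOutputFormat(SequenceFileOutputFormat.class) on the JobConf, using the classes from that package. Either way, the mapper's key and value types need to match the types the first job wrote.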
Hadoop should not be splitting records across mappers. There are specific test cases that ensure this does not happen, so it would be strange if it did.

On Fri, Aug 19, 2011 at 6:01 PM, Tim Fletcher <zigomu...@gmail.com> wrote:
> Hi all,
>
> I am having issues using SequenceFileInputFormat to retrieve whole records.
>
> I have one job that is used to write to a SequenceFile:
>
> SequenceFileOutputFormat.setOutputPath(job, new Path("out/data"));
> SequenceFileOutputFormat.setOutputCompressionType(job,
>     SequenceFile.CompressionType.NONE);
>
> I then have a second job that is meant to read the file for processing:
>
> SequenceFileInputFormat.addInputPath(job, new Path("out/data"));
>
> However, the values that I get as the arguments to the Map part of my job
> only seem to contain parts of the record. I am sure that I am missing
> something rather fundamental as to how Hadoop splits inputs to the Mapper,
> but I can't seem to find a way to stop the records being split.
>
> Any help (or a pointer to a specific page in the docs) would be greatly
> appreciated.
>
> Regards,
> Tim

--
Harsh J