Re: SequenceFileInputFormat doesn't return whole records

Harsh J Fri, 19 Aug 2011 07:16:03 -0700

Tim,

Do you also set your I/O formats explicitly to SequenceFileInputFormat
and SequenceFileOutputFormat? I mean via job.setInputFormat/setOutputFormat
(or job.setInputFormatClass/setOutputFormatClass with the new-API Job).
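
One likely cause, offered as a guess since the full driver code isn't shown: SequenceFileInputFormat.addInputPath(job, ...) is the static addInputPath inherited from FileInputFormat. Calling it through the subclass name only registers the path. It does not change the job's input format, which defaults to TextInputFormat and hands the mapper one line at a time.

The same applies to the writing job. Its output format defaults to TextOutputFormat unless job.setOutputFormatClass(SequenceFileOutputFormat.class) is called. The reading job likewise needs job.setInputFormatClass(SequenceFileInputFormat.class), or JobConf.setInputFormat in the old API.

The stdlib-only Java toy below illustrates why a line-oriented reader returns partial records. It is not Hadoop's real SequenceFile layout. It assumes a hypothetical format in which each record is a 4-byte length followed by UTF-8 bytes, and the class FramingDemo and its methods are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy format, NOT the real SequenceFile layout:
// each record is a 4-byte big-endian length followed by its UTF-8 bytes.
public class FramingDemo {

    // Writes every record as [length][bytes]; the byte array stands in for a file.
    static byte[] writeFramed(List<String> records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String r : records) {
                byte[] b = r.getBytes(StandardCharsets.UTF_8);
                out.writeInt(b.length);
                out.write(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Format-aware reader: uses each length prefix to return one whole record at a time.
    static List<String> readFramed(byte[] data) {
        List<String> out = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                byte[] b = new byte[in.readInt()];
                in.readFully(b);
                out.add(new String(b, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    // Format-unaware reader: ignores the framing and cuts the bytes at every '\n',
    // a simplified stand-in for a line-oriented reader such as TextInputFormat's.
    static List<String> readAsLines(byte[] data) {
        List<String> out = new ArrayList<>();
        int from = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                out.add(new String(data, from, i - from, StandardCharsets.ISO_8859_1));
                from = i + 1;
            }
        }
        if (from < data.length) {
            out.add(new String(data, from, data.length - from, StandardCharsets.ISO_8859_1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "first record\nwith two lines", "second record", "third\nrecord\nhere");
        byte[] file = writeFramed(records);
        System.out.println("framed read: " + readFramed(file).size() + " records");
        System.out.println("line read:   " + readAsLines(file).size() + " fragments");
    }
}
```

Reading with the framing returns the three original records. Cutting on '\n' returns four fragments, and one of them glues the tail of the first record to the whole second record and the start of the third. A real SequenceFile also contains a header, sync markers and serialized Writables, so a line-oriented reader produces even messier pieces.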

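Hadoop should not be splitting records across mappers. The SequenceFile
record reader aligns each split to the sync markers written into the file,
so every record is read whole by exactly one mapper, and there are specific
test cases that ensure this. It would be strange if that were happening here.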
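
To make the "records are never split" point concrete, here is a simplified, stdlib-only Java model of how sync markers let a file be cut into arbitrary byte-range splits without dividing any record. It is a sketch under assumptions.

The marker size, the sync interval and the class SyncSplitDemo are hypothetical. The model captures the general rule, not Hadoop's exact code. The real reader seeks to the first sync marker after its split start and keeps reading past the split end until it reaches the next marker.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model, not Hadoop code: a file is a run of blocks, each block being
// a sync marker followed by a few records. Sizes below are hypothetical.
public class SyncSplitDemo {
    static final int SYNC_SIZE = 16;       // bytes per sync marker (assumed)
    static final int RECORDS_PER_SYNC = 3; // records between markers (assumed)

    static final class Block {
        final long syncPos;                         // byte offset where the marker starts
        final List<String> records = new ArrayList<>();
        Block(long syncPos) { this.syncPos = syncPos; }
    }

    // Lays the records out as [sync][r][r][r][sync][r]... and remembers each marker's offset.
    static List<Block> layout(List<String> records) {
        List<Block> blocks = new ArrayList<>();
        long pos = 0;
        Block current = null;
        for (int i = 0; i < records.size(); i++) {
            if (i % RECORDS_PER_SYNC == 0) {
                current = new Block(pos);
                blocks.add(current);
                pos += SYNC_SIZE;
            }
            current.records.add(records.get(i));
            pos += records.get(i).length(); // one byte per char in this ASCII-only toy
        }
        return blocks;
    }

    static long totalBytes(List<Block> blocks) {
        long total = 0;
        for (Block b : blocks) {
            total += SYNC_SIZE;
            for (String r : b.records) total += r.length();
        }
        return total;
    }

    // Reader for the byte range [start, end): skip to the first marker at or after `start`,
    // then read whole blocks, running past `end` if needed, up to the next marker at or
    // after `end`. Equivalently, it owns every block whose marker offset lies in the range.
    static List<String> readSplit(List<Block> blocks, long start, long end) {
        List<String> out = new ArrayList<>();
        for (Block b : blocks) {
            if (b.syncPos >= start && b.syncPos < end) {
                out.addAll(b.records); // whole records only, never a prefix
            }
        }
        return out;
    }

    // Cuts the file into fixed-size byte splits that ignore record boundaries and
    // concatenates what each split's reader returns.
    static List<String> readAllSplits(List<Block> blocks, long splitSize) {
        long total = totalBytes(blocks);
        List<String> all = new ArrayList<>();
        for (long start = 0; start < total; start += splitSize) {
            all.addAll(readSplit(blocks, start, Math.min(start + splitSize, total)));
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 10; i++) records.add("record-" + i + "-payload");
        List<Block> blocks = layout(records);
        for (long splitSize : new long[] {7, 25, 64, 1000}) {
            boolean ok = readAllSplits(blocks, splitSize).equals(records);
            System.out.println("split size " + splitSize + ": every record read once, whole = " + ok);
        }
    }
}
```

Whatever the split size, the concatenated output equals the input list, because each marker offset falls in exactly one byte range. A split that contains no marker simply returns nothing.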

On Fri, Aug 19, 2011 at 6:01 PM, Tim Fletcher <zigomu...@gmail.com> wrote:
> Hi all,
> I am having issues using SequenceFileInputFormat to retrieve whole records.
> I have one job that writes to a SequenceFile:
> SequenceFileOutputFormat.setOutputPath(job, new Path("out/data"));
> SequenceFileOutputFormat.setOutputCompressionType(job,
> SequenceFile.CompressionType.NONE);
> I then have a second job that is meant to read the file for processing:
> SequenceFileInputFormat.addInputPath(job, new Path("out/data"));
> However, the values that I get as the arguments to the Map part of my job
> only seem to contain parts of the record. I am sure that I am missing
> something rather fundamental about how Hadoop splits inputs to the Mapper,
> but I can't seem to find a way to stop the records being split.
> Any help (or a pointer to a specific page in the docs) would be greatly
> appreciated.
> Regards,
> Tim

-- 
Harsh J
