On May 7, 2008, at 6:30 AM, Roberto Zandonati wrote:

Hi all, I'm a newbie and I have the following problem.

I need to implement an InputFormat such that isSplitable always
returns false, as shown in http://wiki.apache.org/hadoop/FAQ (question
no. 10).
And here is the problem.

I also have to implement the RecordReader interface to return the
whole content of the input file, but I don't know how. I have found
only examples that use the LineRecordReader.


Couple of things.

1. Take a look at SequenceFileRecordReader: http://svn.apache.org/viewvc/hadoop/core/trunk/src/java/org/apache/hadoop/mapred/SequenceFileRecordReader.java?view=log

2. If you just want to process a text file as a whole or a sequence file as a whole (or any existing format), you do not need to implement a 'RecordReader' at all. Just sub-class the InputFormat and override isSplitable; the existing RecordReader will work correctly. Take a look at SortValidator (http://svn.apache.org/viewvc/hadoop/core/trunk/src/test/org/apache/hadoop/mapred/SortValidator.java) and how it sub-classes SequenceFileInputFormat to implement a NonSplitableSequenceFileInputFormat.
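
For a plain text file, the same approach might look roughly like the sketch below. It assumes the old org.apache.hadoop.mapred API and needs the Hadoop jar on the classpath; the class name NonSplitableTextInputFormat is made up for illustration and is not part of Hadoop.

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Hypothetical class name, chosen for this example only.
// Everything except isSplitable is inherited from TextInputFormat,
// including the LineRecordReader it hands out.
public class NonSplitableTextInputFormat extends TextInputFormat {

  // Returning false tells the framework never to cut a file into
  // several splits, so each input file goes to exactly one map task.
  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;
  }
}
```

You would then select it in the job setup with conf.setInputFormat(NonSplitableTextInputFormat.class) on your JobConf. Note that the map task still receives the file one line at a time; what changes is that a single task sees all the lines of a file, in order.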

Arun
