Devaraj,

Thanks for the pointer.
I ended up extending FileInputFormat. I made some notes about the program I wrote to use the custom FileInputFormat here: https://cakephp.rootser.com/posts/view/64

It may be because I'm using 1.0.1, but I did not need to write a getSplits() method. However, I did need to override isSplitable(), where I just went with the default implementation. Is assigning a compression codec in the job configuration the way one makes one's input splittable?

Also, I think that if I now take my FileInputFormat subclass (RootserFileInputFormat in the page I link to above), change its nextKeyValue() method to use an ObjectInputStream, give RootserFileInputFormat a type parameter, and make the type of the object nextKeyValue() reads out of the split match that parameter, I will have a FileInputFormat that can read any kind of (serializable) object out of a split. While this is cool, I can't believe I am the first person who thought of something like this. Do you know if there is already a way to do this using the Hadoop framework?

Thanks for the pointer on how to get started.

On Thu, May 17, 2012 at 6:32 AM, Devaraj k <devara...@huawei.com> wrote:

> Hi John,
>
> You can extend FileInputFormat (or implement InputFormat) and then you
> need to implement the methods below.
>
> 1. InputSplit[] getSplits(JobConf job, int numSplits): for splitting the
> input files logically for the job. If FileInputFormat.getSplits(JobConf
> job, int numSplits) suits your requirement, you can make use of it.
> Otherwise you can implement it based on your need.
>
> 2. RecordReader<K,V> getRecordReader(InputSplit split, JobConf job,
> Reporter reporter): for reading the input split.
>
> Thanks
> Devaraj
>
> ________________________________________
> From: John Hancock [jhancock1...@gmail.com]
> Sent: Thursday, May 17, 2012 3:40 PM
> To: common-user@hadoop.apache.org
> Subject: custom FileInputFormat class
>
> All,
>
> Can anyone on the list point me in the right direction as to how to write
> my own FileInputFormat class?
>
> Perhaps this is not even the way I should go, but my goal is to write a
> MapReduce job that gets its input from a binary file of integers and longs.
>
> -John
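P.S. To make the generic-reader idea above concrete: stripped of the Hadoop classes, the core of what a type-parameterized nextKeyValue() would do is just "deserialize the next object from the split's byte range, or signal end-of-split." Here is a minimal, Hadoop-free sketch of that loop; the names SplitReader and readNext() are mine for illustration, not part of any Hadoop API, and an in-memory byte array stands in for the split:

```java
import java.io.*;

// Illustrative sketch (not Hadoop API): read serialized objects of
// type T back out of a byte range, the way a type-parameterized
// RecordReader's nextKeyValue() might.
class SplitReader<T> implements Closeable {
    private final ObjectInputStream in;

    SplitReader(InputStream raw) throws IOException {
        this.in = new ObjectInputStream(raw);
    }

    // Returns the next deserialized object, or null at end of the split.
    @SuppressWarnings("unchecked")
    T readNext() throws IOException, ClassNotFoundException {
        try {
            return (T) in.readObject();
        } catch (EOFException endOfSplit) {
            return null;
        }
    }

    public void close() throws IOException {
        in.close();
    }
}

public class Demo {
    public static void main(String[] args) throws Exception {
        // Write a few Serializable objects into an in-memory "split".
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(Integer.valueOf(42));
            out.writeObject(Long.valueOf(7L));
        }

        // Read them back out generically, as nextKeyValue() would.
        try (SplitReader<Number> reader =
                 new SplitReader<>(new ByteArrayInputStream(buf.toByteArray()))) {
            Number n;
            while ((n = reader.readNext()) != null) {
                System.out.println(n);
            }
        }
    }
}
```

In a real RecordReader the raw stream would come from opening the split's file and seeking to the split's start offset rather than from a byte array, and splittability would need care, since object boundaries don't line up with arbitrary split offsets.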