Devaraj,

Thanks for the pointer.

I ended up extending FileInputFormat.

I made some notes about the program I wrote to use the custom
FileInputFormat here:

https://cakephp.rootser.com/posts/view/64

I think it may be because I'm using 1.0.1, but I did not need to write a
getSplits() method.  However, I did need to write an isSplitable() method,
where I just went with the default implementation.  Is assigning a codec to
the job configuration the way one makes one's input splittable?
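For what it's worth, the default isSplitable() in FileInputFormat just returns
true, and as I understand it the framework treats a file as non-splittable
only when its codec can't be split (gzip can't, bzip2 can).  Here is a plain-Java
sketch of that decision rule; it is not Hadoop code, and the extension map is
just an assumption for the demo:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration (not Hadoop API) of the splittability rule: no codec means
// splittable; a compressed file is splittable only if its codec supports it.
public class SplittabilityDemo {
    static final Map<String, Boolean> CODEC_SPLITTABLE = new LinkedHashMap<>();
    static {
        CODEC_SPLITTABLE.put(".bz2", true);  // bzip2 streams can be split
        CODEC_SPLITTABLE.put(".gz", false);  // plain gzip streams cannot
    }

    static boolean isSplitable(String fileName) {
        for (Map.Entry<String, Boolean> e : CODEC_SPLITTABLE.entrySet()) {
            if (fileName.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return true; // no codec configured: split freely on block boundaries
    }

    public static void main(String[] args) {
        System.out.println(isSplitable("input.bin"));
        System.out.println(isSplitable("input.gz"));
        System.out.println(isSplitable("input.bz2"));
    }
}
```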

Also, I think that if I now take my FileInputFormat class
(RootserFileInputFormat in the page I link to above), change its
nextKeyValue() method to use an ObjectInputStream, give
RootserFileInputFormat a type parameter, and have nextKeyValue() read
objects of that parameter's type out of the split, I will have a
FileInputFormat that can read any kind of (serializable) object out of a
split.  While this is cool, I can't believe I am the first person to think
of something like this.  Do you know if there is already a way to do this
using the Hadoop framework?
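To make the idea concrete, here is a small standalone sketch of the
type-parameterized reading loop I have in mind.  The class and method names
are hypothetical (not Hadoop API); in the real thing the next() call would
live inside a RecordReader's nextKeyValue():

```java
import java.io.*;

// Hypothetical sketch: a generic reader that pulls Serializable objects of
// type T out of a stream, the way a type-parameterized RecordReader could.
public class GenericObjectReader<T extends Serializable> {
    private final ObjectInputStream in;

    public GenericObjectReader(InputStream raw) throws IOException {
        this.in = new ObjectInputStream(raw);
    }

    /** Returns the next object, or null at end of stream. */
    @SuppressWarnings("unchecked")
    public T next() throws IOException, ClassNotFoundException {
        try {
            return (T) in.readObject();
        } catch (EOFException eof) {
            return null; // no more records in this split
        }
    }

    public static void main(String[] args) throws Exception {
        // Serialize a few Integers, then read them back generically.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(1);
            out.writeObject(2);
            out.writeObject(3);
        }
        GenericObjectReader<Integer> reader =
            new GenericObjectReader<>(new ByteArrayInputStream(buf.toByteArray()));
        Integer v;
        int sum = 0;
        while ((v = reader.next()) != null) {
            sum += v;
        }
        System.out.println(sum);
    }
}
```

One caveat I'm aware of: plain Java serialization is heavier than Hadoop's
Writable mechanism, so this trades performance for generality.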

Thanks for the pointer on how to get started.

On Thu, May 17, 2012 at 6:32 AM, Devaraj k <devara...@huawei.com> wrote:

> Hi John,
>
>
> You can extend FileInputFormat (or implement InputFormat) and then you
> need to implement the methods below.
>
> 1. InputSplit[] getSplits(JobConf job, int numSplits)  : For splitting the
> input files logically for the job. If FileInputFormat.getSplits(JobConf
> job, int numSplits) suits for your requirement, you can make use of it.
> Otherwise you can implement it based on your need.
>
> 2. RecordReader<K,V> getRecordReader(InputSplit split, JobConf job, Reporter
> reporter) : For reading the input split.
>
>
> Thanks
> Devaraj
>
> ________________________________________
> From: John Hancock [jhancock1...@gmail.com]
> Sent: Thursday, May 17, 2012 3:40 PM
> To: common-user@hadoop.apache.org
> Subject: custom FileInputFormat class
>
> All,
>
> Can anyone on the list point me in the right direction as to how to write
> my own FileInputFormat class?
>
> Perhaps this is not even the way I should go, but my goal is to write a
> MapReduce job that gets its input from a binary file of integers and longs.
>
> -John
>
