Devaraj,
Thanks for the pointer.
I ended up extending FileInputFormat.
I made some notes about the program I wrote to use the custom
FileInputFormat here:
https://cakephp.rootser.com/posts/view/64
I think it may be because I'm using 1.0.1, but I did not need to write a
getSplits() method. I did, however, need to override isSplitable(), where
I just kept the default behavior. Is the way to make one's input
splittable to assign a codec in the job configuration?
Also, I think that if I now take my FileInputFormat subclass
(RootserFileInputFormat in the page I link to above), change its
nextKeyValue() method to use an ObjectInputStream, give
RootserFileInputFormat a type parameter, and make nextKeyValue() read
objects of that parameter type out of the split, I will have a
FileInputFormat that can read any kind of (serializable) object out of a
split. While this is cool, I can't believe I am the first person to think
of something like this.
Do you know if there is already a way to do this in the Hadoop framework?
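(Outside Hadoop, the core of the generic nextKeyValue() loop described above can be sketched as follows; the class and method names here are illustrative, not from the linked post:)

```java
import java.io.*;
import java.util.*;

// Sketch of the type-parameterized read loop a generic nextKeyValue()
// would perform: deserialize successive objects of type T from a stream
// using ObjectInputStream. Names are hypothetical.
public class GenericObjectReader<T> {
    private final ObjectInputStream in;

    public GenericObjectReader(InputStream raw) throws IOException {
        this.in = new ObjectInputStream(raw);
    }

    // Returns the next deserialized object, or null at end of stream.
    @SuppressWarnings("unchecked")
    public T next() throws IOException {
        try {
            return (T) in.readObject();
        } catch (EOFException eof) {
            return null; // underlying stream exhausted
        } catch (ClassNotFoundException e) {
            throw new IOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Round-trip a few serializable objects through a byte buffer.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(buf);
        out.writeObject("alpha");
        out.writeObject("beta");
        out.close();

        GenericObjectReader<String> reader =
            new GenericObjectReader<>(new ByteArrayInputStream(buf.toByteArray()));
        List<String> values = new ArrayList<>();
        for (String s; (s = reader.next()) != null; ) {
            values.add(s);
        }
        System.out.println(values); // prints [alpha, beta]
    }
}
```

In a real RecordReader the stream would come from the FileSplit rather than a byte buffer, but the deserialization logic is the same.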
Thanks for the pointer on how to get started.
On Thu, May 17, 2012 at 6:32 AM, Devaraj k devara...@huawei.com wrote:
Hi John,
You can extend FileInputFormat (or implement InputFormat) and then
implement the methods below.
1. InputSplit[] getSplits(JobConf job, int numSplits) : For splitting the
input files logically for the job. If FileInputFormat.getSplits(JobConf
job, int numSplits) suits your requirement, you can make use of it;
otherwise you can implement it based on your needs.
2. RecordReader<K, V> getRecordReader(InputSplit split, JobConf job,
Reporter reporter) : For reading the input split.
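(Against the Hadoop 1.x org.apache.hadoop.mapred API, the two methods above fit into a skeleton roughly like this; the class names are mine, and it only compiles with the Hadoop jars on the classpath, so treat it as a sketch rather than a tested program:)

```java
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapred.*;

// Hypothetical skeleton of a custom input format against the old
// (org.apache.hadoop.mapred) API; only the shapes of the two methods
// described above are shown.
public class MyBinaryInputFormat
        extends FileInputFormat<LongWritable, LongWritable> {

    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        // FileInputFormat's default is to allow splitting.
        return true;
    }

    // getSplits(JobConf, int) is inherited from FileInputFormat; override
    // it only if the default block-based splitting does not fit the file
    // layout.

    @Override
    public RecordReader<LongWritable, LongWritable> getRecordReader(
            InputSplit split, JobConf job, Reporter reporter)
            throws IOException {
        // Return a RecordReader that decodes one record at a time from
        // the split. MyBinaryRecordReader is a hypothetical class.
        return new MyBinaryRecordReader((FileSplit) split, job);
    }
}
```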
Thanks
Devaraj
From: John Hancock [jhancock1...@gmail.com]
Sent: Thursday, May 17, 2012 3:40 PM
To: common-user@hadoop.apache.org
Subject: custom FileInputFormat class
All,
Can anyone on the list point me in the right direction as to how to write
my own FileInputFormat class?
Perhaps this is not even the way I should go, but my goal is to write a
MapReduce job that gets its input from a binary file of integers and longs.
-John
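(For concreteness, a binary file of integers and longs like the one described above could be produced and decoded in plain Java as follows; the record layout is an assumption not stated in the thread, one int followed by one long per record, written with DataOutputStream:)

```java
import java.io.*;
import java.util.*;

public class IntLongRecords {
    // Assumed record layout: each record is one 4-byte int followed by
    // one 8-byte long, big-endian, as written by DataOutputStream.

    public static byte[] write(int[] ints, long[] longs) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (int i = 0; i < ints.length; i++) {
            out.writeInt(ints[i]);
            out.writeLong(longs[i]);
        }
        out.close();
        return buf.toByteArray();
    }

    // Reads records until the data runs out; this is essentially what a
    // RecordReader for such a file would do per record.
    public static List<long[]> read(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<long[]> records = new ArrayList<>();
        while (in.available() >= 12) { // 4 bytes int + 8 bytes long
            int k = in.readInt();
            long v = in.readLong();
            records.add(new long[] { k, v });
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = write(new int[] { 1, 2 }, new long[] { 10L, 20L });
        for (long[] rec : read(data)) {
            System.out.println(rec[0] + " -> " + rec[1]);
        }
        // prints:
        // 1 -> 10
        // 2 -> 20
    }
}
```

A RecordReader over a FileSplit would read the same 12-byte records from the split's portion of the file instead of a byte array.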