Zhixuan,

As you've correctly deduced, you'll need two things here:

Under your FileInputFormat subclass:
- isSplitable -> return false
- getRecordReader -> return a simple RecordReader implementation that
reads the whole file's bytes into an array (or your own construct) and
emits it as a single record via next(), etc. (a sketch of the
InputFormat side follows below).
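
For the InputFormat side, a minimal (untested) sketch against the old
mapred API in 0.20.x could look like the following (the class names
WholeFileInputFormat and WholeFileRecordReader are just illustrative):

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class WholeFileInputFormat
    extends FileInputFormat<NullWritable, BytesWritable> {

  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    // Never split: each input file becomes exactly one split/map input.
    return false;
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    // Hand the whole (unsplit) file to a reader that slurps its bytes.
    return new WholeFileRecordReader((FileSplit) split, job);
  }
}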

For example, here's a simple record reader implementation you can
return (untested, but it should give you the idea of reading whole
files, and porting it to the new API is easy as well):
https://gist.github.com/1153161
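
In case that link ever goes stale, a rough, untested sketch of the
same idea against the old API (WholeFileRecordReader is again just an
illustrative name) would look roughly like this:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.RecordReader;

public class WholeFileRecordReader
    implements RecordReader<NullWritable, BytesWritable> {

  private final FileSplit split;
  private final Configuration conf;
  private boolean processed = false;

  public WholeFileRecordReader(FileSplit split, Configuration conf) {
    this.split = split;
    this.conf = conf;
  }

  public boolean next(NullWritable key, BytesWritable value) throws IOException {
    if (processed) {
      return false; // exactly one record per file
    }
    // NOTE: this buffers the entire file in memory; consider rejecting
    // files above a size cap (the gist above caps them at 10 MB).
    Path file = split.getPath();
    FileSystem fs = file.getFileSystem(conf);
    byte[] contents = new byte[(int) split.getLength()];
    FSDataInputStream in = null;
    try {
      in = fs.open(file);
      IOUtils.readFully(in, contents, 0, contents.length);
      value.set(contents, 0, contents.length);
    } finally {
      IOUtils.closeStream(in);
    }
    processed = true;
    return true;
  }

  public NullWritable createKey() {
    return NullWritable.get();
  }

  public BytesWritable createValue() {
    return new BytesWritable();
  }

  public long getPos() throws IOException {
    return processed ? split.getLength() : 0;
  }

  public float getProgress() throws IOException {
    return processed ? 1.0f : 0.0f;
  }

  public void close() throws IOException {
    // The stream is opened and closed inside next(), so nothing to do here.
  }
}

You'd then wire both pieces in with
conf.setInputFormat(WholeFileInputFormat.class) on your JobConf.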

P.S. Since you are reading whole files into memory, keep an eye on
memory usage (the example above has a 10 MB limit per file, for
instance). You could easily run out of memory if you don't handle
such cases properly.

On Thu, Aug 18, 2011 at 4:28 AM, Zhixuan Zhu <z...@calpont.com> wrote:
> I'm new to Hadoop and currently using Hadoop 0.20.2 to try out some simple
> tasks. I'm trying to send each whole file of the input directory to the
> mapper without splitting them line by line. How should I set the input
> format class? I know I could derive a customized FileInputFormat class
> and override the isSplitable function. But I have no idea how to
> implement around the record reader. Any suggestion or a sample code will
> be greatly appreciated.
>
> Thanks in advance,
> Grace
>



-- 
Harsh J
