Re: Hadoop Runner

madhu phatak Tue, 21 Jun 2011 03:50:54 -0700

Define your own custom RecordReader; that approach is efficient.
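
A rough sketch of the idea, in case it helps. It deliberately does not use the Hadoop classes, so it runs on its own: LineSource, RawLineSource, TransformingSource and drain are hypothetical names I made up, standing in for a RecordReader, the stock line reader, your custom reader, and the runner that pulls records. The point is only that each record is rewritten at the moment it is read, so the whole split never has to be copied into another variable first.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class RecordReaderSketch {

    /** Hypothetical stand-in for a record reader: one record per call. */
    interface LineSource {
        /** Returns the next record, or null when the split is exhausted. */
        String next() throws IOException;
    }

    /** Reads raw lines, roughly what a line-oriented reader does for text. */
    static final class RawLineSource implements LineSource {
        private final BufferedReader in;

        RawLineSource(BufferedReader in) {
            this.in = in;
        }

        @Override
        public String next() throws IOException {
            return in.readLine();
        }
    }

    /** Wraps another source and rewrites each record just before handing it on. */
    static final class TransformingSource implements LineSource {
        private final LineSource delegate;
        private final UnaryOperator<String> transform;

        TransformingSource(LineSource delegate, UnaryOperator<String> transform) {
            this.delegate = delegate;
            this.transform = transform;
        }

        @Override
        public String next() throws IOException {
            String raw = delegate.next();
            // Only the current record is held in memory while it is modified.
            return raw == null ? null : transform.apply(raw);
        }
    }

    /**
     * Plays the role of the runner: pulls records until none remain.
     * Collecting into a list is for this demo only; a real runner would
     * pass each record to the map function and move on.
     */
    static List<String> drain(LineSource source) throws IOException {
        List<String> out = new ArrayList<>();
        for (String rec = source.next(); rec != null; rec = source.next()) {
            out.add(rec);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        LineSource raw = new RawLineSource(
                new BufferedReader(new StringReader("a,b\nc,d\n")));
        // Example modification (an assumption for the demo): commas become tabs.
        LineSource modified = new TransformingSource(raw, line -> line.replace(',', '\t'));
        for (String rec : drain(modified)) {
            System.out.println(rec);
        }
    }
}
```

In a real job the same per-record step would go inside the next() of your own RecordReader, returned by your FileInputFormat subclass, as Harsh describes below.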

On Sun, Jun 12, 2011 at 10:12 AM, Harsh J <ha...@cloudera.com> wrote:


> Mark,
>
> I may not have gotten your question exactly, but you can do further
> processing inside of your FileInputFormat derivative's RecordReader
> implementation (just before it loads the value for a next() form of
> call -- which the MapRunner would use to read).
>
> If you're looking to dig into Hadoop's source code to understand the
> flow yourself, MapTask.java is what you may be looking for (run*
> methods).
>
> On Sun, Jun 12, 2011 at 3:25 AM, Mark question <markq2...@gmail.com>
> wrote:
> > Hi,
> >
> >  1) Where can I find the "main" class of hadoop? The one that calls the
> > InputFormat then the MapperRunner and ReducerRunner and others?
> >
> >    This will help me understand what is in memory or still on disk, and
> > the exact flow of data between the split and the mappers.
> >
> > My problem is, assuming I have a TextInputFormat and would like to modify
> > the input in memory before being read by RecordReader... where shall I do
> > that?
> >
> >    InputFormat was my first guess, but unfortunately, it only defines the
> > logical splits ... So, the only way I can think of is to use the
> > RecordReader to read all the records in the split into another variable
> > (with the format I want), then process that variable with the map
> > functions.
> >
> >   But is that efficient? So, to understand this, I hope someone can give
> > an answer to Q(1).
> >
> > Thank you,
> > Mark
> >
>
>
>
> --
> Harsh J
>
