Define your own custom RecordReader; it is efficient.

On Sun, Jun 12, 2011 at 10:12 AM, Harsh J <ha...@cloudera.com> wrote:
> Mark,
>
> I may not have gotten your question exactly, but you can do further
> processing inside of your FileInputFormat derivative's RecordReader
> implementation (just before it loads the value for a next() form of
> call -- which the MapRunner would use to read).
>
> If you're looking to dig into Hadoop's source code to understand the
> flow yourself, MapTask.java is what you may be looking for (run*
> methods).
>
> On Sun, Jun 12, 2011 at 3:25 AM, Mark question <markq2...@gmail.com> wrote:
> > Hi,
> >
> > 1) Where can I find the "main" class of hadoop? The one that calls the
> > InputFormat then the MapperRunner and ReducerRunner and others?
> >
> > This will help me understand what is in memory or still on disk, exact
> > flow of data between split and mappers.
> >
> > My problem is, assuming I have a TextInputFormat and would like to modify
> > the input in memory before being read by RecordReader... where shall I do
> > that?
> >
> > InputFormat was my first guess, but unfortunately, it only defines the
> > logical splits ... So, the only way I can think of is use the recordReader
> > to read all the records in split into another variable (with the format I
> > want) then process that variable by map functions.
> >
> > But is that efficient? So, to understand this, I hope someone can give an
> > answer to Q(1)
> >
> > Thank you,
> > Mark
>
> --
> Harsh J
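
To make the suggestion above concrete, here is a rough sketch of the idea in plain Java: wrap the reader and rewrite each record as it is read, instead of first copying the whole split into another variable. It deliberately does not use the Hadoop classes, so it runs on its own. LineReader, ListLineReader, TransformingReader and transform() are hypothetical names invented for this illustration; in a real job the wrapper would implement Hadoop's RecordReader and be returned from your FileInputFormat subclass.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

public class TransformingReaderSketch {

    /** Hypothetical stand-in for a record reader; NOT the Hadoop interface. */
    interface LineReader {
        /** Returns the next record, or null once the split is exhausted. */
        String next();
    }

    /** Serves records from an in-memory list; stands in for a reader over a file split. */
    static class ListLineReader implements LineReader {
        private final Iterator<String> it;

        ListLineReader(List<String> lines) {
            this.it = lines.iterator();
        }

        public String next() {
            return it.hasNext() ? it.next() : null;
        }
    }

    /** Wraps another reader and rewrites each record just before handing it on. */
    static class TransformingReader implements LineReader {
        private final LineReader delegate;

        TransformingReader(LineReader delegate) {
            this.delegate = delegate;
        }

        public String next() {
            String line = delegate.next();
            // Only one record is held at a time; the split is never buffered whole.
            return line == null ? null : transform(line);
        }

        /** Placeholder for whatever in-memory modification the job needs. */
        static String transform(String line) {
            return line.trim().toUpperCase(Locale.ROOT);
        }
    }

    public static void main(String[] args) {
        LineReader reader = new TransformingReader(
                new ListLineReader(Arrays.asList("  first record ", "second record")));
        // The consumer (the map side, in Hadoop terms) only ever sees transformed records.
        for (String rec = reader.next(); rec != null; rec = reader.next()) {
            System.out.println(rec);
        }
    }
}
```

Because the transformation happens per record inside next(), memory use stays at one record regardless of split size, which is why this tends to be cheaper than reading all records into a separate structure first.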