Thanks!
I'll try overriding the run method first.
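
For the archives, here is a minimal sketch of the pattern I have in mind. To keep it runnable without Hadoop on the classpath, SplitContext and ListContext are hypothetical stand-ins that only mirror the method names of Mapper.Context (nextKeyValue, getCurrentValue, write). They are not Hadoop classes. The "line occurs more than once" rule is just an example selection criterion. In a real job I'd expect to copy each value before buffering it (e.g. new Text(value)), since Hadoop reuses the key/value objects between records.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the few Mapper.Context methods used below.
interface SplitContext {
    boolean nextKeyValue();
    long getCurrentKey();
    String getCurrentValue();
    void write(String key, int value);
}

// Same shape as an overridden Mapper.run: drain the split, then call map.
class WholeSplitMapper {
    void run(SplitContext context) {
        // Pass 1: read the whole split, buffering every line in memory.
        List<String> lines = new ArrayList<>();
        Map<String, Integer> counts = new HashMap<>();
        while (context.nextKeyValue()) {
            String line = context.getCurrentValue();
            lines.add(line);
            counts.merge(line, 1, Integer::sum);
        }
        // Pass 2: call map only for the lines selected after the full scan
        // (example rule: the line occurs more than once in the split).
        for (String line : lines) {
            if (counts.get(line) > 1) {
                map(line, context);
            }
        }
    }

    void map(String line, SplitContext context) {
        context.write(line, 1);
    }
}

// Hypothetical in-memory context so the sketch can be exercised locally.
class ListContext implements SplitContext {
    private final Iterator<String> it;
    private long index = -1;   // record index, not a byte offset
    private String current;
    final List<String> emitted = new ArrayList<>();

    ListContext(List<String> lines) {
        this.it = lines.iterator();
    }

    public boolean nextKeyValue() {
        if (!it.hasNext()) {
            return false;
        }
        current = it.next();
        index++;
        return true;
    }

    public long getCurrentKey() { return index; }

    public String getCurrentValue() { return current; }

    public void write(String key, int value) {
        emitted.add(key + "=" + value);
    }
}
```

Running new WholeSplitMapper().run(ctx) on a ListContext holding the lines a, b, c, b leaves only the two "b" records in ctx.emitted. The memory cost is the whole split, which is the trade-off Harsh mentions below.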

On Wed, Oct 12, 2011 at 3:18 PM, Harsh J <ha...@cloudera.com> wrote:

> Yaron,
>
> That would certainly seem to be the easy way out, with the only
> negative side being that you'd have to cache your values in memory.
>
> If you plug deeper down into the RecordReader levels (which provide
> the specific nextKV(…) methods), you can perhaps keep just a list of
> offsets of all successful line matches and re-read the whole split in
> the second run. This would cost you slightly higher I/O as you seek
> through once again, but the benefit would be lower memory consumption
> -- if that can be a concern here.
>
> [Or go the longer way, and use the Reducer phase!]
>
> On Wed, Oct 12, 2011 at 5:14 PM, Yaron Gonen <yaron.go...@gmail.com>
> wrote:
> > Thanks for the fast reply!
> > I've dug into the code a little, and it seems to me that I can achieve my
> > goal by overriding the Mapper.run method: just iterate over the whole
> > split using context.nextKeyValue() and then call map only with the values
> > I need.
> > Since I'm a novice Hadooper, am I thinking about this the wrong way?
> >
> > thanks again,
> > yaron
> >
> > On Wed, Oct 12, 2011 at 12:44 PM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> Hello Yaron,
> >>
> >> Yes, this is possible to do.
> >>
> >> You need to plug in your own RecordReader implementation into the job,
> >> to control the emits and the action done before feeding key-value pair
> >> data into map(…).
> >>
> >> On Wed, Oct 12, 2011 at 2:42 PM, Yaron Gonen <yaron.go...@gmail.com>
> >> wrote:
> >> > Hi,
> >> > The map method in the Mapper gets as a parameter a single line from
> >> > the split. Is there a way for Mappers to get the whole split as input?
> >> > I'd like to scan the whole split before I decide which key-value pairs
> >> > to emit to the reducer.
> >> > Thanks
> >> > yaron
> >> >
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Harsh J
>
