On Feb 4, 2008 2:28 PM, Miles Osborne <[EMAIL PROTECTED]> wrote:
> If by "one program" you mean a single map/reduce, then if you don't have
> much data, you could easily have the mapper store all the input and then
> compute the top-n.

Yes, I meant a single map/reduce. I do have a huge amount of data, so I
think I will go with the second option that you suggested. Still, I would
like to know how to store data in the mapper. Could you tell me the API
for that?
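
A minimal sketch of the "store it in the mapper" idea, in plain Java so it
runs without Hadoop on the classpath. The class InMemoryTopN and its method
names are hypothetical stand-ins, not Hadoop classes. The point is only that
"storing data in the mapper" needs no special API: the state is an ordinary
member field that survives across map() calls. In the org.apache.hadoop.mapred
API, the analogous hooks would be Mapper.map() for each record and close()
(from MapReduceBase) once the input is exhausted. Each mapper instance sees
only its own input split, so this yields the top n for that split only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical stand-in for a mapper; NOT part of the Hadoop API.
class InMemoryTopN {
    // State kept across map() calls: keyword -> occurrences seen so far.
    private final Map<String, Long> counts = new HashMap<>();
    private final int n;

    InMemoryTopN(int n) {
        this.n = n;
    }

    // Called once per log line of the form "keyword source dateId".
    void map(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return; // skip blank lines
        }
        String keyword = trimmed.split("\\s+")[0];
        counts.merge(keyword, 1L, Long::sum);
    }

    // Called once after the last line; returns up to n keywords,
    // most frequent first (ties broken alphabetically).
    List<String> close() {
        // "Weakest first": lower count first; on equal counts the
        // alphabetically later keyword counts as weaker.
        Comparator<Map.Entry<String, Long>> weakestFirst =
            Map.Entry.<String, Long>comparingByValue()
                .thenComparing(Map.Entry.<String, Long>comparingByKey().reversed());
        PriorityQueue<Map.Entry<String, Long>> heap = new PriorityQueue<>(weakestFirst);
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            heap.add(e);
            if (heap.size() > n) {
                heap.poll(); // evict the current weakest entry
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result); // the heap yields weakest first
        return result;
    }

    public static void main(String[] args) {
        InMemoryTopN mapper = new InMemoryTopN(2);
        String[] lines = {"a s 1", "b s 1", "a s 2", "c s 1", "b s 3", "a s 4"};
        for (String line : lines) {
            mapper.map(line);
        }
        System.out.println(mapper.close()); // prints [a, b]
    }
}
```

The heap is bounded at n entries, but the HashMap still holds every distinct
keyword, so this only fits when the set of distinct keywords fits in memory.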

>
> If however you have a lot of data, then the more interesting alternative is
> to use a randomised data-structure (for example a Bloomier Filter) and count
> directly in that.  This would lead to some quantifiable error rate, which
> may be acceptable for your application.
>
Thanks for suggesting this. I didn't know about it. I will read more
about it and hopefully it will solve my problem.
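
To give a flavour of what "counting in a randomised data structure" can look
like, here is a small sketch. Note that it is NOT a Bloomier filter: it uses
a Count-Min sketch, a different and simpler structure, swapped in purely for
illustration. The class name, the table dimensions, and the use of CRC32 with
a row prefix as the hash are all assumptions of this example. The counts it
returns can overestimate (when keywords collide in every row) but never
underestimate.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative Count-Min sketch; a stand-in, not a Bloomier filter.
class CountMinSketch {
    private final long[][] table; // depth rows of width counters each
    private final int depth;
    private final int width;

    CountMinSketch(int depth, int width) {
        this.depth = depth;
        this.width = width;
        this.table = new long[depth][width];
    }

    // Pick one counter per row. Prefixing the row number gives each row a
    // different hash; CRC32 is an assumption here, chosen for brevity.
    private int bucket(String key, int row) {
        CRC32 crc = new CRC32();
        crc.update((row + ":" + key).getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % width); // getValue() is non-negative
    }

    // Record one occurrence of the keyword.
    void add(String key) {
        for (int row = 0; row < depth; row++) {
            table[row][bucket(key, row)]++;
        }
    }

    // Estimated count: the minimum over rows, which is the counter least
    // inflated by collisions with other keywords.
    long estimate(String key) {
        long min = Long.MAX_VALUE;
        for (int row = 0; row < depth; row++) {
            min = Math.min(min, table[row][bucket(key, row)]);
        }
        return min;
    }

    public static void main(String[] args) {
        CountMinSketch sketch = new CountMinSketch(4, 1024);
        for (int i = 0; i < 5; i++) {
            sketch.add("hadoop");
        }
        sketch.add("pig");
        System.out.println("hadoop ~ " + sketch.estimate("hadoop"));
        System.out.println("pig ~ " + sketch.estimate("pig"));
    }
}
```

Memory use is fixed at depth * width counters regardless of how many distinct
keywords appear, which is the trade made for the error. Finding the top n
still needs a separate list of candidate keywords to query.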

thanks,
Taran

> Miles
>
>
> On 04/02/2008, Tarandeep Singh <[EMAIL PROTECTED]> wrote:
> >
> > On Feb 4, 2008 2:11 PM, Miles Osborne <[EMAIL PROTECTED]> wrote:
> > > This is exactly the same as word counting, except that you have a second
> > > pass to find the top n per block of data (this can be done in a mapper),
> > > and then a reducer can quite easily merge the results together.
> > >
> >
> > This would mean I have to write a second program that reads the output
> > of the first and does the job. I was wondering if it could be done in
> > one program.
> >
> > > This wouldn't be homework, would it?
> > >
> > No, it isn't homework. I read the word count program that came with
> > Hadoop and wanted to extend it to solve my problem.
> >
> > thanks,
> > Taran
> >
> > > Miles
> > >
> > >
> > > On 04/02/2008, Tarandeep Singh <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Can someone guide me on how to write a program using the Hadoop
> > > > framework that analyzes log files and finds the most frequently
> > > > occurring keywords? The log file has the format:
> > > >
> > > > keyword source dateId
> > > >
> > > > Thanks,
> > > > Tarandeep
> > > >
> > >
> > >
> > >
> > > --
> > > The University of Edinburgh is a charitable body, registered in Scotland,
> > > with registration number SC005336.
> > >
> >
>
