On Feb 4, 2008 2:11 PM, Miles Osborne <[EMAIL PROTECTED]> wrote:
> This is exactly the same as word counting, except that you have a second
> pass to find the top n per block of data (this can be done in a mapper) and
> then a reducer can quite easily merge the results together.
This would mean I have to write a second program that reads the output of
the first and does the job. I was wondering if it could be done in one
program.

> This wouldn't be homework, would it?

No, it isn't homework. I read the word-count program that comes with Hadoop
and wanted to extend it to solve my problem.

thanks,
Taran

> Miles
>
> On 04/02/2008, Tarandeep Singh <[EMAIL PROTECTED]> wrote:
> >
> > Hi,
> >
> > Can someone guide me on how to write a program using the Hadoop framework
> > that analyzes log files and finds the most frequently occurring keywords?
> > The log file has the format:
> >
> > keyword source dateId
> >
> > Thanks,
> > Tarandeep
>
> --
> The University of Edinburgh is a charitable body, registered in Scotland,
> with registration number SC005336.
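For anyone following the thread, Miles's two-pass recipe can be sketched in plain Python. This is only an illustration of the mapper/reducer logic each Hadoop job would run, not real Hadoop API code; the function names (`count_mapper`, `topn_mapper`, etc.) are made up for the sketch.

```python
from collections import Counter
import heapq

# Job 1: word counting over log lines of the form "keyword source dateId".
def count_mapper(lines):
    """Emit a (keyword, 1) pair for each log line."""
    for line in lines:
        fields = line.split()
        if fields:                      # skip blank lines
            yield fields[0], 1

def count_reducer(pairs):
    """Sum the counts for each keyword."""
    totals = Counter()
    for keyword, n in pairs:
        totals[keyword] += n
    return dict(totals)

# Job 2: each mapper keeps only the top n of its block of (keyword, count)
# records, then a single reducer merges the partial lists.
def topn_mapper(counts, n):
    """Return the n most frequent keywords seen in this mapper's block."""
    return heapq.nlargest(n, counts.items(), key=lambda kv: kv[1])

def topn_reducer(partial_lists, n):
    """Merge per-block top-n lists into a global top n. After job 1 each
    keyword appears in at most one block, so no re-summing is needed."""
    merged = [kv for partial in partial_lists for kv in partial]
    return heapq.nlargest(n, merged, key=lambda kv: kv[1])

if __name__ == "__main__":
    lines = [
        "apple web 20080204",
        "banana web 20080204",
        "apple mail 20080203",
        "cherry web 20080203",
        "apple mail 20080202",
        "banana web 20080202",
    ]
    counts = count_reducer(count_mapper(lines))
    print(topn_reducer([topn_mapper(counts, 2)], 2))
```

As to doing it "in one program": the two jobs still run as two MapReduce passes, but a single driver can submit them back-to-back (in the Java API of that era, two sequential `JobClient.runJob` calls, the second reading the first's output directory), so it need not be two separate programs.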