On Feb 4, 2008 2:28 PM, Miles Osborne <[EMAIL PROTECTED]> wrote:
> If by "one program" you mean a single map/reduce, then if you don't have
> much data, you could easily have the mapper store all the input and then
> compute the top-n.

Yes, I meant a single map/reduce. I do have a huge amount of data, so I
think I will go with the second option that you suggested. Still, I would
like to know how to store data in the mapper. Could you tell me the API
for that?

> If however you have a lot of data, then the more interesting alternative is
> to use a randomised data-structure (for example a Bloomier Filter) and count
> directly in that. This would lead to some quantifiable error rate, which
> may be acceptable for your application.

Thanks for suggesting this. I didn't know about it. I will read more
about it and hopefully it will solve my problem.

Thanks,
Taran

> Miles
>
> On 04/02/2008, Tarandeep Singh <[EMAIL PROTECTED]> wrote:
> >
> > On Feb 4, 2008 2:11 PM, Miles Osborne <[EMAIL PROTECTED]> wrote:
> > > This is exactly the same as word counting, except that you have a second
> > > pass to find the top n per block of data (this can be done in a mapper)
> > > and then a reducer can quite easily merge the results together.
> >
> > This would mean I have to write a second program that reads the output
> > of the first and does the job. I was wondering if it could be done in
> > one program.
> >
> > > This wouldn't be homework, would it?
> >
> > No, it isn't homework. I read the word count program that came along
> > with Hadoop and wanted to extend it to solve my problem.
> >
> > Thanks,
> > Taran
> >
> > > Miles
> > >
> > > On 04/02/2008, Tarandeep Singh <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Can someone guide me on how to write a program using the Hadoop
> > > > framework that analyzes log files and finds the most frequently
> > > > occurring keywords? The log file has the format:
> > > >
> > > > keyword source dateId
> > > >
> > > > Thanks,
> > > > Tarandeep
> > >
> > > --
> > > The University of Edinburgh is a charitable body, registered in Scotland,
> > > with registration number SC005336.
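
For reference, the two-pass idea from the thread (word count first, then a top n per block in a mapper, merged by a reducer) can be sketched in plain Python. This is not Hadoop code and not the Hadoop API. The function names and the `num_blocks` parameter are made up for illustration, and it assumes the keyword is the first whitespace-separated field of each log line.

```python
import heapq
from collections import Counter

def count_keywords(lines):
    """First pass (word count): total occurrences of each keyword."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:                      # skip blank lines
            counts[fields[0]] += 1      # assumed format: keyword source dateId
    return counts

def top_n_in_block(block, n):
    """Mapper side of the second pass: keep the n largest (keyword, count) pairs."""
    return heapq.nlargest(n, block, key=lambda kv: kv[1])

def merge_top_n(partial_results, n):
    """Reducer side: merge the per-block candidates into the overall top n."""
    candidates = [kv for part in partial_results for kv in part]
    return heapq.nlargest(n, candidates, key=lambda kv: kv[1])

def top_keywords(lines, n, num_blocks=2):
    """Run both passes locally, splitting the totals into hypothetical blocks."""
    totals = sorted(count_keywords(lines).items())
    blocks = [totals[i::num_blocks] for i in range(num_blocks)]
    partial = [top_n_in_block(block, n) for block in blocks]
    return merge_top_n(partial, n)
```

The merge is exact here because after the first pass every keyword carries its full total and lands in exactly one block, so a keyword in the overall top n is also in the top n of its own block. Each mapper only has to hold n pairs in memory rather than all of its input.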
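
To make the "count in a randomised data-structure" suggestion more concrete, here is a minimal Python sketch. Note that it does not implement the Bloomier Filter Miles mentions. It uses a different and simpler structure, a count-min sketch, purely to illustrate counting in fixed memory with a bounded error. The class name, the `width` and `depth` defaults, and the use of SHA-256 for hashing are all assumptions chosen for the example.

```python
import hashlib

class CountMinSketch:
    """Approximate counter: fixed memory, estimates never fall below the true count."""

    def __init__(self, width=1024, depth=4):
        self.width = width              # counters per row
        self.depth = depth              # number of independent rows
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # Derive a per-row hash by salting the key with the row number.
        digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key, amount=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += amount

    def estimate(self, key):
        # Collisions can only inflate a counter, so the minimum is the best guess.
        return min(self.rows[row][self._index(row, key)]
                   for row in range(self.depth))
```

A wider table lowers the chance of collisions and a deeper one lowers the chance that every row collides, which is where the quantifiable error rate comes from. Candidate top keywords would still need to be tracked separately, for example with a small heap keyed on the estimates.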