Hi,

As far as I understand, having each mapper produce its top N records does not work on its own, because each mapper only has partial knowledge of the data, which will not lead to a globally optimal result... I think your mappers need to output all records (combined) and let the reducer pick the top N values.
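As a side note, the "reducer picks the top N" step is usually done with a bounded min-heap: keep at most N candidates, and replace the smallest whenever a bigger value arrives. A minimal stand-alone sketch (plain Java, no Hadoop types; `topN` and the sample values are just illustrative names):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class TopN {
    // Keep the N largest values from a stream using a min-heap:
    // the smallest current candidate sits at the root, so any
    // incoming value larger than it replaces it in O(log N).
    public static List<Long> topN(Iterable<Long> values, int n) {
        PriorityQueue<Long> heap = new PriorityQueue<>(n);
        for (long v : values) {
            if (heap.size() < n) {
                heap.offer(v);
            } else if (v > heap.peek()) {
                heap.poll();   // drop the smallest candidate
                heap.offer(v);
            }
        }
        List<Long> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        List<Long> vals = List.of(5L, 1L, 9L, 3L, 7L, 2L);
        System.out.println(topN(vals, 3)); // prints [9, 7, 5]
    }
}
```

Only O(N) memory is needed regardless of input size, which is why running the same logic as a combiner is attractive.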
-Rui

----- Original Message ----
From: Vadim Zaliva <[EMAIL PROTECTED]>
To: hadoop-user@lucene.apache.org
Sent: Tuesday, January 15, 2008 4:13:11 PM
Subject: Re: single output file

On Jan 15, 2008, at 13:57, Ted Dunning wrote:

> This is happening because you have many reducers running, only one of
> which gets any data.
>
> Since you have combiners, this probably isn't a problem. That reducer
> should only get as many records as you have maps. It would be a
> problem if your reducer were getting lots of input records.
>
> You can avoid this by setting the number of reducers to 1.

Thanks!

I also have another, perhaps stupid, question. I am trying to write a task
that produces a list of the records with the top N values. My idea is to
write a reducer class that iterates through the records, keeping the N with
the biggest values, and emits them. I can use it as both the combiner and
the reducer class. That way each map task will produce N records, and I
will set up a single reduce task that combines them into the final N
records. (N is reasonably small, like 10.)

However, to do this I need to postpone issuing output until I am done
processing all records. I can try to do this in the close() method, but I
do not have an OutputCollector there. I guess I could write a special
output collector, but that seems a bit artificial. Probably I am missing
something obvious and there is a common and easy way to do this?

Thanks!

Sincerely,
Vadim
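For the close() problem above, a common workaround with the old org.apache.hadoop.mapred API is to cache the OutputCollector that reduce() receives in a field and use it from close(). A minimal sketch of that pattern; note that Collector here is a simplified, hypothetical stand-in for Hadoop's OutputCollector (used so the example is self-contained), and TopNReducer and N are illustrative names:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for org.apache.hadoop.mapred.OutputCollector.
interface Collector {
    void collect(String key, long value);
}

// Reducer that buffers the top N values and emits them only in close(),
// using a collector reference cached from the last reduce() call.
class TopNReducer {
    private static final int N = 10;
    private final PriorityQueue<Long> heap = new PriorityQueue<>(N);
    private Collector cached; // saved in reduce(), used in close()

    public void reduce(String key, Iterator<Long> values, Collector out) {
        cached = out; // remember the collector for later
        while (values.hasNext()) {
            long v = values.next();
            if (heap.size() < N) {
                heap.offer(v);
            } else if (v > heap.peek()) {
                heap.poll();   // evict the smallest candidate
                heap.offer(v);
            }
        }
    }

    public void close() {
        // All input has been seen; emit the buffered top-N records.
        List<Long> out = new ArrayList<>(heap);
        out.sort(Collections.reverseOrder());
        for (long v : out) cached.collect("top", v);
    }
}
```

The same class can serve as the combiner, since a combiner that keeps its partition's top N never discards a record that could be in the global top N.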