Hello Henry,

Per the older conversation, what Owen was pointing to was the new API's Mapper/Reducer classes, specifically their overridable run(…) method: http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapreduce/Reducer.html#run(org.apache.hadoop.mapreduce.Reducer.Context)
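As a rough sketch of what that override could look like (the cutoff N, the LongWritable/Text types, and the class name are my assumptions, not from your job):

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TopNReducer
    extends Reducer<LongWritable, Text, LongWritable, Text> {

  // Hypothetical cutoff; could also be read from the Configuration
  // in setup(context).
  private static final int N = 100;

  @Override
  public void run(Context context)
      throws IOException, InterruptedException {
    setup(context);
    int emitted = 0;
    // With LongWritable.DecreasingComparator as the sort comparator,
    // keys arrive in descending order, so the first N key groups are
    // the top N. Stopping the loop early skips the remaining input.
    while (emitted < N && context.nextKey()) {
      reduce(context.getCurrentKey(), context.getValues(), context);
      emitted++;
    }
    cleanup(context);
  }

  @Override
  protected void reduce(LongWritable key, Iterable<Text> values,
      Context context) throws IOException, InterruptedException {
    for (Text value : values) {
      context.write(key, value);
    }
  }
}
```

Note this only cuts each reduce task short after its own first N groups; with multiple reducers you'd still need a final pass (or a single reducer) to get a global top N.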
You'll need to port your job to the new (still somewhat unstable) API to leverage this. Here are some slides to aid you in that task: http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api (the first part, from Owen).

On Fri, Mar 9, 2012 at 4:32 AM, Henry Helgen <hhel...@gmail.com> wrote:
> I am using the hadoop 0.20.2 mapreduce API. The program is running fine,
> just slower than it could be.
>
> I sum values and then use
> job.setSortComparatorClass(LongWritable.DecreasingComparator.class) to
> sort descending by sum. I need to stop the reducer after outputting the
> first N records. This would save the reducer from running over thousands
> of records when it only needs the first few. Is there a solution with the
> new mapreduce 0.20.2 API?
>
> -------------------------------------------------------------------
> I noticed messages from 2008 about this topic:
>
> http://grokbase.com/t/hadoop/common-user/089420wvkx/stop-mr-jobs-after-n-records-have-been-produced
>
> https://issues.apache.org/jira/browse/HADOOP-3973
>
> The last statement follows, but the link is broken.
> "You could do this pretty easily by implementing a custom MapRunnable.
> There is no equivalent for reduces. The interface proposed in
> HADOOP-1230 would support that kind of application. See:
> http://svn.apache.org/repos/asf/hadoop/core/trunk/src/mapred/org/apache/
> hadoop/mapreduce/
> Look at the new Mapper and Reducer interfaces."

-- 
Harsh J