Sure you can: the API provides pluggable extension points for this. Write a custom record reader that makes two passes over its data (the first pass reads the actual input, the second pass reads your known output and iterates again). In the mapper, separate the first-pass and second-pass logic with a flag.
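To make the idea concrete, here is a minimal plain-Java sketch of the two-pass pattern. It deliberately uses no Hadoop classes, and every name in it (TaggedRecord, TwoPassReader, TwoPassMapper) is hypothetical. In a real job the same switching logic would presumably sit inside your custom record reader, with the pass number handed to the mapper alongside each record.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Illustration only: plain Java, not the Hadoop API. All names are hypothetical.

/** One record plus the pass (1 or 2) in which it was read. */
final class TaggedRecord<T> {
    final int pass;
    final T value;

    TaggedRecord(int pass, T value) {
        this.pass = pass;
        this.value = value;
    }
}

/** Drains the first source, then the second, tagging each record with its pass. */
final class TwoPassReader<T> implements Iterator<TaggedRecord<T>> {
    private final Iterator<T> second;
    private Iterator<T> current;
    private int pass = 1;

    TwoPassReader(Iterator<T> first, Iterator<T> second) {
        this.current = first;
        this.second = second;
    }

    @Override
    public boolean hasNext() {
        if (current.hasNext()) {
            return true;
        }
        if (pass == 1) {
            // First source is exhausted: switch to the second round.
            pass = 2;
            current = second;
            return current.hasNext();
        }
        return false;
    }

    @Override
    public TaggedRecord<T> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return new TaggedRecord<>(pass, current.next());
    }
}

/** Mapper-side logic that branches on the pass flag. */
final class TwoPassMapper {
    static List<String> run(List<String> firstInput, List<String> secondInput) {
        List<String> out = new ArrayList<>();
        TwoPassReader<String> reader =
                new TwoPassReader<>(firstInput.iterator(), secondInput.iterator());
        while (reader.hasNext()) {
            TaggedRecord<String> rec = reader.next();
            if (rec.pass == 1) {
                out.add("first:" + rec.value);   // first-round logic goes here
            } else {
                out.add("second:" + rec.value);  // second-round logic goes here
            }
        }
        return out;
    }
}
```

For k-means, the first source would be the input points, and the second would be whatever the next round needs to read. The mapper never has to know where the boundary is; it only looks at the flag.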
On Sun, Aug 5, 2012 at 4:17 PM, Yaron Gonen <yaron.go...@gmail.com> wrote:
> Hi,
> Is there a way to keep a map-task alive after it has finished its work, to
> later perform another task on its same input?
> For example, consider the k-means clustering algorithm (k-means
> description <http://en.wikipedia.org/wiki/K-means_clustering> and hadoop
> implementation <http://codingwiththomas.blogspot.co.il/2011/05/k-means-clustering-with-mapreduce.html>).
> The only thing changing between iterations is the clusters centers. All the
> input points remain the same. Keeping the mapper alive, and performing the
> next round of map-tasks on the same node will save a lot of communication
> cost.
>
> Thanks,
> Yaron

--
Harsh J