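IMHO, it would be better to split your mapper and reducer into separate
jobs. A map-only job writes its output straight to the job's output
directory, so it is kept, and a second job can then read that output and
run the reducer.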
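To make the chaining concrete, here is a rough sketch of how the per-round jobs and directories could be laid out. It is plain Java with no Hadoop calls; the class IterativePlan, the Step fields, and the iter-N/data and iter-N/params directory names are all made up for illustration, and it assumes the initial data and parameters already sit under iter-0. In a real driver each map-only step would be a job configured with zero reduce tasks (setNumReduceTasks(0)).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): plans which directories each
// job in an iterative two-job pipeline would read and write.
final class IterativePlan {

    // One planned job. All field names are illustrative.
    static final class Step {
        final String name;
        final String input;      // main input directory
        final String sideInput;  // parameters read alongside the input, or null
        final String output;     // directory the job writes and that is kept
        final boolean mapOnly;   // true: run with zero reduce tasks

        Step(String name, String input, String sideInput,
             String output, boolean mapOnly) {
            this.name = name;
            this.input = input;
            this.sideInput = sideInput;
            this.output = output;
            this.mapOnly = mapOnly;
        }
    }

    // Assumed layout: <root>/iter-<n>/data holds the mapper output of round n.
    static String dataDir(String root, int iter) {
        return root + "/iter-" + iter + "/data";
    }

    // Assumed layout: <root>/iter-<n>/params holds the reducer output of round n.
    static String paramsDir(String root, int iter) {
        return root + "/iter-" + iter + "/params";
    }

    // Builds two steps per round. Round 0 is assumed to be the initial input.
    static List<Step> plan(String root, int rounds) {
        List<Step> steps = new ArrayList<>();
        for (int i = 1; i <= rounds; i++) {
            // Job A: map-only. Reads the previous round's data and parameters
            // and writes the updated data to its own directory.
            steps.add(new Step("update-" + i, dataDir(root, i - 1),
                    paramsDir(root, i - 1), dataDir(root, i), true));
            // Job B: reads job A's saved output and reduces it into the new
            // parameters. The mapper side of this job can be an identity map.
            steps.add(new Step("aggregate-" + i, dataDir(root, i),
                    null, paramsDir(root, i), false));
        }
        return steps;
    }
}
```

Calling IterativePlan.plan("work", 2) yields four steps: the first reads work/iter-0/data and writes work/iter-1/data, the second turns work/iter-1/data into work/iter-1/params, and round two starts from those two directories. Since every round writes to fresh directories, nothing from an earlier round gets overwritten.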

Regards,
BaiGang

On Tue, May 3, 2011 at 2:09 PM, Stanley Xu <wenhao...@gmail.com> wrote:

> Dear all,
>
> We have a task that runs a map-reduce job multiple times to do some machine
> learning calculation. We first use a mapper to update the data
> iteratively, and then use the reducer to process the mapper's output and
> update a global matrix. After that, we need to reuse the output of the
> previous mapper (as a data source) and reducer (as a set of parameters) to
> run the map-reduce again for another round of learning.
>
> I am wondering whether there is any setting or API I could use to make
> Hadoop keep the output of both the mapper and the reducer. Right now it
> looks like a job that contains a reducer deletes the intermediate results
> generated by the mapper.
>
> Thanks.
> Stanley Xu
>
>
