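Well, to answer your last question first: just set the number of reducers to zero. The map output is then written straight to the job's output path in HDFS instead of to local disk.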

But you can't just run reducers without mappers (as far as I know, having never tried), so your local job will need to run identity mappers to feed your reducers.
http://hadoop.apache.org/core/docs/r0.16.4/api/org/apache/hadoop/mapred/lib/IdentityMapper.html
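The two-job workaround can be sketched as a toy, in-memory Java model. This is not Hadoop API code: the class and method names (`TwoPhaseSketch`, `runMapOnlyJob`, `runIdentityMapReduceJob`, `wordMapper`, `sumReducer`) are hypothetical, and plain lists stand in for the per-task files a map-only job would leave in HDFS. It only illustrates the data flow. Job 1 has zero reducers, so map output is persisted with no shuffle or sort. Job 2 passes those stored records through an identity map, groups them by key, and reduces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical in-memory model of the two-job workaround (not Hadoop API code).
public class TwoPhaseSketch {

    // A key/value record, standing in for what a map task emits.
    static final class KV {
        final String key;
        final int value;
        KV(String key, int value) { this.key = key; this.value = value; }
    }

    // Job 1: map-only. With zero reducers there is no shuffle or sort; each
    // map task's output is kept as-is (one list per task, standing in for
    // one output file per task in HDFS).
    static List<List<KV>> runMapOnlyJob(List<List<String>> splits,
                                        Function<String, List<KV>> mapper) {
        List<List<KV>> outputs = new ArrayList<>();
        for (List<String> split : splits) {
            List<KV> taskOutput = new ArrayList<>();
            for (String line : split) {
                taskOutput.addAll(mapper.apply(line));
            }
            outputs.add(taskOutput);
        }
        return outputs;
    }

    // Job 2: identity map over the stored records, group by key, then reduce.
    static Map<String, Integer> runIdentityMapReduceJob(
            List<List<KV>> storedOutputs,
            BiFunction<String, List<Integer>, Integer> reducer) {
        // Identity map: every stored record passes through unchanged.
        List<KV> mapped = new ArrayList<>();
        for (List<KV> file : storedOutputs) {
            mapped.addAll(file);
        }
        // Shuffle/sort: group values by key; TreeMap keeps keys sorted.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (KV kv : mapped) {
            grouped.computeIfAbsent(kv.key, k -> new ArrayList<>()).add(kv.value);
        }
        // Reduce: one call per distinct key.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reducer.apply(e.getKey(), e.getValue()));
        }
        return result;
    }

    // Hypothetical CPU-heavy mapper: emits (word, 1) for each word in a line.
    static List<KV> wordMapper(String line) {
        List<KV> out = new ArrayList<>();
        for (String w : line.trim().split("\\s+")) {
            if (!w.isEmpty()) out.add(new KV(w, 1));
        }
        return out;
    }

    // Hypothetical cheap reducer: sums the counts for one key.
    static int sumReducer(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        List<List<String>> splits = List.of(List.of("a b a"), List.of("b c"));
        // Phase 1 (e.g. on EC2): map-only; its output outlives the job.
        List<List<KV>> stored = runMapOnlyJob(splits, TwoPhaseSketch::wordMapper);
        // Phase 2 (e.g. on the local cluster): identity map + reduce.
        Map<String, Integer> counts =
                runIdentityMapReduceJob(stored, TwoPhaseSketch::sumReducer);
        System.out.println(counts); // prints {a=2, b=2, c=1}
    }
}
```

In real Hadoop terms, the first phase would correspond to a `JobConf` with `setNumReduceTasks(0)` and the second to a job using `IdentityMapper` together with your reducer. One option for the intermediate data is to write it with `SequenceFileOutputFormat` and read it back with `SequenceFileInputFormat`, so the key/value types survive between the two jobs; check the API docs for your release for how input and output paths are set.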

ckw

On Jun 14, 2008, at 1:31 PM, Billy Pearson wrote:

I have a question someone may have answered here before, but I cannot find the answer.

Assume I have a cluster of servers hosting a large amount of data. I want to run a large job whose maps take a lot of CPU power and whose reduces take only a small amount of CPU. I want to run the maps on a group of EC2 servers and the reduces on the local cluster of 10 machines.

The problem I am seeing is with the map outputs: if I run the maps on EC2, they are stored locally on the instances. What I am looking to do is have the map output files stored in HDFS so I can kill the EC2 instances, since I do not need them for the reduces.

The only way I can think of to do this is to run two jobs: one map-only job that stores its output on HDFS, and then a second job that runs the reduces over the map outputs stored on HDFS.

Is there a way to make the mappers store their final output in HDFS?


--
Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/




