Well, to answer your last question first: just set the number of reducers to zero.
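A minimal sketch of what that looks like with the 0.16-era `org.apache.hadoop.mapred` API. The class and path names (`MapOnlyJob`, `MyCpuHeavyMapper`, `/input`, `/map-output`) are hypothetical, not from the thread:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

public class MapOnlyJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MapOnlyJob.class);
    conf.setJobName("map-only");

    conf.setMapperClass(MyCpuHeavyMapper.class); // hypothetical CPU-heavy mapper

    // Zero reducers: each mapper's output is written directly to the
    // job output directory in HDFS instead of to local disk.
    conf.setNumReduceTasks(0);

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    // SequenceFiles are convenient to re-read in a follow-up reduce job.
    conf.setOutputFormat(SequenceFileOutputFormat.class);

    conf.setInputPath(new Path("/input"));
    conf.setOutputPath(new Path("/map-output")); // lands in HDFS

    JobClient.runJob(conf);
  }
}
```

Once this job finishes, the EC2 instances hold nothing you still need, so they can be shut down.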
But you can't run reducers without mappers (as far as I know, having never tried it), so your local job will need to run identity mappers in order to feed your reducers:
http://hadoop.apache.org/core/docs/r0.16.4/api/org/apache/hadoop/mapred/lib/IdentityMapper.html
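The second, reduce-side job could then be sketched like this, again against the old `mapred` API; `ReduceOnlyJob` and `MyCheapReducer` are hypothetical names, and it assumes the first job wrote SequenceFiles to `/map-output`:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;

public class ReduceOnlyJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ReduceOnlyJob.class);
    conf.setJobName("reduce-only");

    // IdentityMapper passes each record through unchanged, so the
    // stored map output flows straight into the shuffle and reducers.
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(MyCheapReducer.class); // hypothetical cheap reducer

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    conf.setInputFormat(SequenceFileInputFormat.class);

    conf.setInputPath(new Path("/map-output")); // output of the first job
    conf.setOutputPath(new Path("/final-output"));

    JobClient.runJob(conf);
  }
}
```

The cost of this approach is an extra pass over the map output (it is written to HDFS, then read back and shuffled), but it decouples the two stages so they can run on different clusters.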
ckw
On Jun 14, 2008, at 1:31 PM, Billy Pearson wrote:
I have a question that someone may have answered here before, but I cannot find the answer.
Assume I have a cluster of servers hosting a large amount of data, and I want to run a large job where the maps take a lot of CPU power to run but the reduces take only a small amount. I want to run the maps on a group of EC2 servers and run the reduces on the local cluster of 10 machines.
The problem I am seeing is the map outputs: if I run the maps on EC2, they are stored locally on the instance. What I am looking to do is have the map output files stored in HDFS, so I can kill the EC2 instances since I do not need them for the reduces.
The only way I can think to do this is to run two jobs: one to run the mappers and store their output in HDFS, and then a second job to run the reduces from the map outputs stored in HDFS.
Is there a way to make the mappers store their final output in HDFS?
--
Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/