Well, to answer your last question first: just set the number of reducers to zero.
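A minimal sketch of what that looks like with the 0.16-era `org.apache.hadoop.mapred` API. The class and path names (`MapOnlyJob`, `MyCpuHeavyMapper`, `/input`, `/map-output`) are hypothetical, not from the thread:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

public class MapOnlyJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MapOnlyJob.class);
    conf.setJobName("map-only");

    conf.setMapperClass(MyCpuHeavyMapper.class); // hypothetical CPU-heavy mapper

    // Zero reducers: each mapper's output is written directly to the
    // job output directory in HDFS instead of to local disk.
    conf.setNumReduceTasks(0);

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    // SequenceFiles are convenient to re-read in a follow-up reduce job.
    conf.setOutputFormat(SequenceFileOutputFormat.class);

    conf.setInputPath(new Path("/input"));
    conf.setOutputPath(new Path("/map-output")); // lands in HDFS

    JobClient.runJob(conf);
  }
}
```

Once this job finishes, the EC2 instances hold nothing you still need, so they can be shut down.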
But you can't run reducers without mappers (as far as I know, having never tried it), so your local job will need to run identity mappers in order to feed your reducers:
http://hadoop.apache.org/core/docs/r0.16.4/api/org/apache/hadoop/mapred/lib/IdentityMapper.html
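The second, reduce-side job could then be sketched like this, again against the old `mapred` API; `ReduceOnlyJob` and `MyCheapReducer` are hypothetical names, and it assumes the first job wrote SequenceFiles to `/map-output`:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;

public class ReduceOnlyJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ReduceOnlyJob.class);
    conf.setJobName("reduce-only");

    // IdentityMapper passes each record through unchanged, so the
    // stored map output flows straight into the shuffle and reducers.
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(MyCheapReducer.class); // hypothetical cheap reducer

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    conf.setInputFormat(SequenceFileInputFormat.class);

    conf.setInputPath(new Path("/map-output")); // output of the first job
    conf.setOutputPath(new Path("/final-output"));

    JobClient.runJob(conf);
  }
}
```

The cost of this approach is an extra pass over the map output (it is written to HDFS, then read back and shuffled), but it decouples the two stages so they can run on different clusters.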
ckw
On Jun 14, 2008, at 1:31 PM, Billy Pearson wrote:
I have a question that someone may have answered here before, but I cannot find the answer.
Assume I have a cluster of servers hosting a large amount of data, and I want to run a large job where the maps take a lot of CPU power to run but the reduces take only a small amount. I want to run the maps on a group of EC2 servers and run the reduces on the local cluster of 10 machines.
The problem I am seeing is the map outputs: if I run the maps on EC2, they are stored locally on the instance. What I am looking to do is have the map output files stored in HDFS, so I can kill the EC2 instances since I do not need them for the reduces.
The only way I can think to do this is to run two jobs: one to run the mappers and store their output in HDFS, and then a second job to run the reduces from the map outputs stored in HDFS.
Is there a way to make the mappers store their final output in HDFS?
--
Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/