My second question is about the EC2 machines: has anyone solved the hostname
problem in an automated way?
For example, if I launch an EC2 server to run a tasktracker, the hostname it
reports back to my local cluster is its internal address, so the local reduce
tasks cannot fetch the map files from the EC2 machine under the default
hostname.
I get an error:
WARN org.apache.hadoop.mapred.ReduceTask: java.net.UnknownHostException:
domU-12-31-39-00-A4-05.compute-1.internal
<question>
Is there an automated way to start a tasktracker on an EC2 machine using the
public hostname, so the local tasks can fetch the map output from the EC2
machines?
For example, something like
bin/hadoop-daemon.sh start tasktracker
host=ec2-xx-xx-xx-xx.z-2.compute-1.amazonaws.com
that I can run to start just the tasktracker with the correct hostname.
</question>
What I am trying to do is build a custom AMI that I can just launch when I
need to add extra CPU power to my cluster, with the tasktracker started
automatically via a shell script run at startup.
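A startup script along these lines might do it. This is a sketch only: it assumes the tasktracker in your Hadoop version honors the slave.host.name property, that HADOOP_HOME is set, and that conf/hadoop-site.xml is generated from a template file with a @PUBLIC_HOSTNAME@ placeholder (the template name and placeholder are made up for illustration). The EC2 instance-metadata service supplies the public DNS name.

```shell
#!/bin/sh
# Sketch: start a tasktracker on an EC2 instance so it advertises
# its public hostname instead of the internal compute-1.internal name.

# Ask the EC2 instance-metadata service for the public DNS name.
PUBLIC_HOSTNAME=`curl -s http://169.254.169.254/latest/meta-data/public-hostname`

# Fill the placeholder in a hadoop-site.xml template so that
# slave.host.name is set to the public name before the daemon starts.
sed -e "s|@PUBLIC_HOSTNAME@|$PUBLIC_HOSTNAME|" \
    $HADOOP_HOME/conf/hadoop-site.xml.template \
    > $HADOOP_HOME/conf/hadoop-site.xml

# Start only the tasktracker on this instance.
$HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker
```

The template would carry, among your normal settings, a slave.host.name property whose value is the @PUBLIC_HOSTNAME@ placeholder.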
Billy
"Billy Pearson" <[EMAIL PROTECTED]>
wrote in message news:[EMAIL PROTECTED]
I have a question that someone may have answered here before, but I cannot
find the answer.
Assuming I have a cluster of servers hosting a large amount of data, I want
to run a large job where the maps take a lot of CPU power to run and the
reduces only take a small amount.
I want to run the maps on a group of EC2 servers and run the reduces on
the local cluster of 10 machines.
The problem I am seeing is with the map outputs: if I run the maps on EC2,
they are stored locally on the instance.
What I am looking to do is have the map output files stored in HDFS so I can
kill the EC2 instances, since I do not need them for the reduces.
The only way I can think of to do this is to run two jobs: one map-only job
that stores its output in HDFS, and then a second job that runs the reduces
from the map outputs stored in HDFS.
Is there a way to make the mappers store their final output in HDFS?
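For what it is worth, the two-job approach above can be sketched with Hadoop streaming. A sketch only: the jar path and the mymapper.sh / myreducer.sh scripts are assumptions and the exact flags vary by Hadoop version; the key idea is that setting the number of reduce tasks to 0 makes the maps write their output directly to the job's output directory in HDFS.

```shell
# Job 1: map-only, run on the EC2 tasktrackers. With zero reduce
# tasks the map output goes straight to the HDFS output directory,
# so the EC2 instances can be killed once the job finishes.
bin/hadoop jar contrib/streaming/hadoop-streaming.jar \
    -input  /data/input \
    -output /data/mapout \
    -mapper mymapper.sh \
    -jobconf mapred.reduce.tasks=0

# Job 2: run on the local cluster. An identity mapper (cat) just
# passes the stored map output through to the reduces.
bin/hadoop jar contrib/streaming/hadoop-streaming.jar \
    -input  /data/mapout \
    -output /data/final \
    -mapper cat \
    -reducer myreducer.sh
```

In a Java job, the equivalent of the first step is calling setNumReduceTasks(0) on the JobConf.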