First of all, thanks to whoever maintains the hadoop-ec2 scripts. They've saved us untold time and frustration getting started with a small testing cluster (5 instances).
A question: when we log into the newly created cluster and run jobs from the example jar (pi, etc.), everything works great. We expect our custom jobs will run just as smoothly.

However, when we try to restart the Hadoop daemons by running bin/stop-all.sh on the master, it only tries to stop daemons on localhost. Running bin/start-all.sh then boots up a localhost-only cluster (on which jobs run just fine).

The only way we've been able to recover from this situation is to use bin/terminate-hadoop-cluster and bin/destroy-hadoop-cluster and then start again from scratch with a new cluster.

There must be a simple way to restart the namenode, jobtracker, datanodes, and tasktrackers across all machines from the master. Also, I think understanding the answer to this question would put a lot more into perspective for me, so I can go on to do more advanced things on my own.

Thanks for any assistance / insight!

Chris

output from stop-all.sh
==

stopping jobtracker
localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
localhost: no tasktracker to stop
stopping namenode
localhost: no datanode to stop
localhost: no secondarynamenode to stop

conf files in /usr/local/hadoop-0.17.0
==

# cat conf/slaves
localhost
# cat conf/masters
localhost

--
Chris Anderson
http://jchris.mfdz.com