Re: contrib EC2 with hadoop 0.17

Chris K Wensel Sat, 07 Jun 2008 17:26:19 -0700

The new scripts do not use the start/stop-all.sh scripts, and thus do not maintain the slaves file. This is so cluster startup is much faster and a bit more reliable (keys do not need to be pushed to the slaves). Also, we can grow the cluster lazily just by starting slave nodes. That is, they are mostly optimized for booting a large cluster fast, doing work, then shutting down (allowing for huge short-lived clusters, vs a smaller/cheaper long-lived one).

But it probably would be wise to provide scripts to build/refresh the slaves file, and push keys to slaves, so the cluster can be traditionally maintained, instead of just re-instantiated with new parameters etc.
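
A rough sketch of what such scripts could look like, not the contrib scripts themselves. The function names write_slaves_file and push_key, the root login, and the .ssh/id_rsa destination are all assumptions made for illustration. How the slave hostnames are obtained is left to the caller.

```shell
#!/bin/sh
# Sketch only: write_slaves_file and push_key are hypothetical helpers,
# not part of the hadoop-ec2 contrib scripts.

# Read slave hostnames (one per line) on stdin and rewrite the slaves file.
# Blank lines and duplicate hostnames are dropped.
write_slaves_file() {
    slaves_file=$1
    tmp="$slaves_file.tmp.$$"
    grep -v '^[[:space:]]*$' | sort -u > "$tmp" && mv "$tmp" "$slaves_file"
}

# Copy a key file to every host listed in the slaves file.
# Assumptions: login as root, destination .ssh/id_rsa unless a third
# argument is given. SCP can be overridden (e.g. SCP=echo) for a dry run.
push_key() {
    slaves_file=$1
    key_file=$2
    dest=${3:-.ssh/id_rsa}
    while read -r host; do
        # stdin is redirected so the copy command cannot eat the host list
        ${SCP:-scp} "$key_file" "root@$host:$dest" < /dev/null
    done < "$slaves_file"
}
```

With the hostnames in hand, something like `write_slaves_file conf/slaves < hosts.txt` followed by `push_key conf/slaves ~/.ssh/id_rsa` would refresh the file and distribute the key. Running `SCP=echo push_key ...` first prints the copies it would make without touching the network.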

I wonder if these scripts would make sense in general, instead of being ec2 specific?


ckw

On Jun 7, 2008, at 11:31 AM, Chris Anderson wrote:

First of all, thanks to whoever maintains the hadoop-ec2 scripts.
They've saved us untold time and frustration getting started with a
small testing cluster (5 instances).

A question: when we log into the newly created cluster, and run jobs
from the example jar (pi, etc) everything works great. We expect our
custom jobs will run just as smoothly.

However, when we restart the namenodes and tasktrackers by running
bin/stop-all.sh on the master, it tries to stop only activity on
localhost. Running start-all.sh then boots up a localhost-only cluster
(on which jobs run just fine).

The only way we've been able to recover from this situation is to use
bin/terminate-hadoop-cluster and bin/destroy-hadoop-cluster and then
start again from scratch with a new cluster.

There must be a simple way to restart the namenodes and jobtrackers
across all machines from the master. Also, I think understanding the
answer to this question might put a lot more into perspective for me,
so I can go on to do more advanced things on my own.

Thanks for any assistance / insight!

Chris


output from stop-all.sh
==

stopping jobtracker
localhost: Warning: Permanently added 'localhost' (RSA) to the list of
known hosts.
localhost: no tasktracker to stop
stopping namenode
localhost: no datanode to stop
localhost: no secondarynamenode to stop


conf files in /usr/local/hadoop-0.17.0
==

# cat conf/slaves
localhost
# cat conf/masters
localhost




--
Chris Anderson
http://jchris.mfdz.com


Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/
