Re: Auto-shutdown for EC2 clusters

Steve Loughran Fri, 24 Oct 2008 15:02:43 -0700

Paco NATHAN wrote:

Hi Karl,


Rather than using separate key pairs, you can use EC2 security groups
to keep track of different clusters.

Effectively, that requires a new security group for every cluster --
so just allocate a bunch of different ones in a config file, then have
the launch scripts draw from those. We also use EC2 static IP
addresses and then have a DNS entry named similarly to each security
group, associated with a static IP once that cluster is launched.
It's relatively simple to query the running instances and collect them
according to security groups.
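As a rough illustration of the security-group scheme, here is a minimal Python sketch. It groups running instances by security group and picks an unused group from a configured pool. It assumes the instance listing has the layout EC2's DescribeInstances returns through boto3 (`Reservations` → `Instances` → `SecurityGroups`). The pool names such as `cluster-a` are hypothetical config values. The code works on a plain dict, so it runs without AWS credentials.

```python
from collections import defaultdict

# Hypothetical pool of pre-created security groups, one per cluster,
# as a launch script might read it from a config file.
GROUP_POOL = ["cluster-a", "cluster-b", "cluster-c"]


def group_running_instances(response):
    """Map security-group name -> list of running instance IDs.

    `response` is assumed to follow the DescribeInstances layout,
    e.g. the dict returned by boto3's EC2 client describe_instances().
    """
    clusters = defaultdict(list)
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if instance.get("State", {}).get("Name") != "running":
                continue  # ignore pending, stopped and terminated nodes
            for group in instance.get("SecurityGroups", []):
                clusters[group["GroupName"]].append(instance["InstanceId"])
    return dict(clusters)


def next_free_group(pool, response):
    """Return the first pool group with no running instances, or None."""
    in_use = group_running_instances(response)
    for name in pool:
        if name not in in_use:
            return name
    return None
```

With real credentials, `boto3.client("ec2").describe_instances()` could supply `response`. A production script would also follow pagination, because large accounts return results in pages.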

One way to handle detecting failures is just to attempt SSH in a loop.
Our rough estimate is that about 2% of attempted EC2 node launches
fail, so we request a few more nodes than we need to cover that rate.
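The over-allocation and the SSH-in-a-loop check can be sketched in a few lines of Python. This is a minimal sketch under two assumptions. First, the 2% figure is used as an expected failure rate, so the result is an average, not a guarantee. Second, the probe only checks that the SSH port accepts a TCP connection. A stricter probe might run `ssh host true` and check the exit status instead. The function names are hypothetical.

```python
import math
import socket
import time


def nodes_to_request(wanted, failure_rate=0.02):
    """How many instances to launch so that, on average, `wanted` survive.

    Expected-value calculation based on the ~2% launch-failure estimate;
    it does not guarantee that `wanted` nodes come up.
    """
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    return math.ceil(wanted / (1 - failure_rate))


def wait_for_ssh(host, port=22, attempts=30, delay=10.0, timeout=5.0):
    """Poll until host:port accepts a TCP connection; True on success.

    Simplification: only checks that sshd's port is open, not that a
    login actually works.
    """
    for attempt in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # refused, unreachable or timed out: wait and retry
            if attempt + 1 < attempts:
                time.sleep(delay)
    return False
```

For a 100-node cluster, `nodes_to_request(100)` suggests launching 103 instances. You would then call `wait_for_ssh` on each one and drop any that never answer.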

We have a patch to add a ping() operation to all the services (namenode, datanode, task tracker, job tracker), plus a proposal to make that remotely visible (https://issues.apache.org/jira/browse/HADOOP-3969). With that in place, you could hit every URL with an appropriately authenticated GET and see whether each service is live.
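The "hit every URL with a GET" idea can be sketched in Python with only the standard library. This is a sketch, not the HADOOP-3969 interface. That issue was a proposal, so the service URLs here are hypothetical. Authentication is reduced to optional request headers the caller supplies, for example an `Authorization` header.

```python
import urllib.error
import urllib.request


def is_live(url, headers=None, timeout=5.0):
    """Return True if a GET on `url` answers with a 2xx status."""
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # HTTP errors (4xx/5xx), refused connections and timeouts all count as down
        return False


def check_cluster(urls, headers=None):
    """Map each (hypothetical) service ping URL to its liveness."""
    return {url: is_live(url, headers) for url in urls}
```

Treating any non-2xx answer as "down" keeps the check conservative. A service that answers with an error page is not considered healthy.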

Another trick I like for long-haul health checks is to use Google Talk and give every machine its own login (you can keep them all in your own domain via Google Apps). Then you can use XMPP to monitor system health. A more advanced variant is to use it as the command interface, which is very similar to how botnets use IRC, though in that case the botnet herders are trying to manage 500,000 0wned boxes and don't want a reply.
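The monitoring side of the XMPP idea mostly comes down to tracking when each machine last reported in. The Python sketch below keeps that bookkeeping independent of the transport. It assumes some XMPP client library (not shown) calls `record()` whenever a status message or presence update arrives from a machine's account. The JIDs, the status strings, and the 120-second threshold are all hypothetical.

```python
import time


class HeartbeatMonitor:
    """Track the last message seen from each machine and flag silent ones."""

    def __init__(self, max_silence=120.0, clock=time.monotonic):
        self.max_silence = max_silence  # seconds without a message before "down"
        self.clock = clock              # injectable so the logic is testable
        self.last_seen = {}
        self.last_status = {}

    def record(self, jid, status):
        """Call from the chat client's message or presence handler."""
        self.last_seen[jid] = self.clock()
        self.last_status[jid] = status

    def silent(self):
        """Return JIDs that have not reported within max_silence seconds."""
        now = self.clock()
        return sorted(
            jid for jid, seen in self.last_seen.items()
            if now - seen > self.max_silence
        )
```

A machine that never signs in at all never appears in `last_seen`. A real monitor would therefore also compare against the list of expected accounts.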
