On Sun, Jan 10, 2010 at 6:08 PM, Grant Ingersoll <[email protected]> wrote:

> Can you share the script, obviously removing the part for your prop.
> software?


Sure.  Apologies for *really* ugly code.  Below is the launch script, with
some boring and *secret* bits expunged.  The only real downside of this sort
of approach is that the startup script needs to have stuff injected into it
for different kinds of servers.  Doing it over, I would make a completely
static boot script that looks to zookeeper to find out what tasks need
doing.  Basically, by making the client-boot scripts dynamic, I was trying
to inject configuration management via an inappropriate mechanism.  Since I
had ZK running in the cloud already, I should have just used a real
configuration management system instead of gross scripting.
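
To make the "static boot script" idea concrete, here is a rough sketch of what I mean, not something we actually ran.  The znode layout (/boot-tasks/<role>), the task directory and the function names are all made up for illustration, and the default lookup assumes zkCli.sh is on the path.  zkCli.sh prints log lines and stat output along with the data, so the loop ignores any line that is not a plain file name:

```shell
#!/bin/sh
# Static boot script sketch: the script never changes; the task list for
# this host is looked up at boot time.
# ASSUMPTIONS (hypothetical): znode /boot-tasks/<role> holds one task name
# per line, and each task is an executable file in $TASK_DIR.
ZK_SERVER=${ZK_SERVER:-localhost:2181}
ROLE=${ROLE:-client}
TASK_DIR=${TASK_DIR:-/home/client/tasks}

# Print candidate task names, one per line.
fetch_tasks() {
    zkCli.sh -server "$ZK_SERVER" get "/boot-tasks/$ROLE"
}

# Run every listed task that exists as an executable in $TASK_DIR.
run_tasks() {
    fetch_tasks | while read -r task; do
        # Skip empty lines and anything that is not a plain file name
        # (this also drops zkCli.sh noise and path tricks like ../x).
        case "$task" in
            ''|*[!A-Za-z0-9._-]*) continue ;;
        esac
        if [ -x "$TASK_DIR/$task" ]; then
            "$TASK_DIR/$task"
        fi
    done
}
```

The boot script would end with a single call to run_tasks.  What each kind of server does is then decided by what is stored in ZooKeeper, not by text injected into the script.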

You need to make sure you have a command line program on the client to
receive any secret keys because user-data is not considered secure.
We do this with something like this:

  # Send in secrets via stdin instead of command line to avoid snoopers.
  echo
  echo "Running remote script..."
  echo "Process $$ : $CLOUD_SERVER:$ZK_PORT $VOLUME $ZK_INTERNAL $INT_HOST_NAME $VOLUME_ZONE"

  ssh -n -o StrictHostKeyChecking=false -i $ADMIN_KEY_RSA r...@$i \
    "echo $AWS_API_KEY $AWS_API_SECRET | /home/client/$script $ENV $REV \
    "$CLOUD_SERVER":"$ZK_PORT" $VOLUME $ZK_INTERNAL $INT_HOST_NAME $VOLUME_ZONE"
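
On the receiving end, the client script only has to read the two values from its stdin before doing anything else.  A minimal sketch, assuming both values arrive on one line separated by a space (as the echo above sends them) and contain no whitespace; the function name read_secrets is made up for illustration and is not from the real client script:

```shell
#!/bin/sh
# Hypothetical client-side helper: read "KEY SECRET" from one line of
# stdin so the values never appear in this script's argument list.
read_secrets() {
    IFS=' ' read -r AWS_API_KEY AWS_API_SECRET || return 1
    # Fail if either value is missing.
    [ -n "$AWS_API_KEY" ] && [ -n "$AWS_API_SECRET" ]
}
```

The function has to run in the script's own shell, with the pipe as its stdin, for example "read_secrets || exit 1" near the top of the client script.  Calling it on the right-hand side of a pipe inside the script would set the variables in a subshell and lose them.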


Here is a less than complete excerpt of the launch script.  It should have
most of the bits you need.  It won't run as it stands because a fair bit of
stuff has been expunged.

#!/bin/sh

#########

# ASSUMPTIONS:

# ---- keys are available and referenced
# cert and secret key are available and correct perms
# client-boot.sh has been constructed to download and install all
# necessary software
# file named cloud-key in the current directory contains the key

# obtained using
#
# ec2-add-keypair cloud-admin-key

# ---- environment variables
# EC2_PRIVATE_KEY=~/.ec2/pk-xx.pem
# EC2_CERT=~/.ec2/cert-xx.pem
# EC2_HOME points to EC2 distro directory
# AWS_API_KEY=xx

# AWS_API_SECRET=xx/+yy/zz
# path includes $EC2_HOME/bin


#########
# DEFINITIONS:
ami=ami-1c5db975

. ./.cloud_client_env_settings

if [ $# -eq 5 -a "$5" = "-large" ]

then
    ami="ami-b1fe19d8 -t m1.large"
fi

#########
# This script will accept two arguments:
# Usage: client_cloud_launch.sh 3 namenode.sh
# the first parameter is the number of instances to launch

# the second parameter is the script to start on each instance
#
# it will create a node start script and then launch a bunch of instances

#START

echo $ADMIN_KEY
echo $ADMIN_KEY_RSA
echo $ZK_ADMIN_KEY

echo $ENV
echo $REV

# please pay attention to this key pair, used for creating instances;
# you should have the correct $ADMIN_KEY_RSA to go with this key pair
KEY_PAIR=cloud-admin-key


ZK_GROUP_NAME=zk_cluster

CLIENT_GROUP_NAME=zk_client
cluster_size=$1
ZK_PORT=4099

VOLUME_ZONE=us-east-1a
VOLUME=...
TIMEOUT=600

start=$(date +%s)
echo started at $(date)

# try to do ec2-describe-instances to get information on what ZK
# servers are available for use

# assumptions are that ZK_GROUP_NAME is the group with which one ZK
# cluster will be started, otherwise we would not know
# which one is which

ec2-describe-instances  > zk_instances.tmp.$$

# ... really silly code to do what should just be grep deleted here.
# all it does is hack zk_instances.tmp.$$ into better form in
# zk_instances.$$ ...

if [ "$ALREADYREAD" = 1 ]
then
 sed -n "$NEXTLINE","$lineno"p zk_instances.tmp.$$ >> zk_instances.$$
fi

ZK_EXTERNAL=$(grep INSTANCE zk_instances.$$ | grep running |
    grep $ZK_ADMIN_KEY | cut -f4 | tr '\n' '~' |
    sed -e 's/~/:2181,/g' -e 's/,$//')

ZK_INTERNAL=$(grep INSTANCE zk_instances.$$ | grep running |
    grep $ZK_ADMIN_KEY | cut -f5 | tr '\n' '~' |
    sed -e 's/~/:2181,/g' -e 's/,$//')

CLOUD_SERVER=$(grep INSTANCE zk_instances.$$ | grep running |
    grep $ZK_ADMIN_KEY | cut -f5 | head -1)

echo $CLOUD_SERVER

# launch client nodes.  This also causes client-boot.sh to be run on
# each node.  Somebody else should have built client-boot.sh for us
echo starting $cluster_size instances now...
ins_start_time=$(date +%s)

ec2-run-instances $ami -g $CLIENT_GROUP_NAME -k $ADMIN_KEY \
    -f client-boot.sh -z $VOLUME_ZONE -n $cluster_size \
    > client_instances.tmp.$$

cat client_instances.tmp.$$ | grep INSTANCE | cut -f2 > client_instances.$$

T1=0
# this factor is 90% of total cluster we want to start,
# once the number of running instance reaches this factor,
# we will continue our job, killing rather than waiting for the last 10%
factor=`awk -v x=$cluster_size BEGIN'{printf "%d\n",x*0.9+0.5 }'`

while [ "$T1" != $cluster_size ]
do
        rm -f client_instances.tmp.$$
        ec2-describe-instances | grep INSTANCE | grep running > current_running.$$
        all_instance=`cat client_instances.$$`

        T1=0

        for inst in $all_instance; do
                if [ -z "$inst" ]; then
                        continue;
                fi
                
                ok=`cat current_running.$$ | grep $inst`
                if [ -z "$ok" ]; then
                        echo Wait a moment, $inst is not ready yet.

                else
                        T1=`expr $T1 + 1`
                        echo $ok >> client_instances.tmp.$$
                fi
        done    

        # check timeout or not
        ins_curr_time=$(date +%s)
        elapse=`expr $ins_curr_time - $ins_start_time`

        if [ $elapse -gt $TIMEOUT ]; then

                # if at least 90% of the instances have started, we can stop
                # waiting and continue with the rest of the process
                if [ ! $T1 -lt $factor ]; then
                        echo We have had $T1 instances started, kill the unstarted ones...

                        # terminate the instances that have not started
                        for everyinst in $all_instance; do
                                isrunning=`cat current_running.$$ | grep $everyinst`
                                if [ -z "$isrunning" ]; then
                                        ec2-terminate-instances $everyinst
                                fi
                        done
                        break
                fi

                echo "We have waited for $elapse seconds, but only $T1 started, will not wait any more. Program will exit now!"

                # before exiting we need to terminate all instances we planned to start
                for inst in $all_instance; do
                        ec2-terminate-instances $inst
                done
                exit 1
        fi

        echo "Waiting for most instances to be running... $T1/$cluster_size so far."

done

rm -f current_running.$$

ins_curr_time=$(date +%s)
elapse=`expr $ins_curr_time - $ins_start_time`
echo "Congratulations! We have $T1 of $cluster_size running instances in $elapse seconds!"


chmod 600 $ADMIN_KEY_RSA

chmod 700 $script



# I need to record the instances' public names and ZK_EXTERNAL in a file
# for uploading fasta files
echo $ZK_EXTERNAL > .externalzk


finished=$(date +%s)

echo  completed after $(expr $finished - $start) seconds
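
The "wait until most of them are up, then give up on the stragglers" loop above is the part most worth reusing, so here is a stripped-down sketch of just that logic.  The function names, and the idea of passing in a command that prints the number of ready instances, are illustrative assumptions and not part of the original script.  The threshold uses integer arithmetic that is intended to give the same 90% rounding as the awk line:

```shell
#!/bin/sh
# threshold N: print 90% of N, rounded to the nearest integer.
threshold() {
    echo $(( ($1 * 9 + 5) / 10 ))
}

# wait_for_most COUNT_CMD TOTAL TIMEOUT [INTERVAL]
# COUNT_CMD (hypothetical) must print the number of ready instances.
# Returns 0 as soon as all TOTAL are ready.  After TIMEOUT seconds it
# returns 0 if at least 90% are ready, and 1 otherwise.
wait_for_most() {
    count_cmd=$1 total=$2 timeout=$3 interval=${4:-5}
    need=$(threshold "$total")
    began=$(date +%s)
    while :; do
        ready=$($count_cmd)
        [ "$ready" -ge "$total" ] && return 0
        now=$(date +%s)
        if [ $((now - began)) -gt "$timeout" ]; then
            # Timed out: succeed only if the 90% threshold was reached.
            [ "$ready" -ge "$need" ]
            return
        fi
        sleep "$interval"
    done
}
```

Terminating the instances that never came up is left to the caller, since that depends on whether the call succeeded (kill only the stragglers) or failed (kill everything that was requested).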
