Hi Yasumasa-san,

On Fri, Sep 10, 2010 at 03:17:56PM +0900, Yasumasa OZAKI wrote:
> Hi,
> 
> I made the STONITH plug-in that can be used by both Xen and KVM.

Thanks! Comments below.

> I would like to hear any opinion.
> 
> Best regards,
> Yasumasa OZAKI
> 
> 
> 

> #!/bin/sh
> #
> # External STONITH module for Xen/KVM hypervisor through ssh.
> # Uses Xen/KVM hypervisor as a STONITH device to control guest.
> #
> # Copyright (c) 2010 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> #
> 
> STOP_COMMAND="virsh destroy"
> START_COMMAND="virsh start"
> DUMP_COMMAND="virsh dump"
> SSH_COMMAND="/usr/bin/ssh -q -x -n"
> 
> # Rewrite the hostlist to accept "," as a delimeter for hostnames too.
> hostlist=`echo $hostlist | tr ',' ' '`
> 
> CheckIfDead() {
>     for j in 1 2 3 4 5
>     do
>         if ! ping -w1 -c1 "$1" >/dev/null 2>&1
>         then
>             return 0
>         fi
>         sleep 1
>     done
>     
>     return 1
> }

I'd rather have this one removed. If virsh returns a meaningful
exit code, then we can rely on that, right? Or use domstate?

> CheckHostList() {
>     if [ "x" = "x$hostlist" ]
>     then
>         ha_log.sh err "hostlist isn't set"
>         exit 1
>     fi
> }
> 
> CheckHypervisor() {
>     if [ "x" = "x$hypervisor" ]
>     then
>         ha_log.sh err "hypervisor isn't set"
>         exit 1
>     fi
> }
> 
> RunCommand() {
>     CheckHostList
>     CheckHypervisor
>     
>     for node in $hostlist
>     do
>         if [ "$node" != "$1" ]
>         then
>             continue
>         fi

You can abbreviate code like this into a more compact

        [ "$node" != "$1" ] && continue

>         case $2 in
>             stop)
>                 if [ "x$run_dump" != "x" ]
>                 then
>                     #Need to run core dump
>                     if [ "x$dump_dir" != "x" ]
>                     then
>                         TIMESTAMP=`date +%Y-%m%d-%H%M.%S`
>                         DOMAINNAME=`printf "%s" $node`
>                         COREFILE=$dump_dir/$TIMESTAMP-$DOMAINNAME.core
>                         #Run core dump
>                         command_result="$($SSH_COMMAND $hypervisor " \
> pgrep -f ^virsh_$node >/dev/null 2>&1; \
> if [ \$? = 0 ]; then echo RUNNING; exit; fi; \
> mkdir -p $dump_dir; \
> if [ \$? != 0 ]; then echo MKDIR_FAILED; exit; fi; \
> (exec -a virsh_$node $DUMP_COMMAND $node $COREFILE >/dev/null 2>&1); \
> if [ \$? != 0 ]; then echo DUMP_FAILED; exit; fi; \
> echo OK")"
>                         ssh_result=$?
>                         if [ $ssh_result = 0 ]
>                         then
>                             case "$command_result" in
>                                 RUNNING)
>                                     ha_log.sh info "Dump is already running"
>                                     exit 0
>                                     ;;
>                                 SUSPEND_FAILED)
>                                     ha_log.sh err "Failed to suspend domain 
> $node"
>                                     ;;
>                                 MKDIR_FAILED)
>                                     ha_log.sh err "Failed to create directory 
> $dump_dir"
>                                     ;;
>                                 DUMP_FAILED)
>                                     ha_log.sh err "Failed to core dump domain 
> $node to $COREFILE"
>                                     ;;
>                                 OK)
>                                     ha_log.sh notice "Domain $node dumped to 
> $COREFILE"
>                                     ;;
>                             esac
>                         else
>                             ha_log.sh err "Couldn't connect to hypervisor 
> $hypervisor"
>                         fi
>                     else
>                         ha_log.sh err "dump_dir isn't set"
>                     fi
>                 fi
> 
>                 command_result=$($SSH_COMMAND $hypervisor "((sleep 2; 
> $STOP_COMMAND $node) >/dev/null 2>&1 &); echo \$?")
>                 ssh_result=$?
>                 if [ $ssh_result = 0 ]
>                 then
>                     if [ $command_result = 0 ]
>                     then
>                         ha_log.sh notice "Domain $node is stoped"
>                     else
>                         ha_log.sh err "Failed to stop domain $node"
>                     fi
>                 else
>                     ha_log.sh err "Couldn't connect to hypervisor $hypervisor"
>                 fi
>                 break;;
>             start)
>                 command_result=$($SSH_COMMAND $hypervisor "((sleep 2; 
> $START_COMMAND $node) >/dev/null 2>&1 &); echo \$?")
>                 ssh_result=$?
>                 if [ $ssh_result = 0 ]
>                 then
>                     if [ $command_result = 0 ]
>                     then
>                         ha_log.sh notice "Domain $node is started"
>                     else
>                         ha_log.sh err "Failed to start domain $node"
>                     fi
>                 else
>                     ha_log.sh err "Couldn't connect to hypervisor $hypervisor"
>                 fi
>                 break;;
>         esac
>         exit 0
>     done
> }

This function is too long. Can you try to refactor and split it
into several. Also, the stop and start good candidates to be
folded into one function.

> # Main code
> 
> case $1 in
> gethosts)
>     CheckHostList
>     
>     for node in $hostlist ; do
>         echo $node
>     done
>     exit 0
>     ;;
> on)
>     RunCommand $2 start
>     exit $?
>     ;;
> off)
>     if RunCommand $2 stop
>     then
>         if CheckIfDead $2
>         then 
>             exit 0
>         fi
>     fi
>            
>     exit 1
>     ;;
> reset)
>     RunCommand $2 stop
>         
>     if CheckIfDead $2
>     then
>         RunCommand $2 start 
>         exit 0
>     fi
> 
>     exit 1
>     ;;
> status)
>     exit 0
>     ;;
> getconfignames)
>     echo "hostlist hypervisor"
>     exit 0
>     ;;
> getinfo-devid)
>     echo "virsh STONITH device"
>     exit 0
>     ;;
> getinfo-devname)
>     echo "virsh STONITH external device"
>     exit 0
>     ;;
> getinfo-devdescr)
>     echo "ssh-based Linux host reset for Xen/KVM guest domain trough 
> hypervisor"
>     echo "Fine for testing, but not really suitable for production!"

Well, it could be used in production too, if you replace the
ping part.

>     exit 0
>     ;;
> getinfo-devurl)
>     echo "http://openssh.org http://www.xensource.com/ 
> http://linux-ha.org/wiki";

I think that this should be reduced to just xensource.

>     exit 0
>     ;;
> getinfo-xml)
>     cat << SSHXML
> <parameters>
> <parameter name="hostlist" unique="1" required="1">
> <content type="string" />
> <shortdesc lang="en">
> Hostlist
> </shortdesc>
> <longdesc lang="en">
> The list of controlled nodes.
> For example: "node1 node2"
> </longdesc>
> </parameter>
> <parameter name="hypervisor" unique="1" required="1">
> <content type="string" />
> <shortdesc lang="en">
> Hypervisor hostname
> </shortdesc>
> <longdesc lang="en">
> Host name to execute hypervisor. Root user shall be able to ssh to that node.

Using public key authentication. This may also be a security
issue, but let's leave that to users to decide.

> </longdesc>
> </parameter>
> <parameter name="run_dump" unique="0" required="0">
> <content type="string" />
> <shortdesc lang="en">
> Run core dump
> </shortdesc>
> <longdesc lang="en">
> If set plugin will call "virsh dump" before killing guest domain
> </longdesc>
> </parameter>
> <parameter name="dump_dir" unique="1" required="0">
> <content type="string" />
> <shortdesc lang="en">
> Run dump core with the specified directory 
> When the "run_dump" parameter is set, this parameter is indispensable 
> </shortdesc>
> <longdesc lang="en">
> This parameter can indicate the dump destination.
> Should be set as a full path format, ex.) "/var/log/dump"
> The above example would dump the core, like;
> /var/log/dump/2009-0316-1403.37-GuestDomain.core
> </longdesc>
> </parameter>
> </parameters>
> SSHXML
>     exit 0
>     ;;
> *)
>     exit 1
>     ;;
> esac

Cheers,

Dejan
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/

Reply via email to