On Wed, Dec 22, 2010 at 06:44:34PM +0100, Holger Teutsch wrote:
> On Wed, 2010-12-22 at 17:27 +0100, Dejan Muhamedagic wrote:
> > On Wed, Dec 22, 2010 at 02:57:53PM +0100, Holger Teutsch wrote:
> > > On Wed, 2010-12-22 at 10:37 +0100, Dejan Muhamedagic wrote:
> > > > On Wed, Dec 22, 2010 at 09:57:40AM +0100, Holger Teutsch wrote:
> > > > > On Tue, 2010-12-21 at 19:03 +0100, Dejan Muhamedagic wrote:
> > > > > > Hi,
> > > > > >
> > > > > > On Tue, Dec 21, 2010 at 05:30:52PM +0100, Holger Teutsch wrote:
> > > > > > > Hi,
> > > > > > > I would like to submit a libvirt based stonith plugin for
> > > > > > > review and possible inclusion to glue.
> > > > > > > The plugin uses the client of libvirtd (i.e. virsh) _in the
> > > > > > > virtual machines_ and connects remotely to libvirtd on the
> > > > > > > hypervisor. Therefore it works with whatever transport or
> > > > > > > hypervisor that libvirt supports or will support.
> > > > > >
> > > > > > Just a note that the reset command should try to boot the host in
> > > > > > case it was down too. No objections here to the rest of the code.
> > > > >
> > > > > As a data center guy I would not expect this. In particular when
> > > > > startup fencing comes into play.
> > > > > When I _power down_ a cluster member for good reasons and start only
> > > > > one node I would not like the other one to be powered on
> > > > > automatigically.
> > > > > The power switch is the ultimate thing we control all this stuff
> > > >
> > > > If you want to keep the node down why not use the poweroff action
> > > > for stonith?
> > > >
> > >
> > > Unfortunately libvirt has no state "powered on / not running" or
> > > "persistent power off".
> > > I'm pretty sure that e.g. HP's ilo/ipmi implementation of "reset" would
> > > not power on but would be ignored on a powered off machine. So that
> > > might not be an issue with "real" servers.
> >
> > riloe and ipmi do pay attention to the power state and act
> > correspondingly, that is turn power on if the host was powered
> > off and reset otherwise.
> >
> > > With a previous version of the script on my KVM test cluster startup
> > > fencing of pacemaker powered on a stopped machine and I think that is
> > > not what you want.
> >
> > Well, that's what STONITH requires and that's how all other
> > stonith plugins behave.
>
> OK, I comment out the logic.
> Be it a pacemaker or a stonith problem: From a data center operations
> perspective I consider this behavior absolutely strange. You really have
> to pull the power cords to be sure that powered off servers stay off.
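
For reference, the riloe/ipmi behaviour described above (power on if the
host is off, reset otherwise) could be sketched for a libvirt guest
roughly as follows. This is only an illustration under stated
assumptions: reset_actions is a hypothetical helper that is not part of
the submitted plugin, and it assumes the caller passes in the state
string printed by "virsh domstate" (for example "running" or "shut off").

```shell
#!/bin/sh
# Hypothetical helper, not part of the submitted plugin.
# Prints the virsh subcommands a "reset" would need for a given
# domain state, one per line.
# $1: domain state as reported by "virsh domstate" (assumed strings).
reset_actions() {
    case $1 in
    "shut off")
        # Domain is already down: reset only has to power it on.
        echo start
        ;;
    *)
        # Any other state: hard power cycle (destroy, then start).
        echo destroy
        echo start
        ;;
    esac
}
```

Calling reset_actions "shut off" prints only "start", while any other
state yields "destroy" followed by "start". Keeping the decision separate
from the virsh calls makes it possible to check the policy without a
hypervisor.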

I see your point. Wasn't around when STONITH policy was designed,
so I really can't say if they considered this issue. However, it
does sound logical to me that reset should try to bring the host
up. It could be that the host went down accidentally as well (due
to, say, a kernel panic). Anyway, IMO the best practice is that
the cluster stack is not started automatically on boot.

> > > > > > Any chance to support more than one host?
> > > > >
> > > > > I reasoned about this as well but as we can not assume 'host name' ==
> > > > > 'domain id' that means domain_id has to be a list as well (with
> > > > > defaults or partial defaults).
> > > >
> > > > IIRC, there was one stonith agent which does this kind of
> > > > mapping. Alternatively, perhaps drop domain_id and allow
> > > > appending it in the hostlist (as in external/xen0), i.e.
> > > > "node1[:domain_id] ...".
> > > >
> > > > > I will think again about feasibility with not overcomplicated code.
> > > >
> > > > This should reduce the configuration, so I think it's worth the
> > > > effort.
> > >
> > > Will go with your proposal.
> >
> > Great.
> >
> > Cheers,
> >
> > Dejan
> >
>
> The updated version:

Looks good to me. Can you please attach the script instead?

Cheers,

Dejan

> - holger
>
> #!/bin/sh
> #
> # External STONITH module for a libvirt managed hypervisor (kvm/Xen).
> # Uses libvirt as a STONITH device to control guest.
> #
> # Copyright (c) 2010 Holger Teutsch <[email protected]>
> #
> # This program is free software; you can redistribute it and/or modify
> # it under the terms of version 2 of the GNU General Public License as
> # published by the Free Software Foundation.
> #
> # This program is distributed in the hope that it would be useful, but
> # WITHOUT ANY WARRANTY; without even the implied warranty of
> # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> #
> # Further, this software is distributed without any warranty that it is
> # free of the rightful claim of any third person regarding infringement
> # or the like. Any license provided herein, whether implied or
> # otherwise, applies only to this software file. Patent licenses, if
> # any, provided herein do not apply to combinations of this program with
> # other software, or any other product whatsoever.
> #
> # You should have received a copy of the GNU General Public License
> # along with this program; if not, write the Free Software Foundation, Inc.,
> # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
> #
>
> # start a domain
> libvirt_start() {
>     out=$($VIRSH -c $hypervisor_uri start $domain_id 2>&1)
>     if [ $? -eq 0 ]
>     then
>         ha_log.sh notice "Domain $domain_id was started"
>         return 0
>     fi
>
>     if echo "$out" | grep -i 'Domain is already active' > /dev/null 2>&1
>     then
>         ha_log.sh notice "Domain $domain_id is already active"
>         return 0
>     fi
>
>     ha_log.sh err "Failed to start domain $domain_id"
>     ha_log.sh err "$out"
>     return 1
> }
>
> # stop a domain
> # return
> # 0: success
> # 1: error
> # 2: was already stopped
> libvirt_stop() {
>     out=$($VIRSH -c $hypervisor_uri destroy $domain_id 2>&1)
>     if [ $? -eq 0 ]
>     then
>         ha_log.sh notice "Domain $domain_id was stopped"
>         return 0
>     fi
>
>     if echo "$out" | grep -i 'domain is not running' > /dev/null 2>&1
>     then
>         ha_log.sh notice "Domain $domain_id is already stopped"
>         return 2
>     fi
>
>     ha_log.sh err "Failed to stop domain $domain_id"
>     ha_log.sh err "$out"
>     return 1
> }
>
> # get status of stonith device (*NOT* of the domain).
> # If we can retrieve some info from the hypervisor
> # the stonith device is OK.
> libvirt_status() {
>     out=$($VIRSH -c $hypervisor_uri version 2>&1)
>     if [ $? -eq 0 ]
>     then
>         out=`echo "$out" | tail -1`
>         ha_log.sh notice "$hypervisor_uri: $out"
>         return 0
>     fi
>
>     ha_log.sh err "Failed to get status for $hypervisor_uri"
>     ha_log.sh err "$out"
>     return 1
> }
>
> # check config and set variables
> # does not return on error
> libvirt_check_config() {
>     VIRSH=`which virsh 2>/dev/null`
>
>     if [ ! -x "$VIRSH" ]
>     then
>         ha_log.sh err "virsh not installed"
>         exit 1
>     fi
>
>     if [ -z "$hostlist" -o -z "$hypervisor_uri" ]
>     then
>         ha_log.sh err "hostlist or hypervisor_uri missing; check configuration"
>         exit 1
>     fi
> }
>
> # set variable domain_id for the host specified as arg
> libvirt_set_domain_id ()
> {
>     for h in $hostlist
>     do
>         case $h in
>         $1:*)
>             domain_id=`expr $h : '.*:\(.*\)'`
>             return
>             ;;
>
>         $1)
>             domain_id=$1
>             return
>         esac
>     done
>
>     ha_log.sh err "Should never happen: Called for host $1 but $1 is not in $hostlist."
>     exit 1
> }
>
> libvirt_info() {
>     cat << LVIRTXML
> <parameters>
> <parameter name="hostlist" unique="1" required="1">
> <content type="string" />
> <shortdesc lang="en">
> List of hostname[:domain_id]..
> </shortdesc>
> <longdesc lang="en">
> List of controlled hosts: hostname[:domain_id]..
> The optional domain_id defaults to the hostname.
> </longdesc>
> </parameter>
>
> <parameter name="hypervisor_uri" required="1">
> <content type="string" />
> <shortdesc lang="en">
> Hypervisor URI
> </shortdesc>
> <longdesc lang="en">
> URI for connection to the hypervisor.
> driver[+transport]://[usern...@][hostlist][:port]/[path][?extraparameters]
> e.g.
> qemu+ssh://my_kvm_server.mydomain.my/system (uses ssh for root)
> xen://my_kvm_server.mydomain.my/ (uses TLS for client)
>
> virsh must be installed (e.g. libvir-client package) and access control must
> be configured for your selected URI.
> </longdesc>
> </parameter>
> </parameters>
> LVIRTXML
>     exit 0
> }
>
> #############
> # Main code #
> #############
>
> # don't fool yourself when testing with stonith(8)
> # and transport ssh
> unset SSH_AUTH_SOCK
>
> # support , as a separator as well
> hostlist=`echo $hostlist| sed -e 's/,/ /g'`
>
> case $1 in
> gethosts)
>     hostnames=`echo $hostlist|sed -e 's/:[^: ]*//g'`
>     for h in $hostnames
>     do
>         echo $h
>     done
>     exit 0
>     ;;
>
> on)
>     libvirt_check_config
>     libvirt_set_domain_id $2
>
>     libvirt_start
>     exit $?
>     ;;
>
> off)
>     libvirt_check_config
>     libvirt_set_domain_id $2
>
>     libvirt_stop
>     [ $? = 1 ] && exit 1
>     exit 0
>     ;;
>
> reset)
>     # libvirt has no reset so we do a power cycle
>     libvirt_check_config
>     libvirt_set_domain_id $2
>
>     libvirt_stop
>     rc=$?
>     [ $rc = 1 ] && exit 1
>
>     # stonith reset seems to require a power on even if it was off
>     # before so the next line is commented out
>     # [ $rc = 2 ] && exit 0
>
>     sleep 2
>     libvirt_start
>     exit $?
>     ;;
>
> status)
>     libvirt_check_config
>     libvirt_status
>     exit $?
>     ;;
>
> getconfignames)
>     echo "hostlist hypervisor_uri"
>     exit 0
>     ;;
>
> getinfo-devid)
>     echo "libvirt STONITH device"
>     exit 0
>     ;;
>
> getinfo-devname)
>     echo "libvirt STONITH external device"
>     exit 0
>     ;;
>
> getinfo-devdescr)
>     echo "libvirt-based Linux host reset for Xen/KVM guest domain through hypervisor"
>     exit 0
>     ;;
>
> getinfo-devurl)
>     echo "http://libvirt.org/uri.html http://linux-ha.org/wiki"
>     exit 0
>     ;;
>
> getinfo-xml)
>     libvirt_info
>     echo 0;
>     ;;
>
> *)
>     exit 1
>     ;;
> esac
>
>
>
> _______________________________________________________
> Linux-HA-Dev: [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
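
The hostname[:domain_id] handling in the script above can be tried
without a hypervisor. The following is a minimal sketch under stated
assumptions, not part of the plugin: list_hosts and lookup_domain_id are
hypothetical names, and POSIX parameter expansion stands in for the
expr and sed calls used in the script.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; the names are not from the plugin.

# Print the host names from a "host[:domain_id] ..." list, one per line.
# Commas are accepted as separators, as in the plugin.
# $1: host list
list_hosts() {
    for h in $(echo "$1" | sed -e 's/,/ /g')
    do
        # Drop everything from the first colon onwards.
        echo "${h%%:*}"
    done
}

# Print the domain id for a host; it defaults to the host name.
# $1: host list, $2: host name. Returns 1 if the host is not listed.
lookup_domain_id() {
    for h in $(echo "$1" | sed -e 's/,/ /g')
    do
        case $h in
        "$2":*)
            # Keep the part after the last colon.
            echo "${h##*:}"
            return 0
            ;;
        "$2")
            echo "$2"
            return 0
            ;;
        esac
    done
    return 1
}
```

With a list such as "node1:vm-a,node2", list_hosts prints node1 and
node2, and lookup_domain_id returns vm-a for node1 and node2 for node2.
Returning 1 for an unknown host, rather than exiting, is a choice made
here so the helper can be tested on its own.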
