As requested by digimer in the linux-ha IRC channel, here is fence_ovh. Getting
it included by default in the official distribution of the cluster software is
not a priority, but if you guide me on how to polish it I think I can improve
it a lot and test it on real machines (as long as my machines are still
test machines and not production ones).
1) What is fence_ovh
fence_ovh is a Python-based fence agent for the big French datacentre
provider OVH. You can find information about OVH at http://www.ovh.co.uk/ . I
also want to make clear that I'm not a member of OVH staff.
2) Features
The script has two main functions:
* Reboot into rescue mode (action=off)
* Reboot from the hard disk (action=on or action=reboot)
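As a rough illustration of the two functions above, the mapping from fence
action to OVH netboot mode could be sketched like this. The netboot IDs are the
ones used in the attached script; the names ACTION_TO_NETBOOT and
netboot_id_for are made up for this example and are not part of fence_ovh.

```python
# Netboot IDs as used in the attached script.
OVH_RESCUE_PRO_NETBOOT_ID = '28'
OVH_HARD_DISK_NETBOOT_ID = '1'

# Hypothetical lookup table, for illustration only.
ACTION_TO_NETBOOT = {
    'off': OVH_RESCUE_PRO_NETBOOT_ID,    # "power off" = reboot into rescue mode
    'on': OVH_HARD_DISK_NETBOOT_ID,      # boot from the hard disk
    'reboot': OVH_HARD_DISK_NETBOOT_ID,  # same as 'on'
}


def netboot_id_for(action):
    """Return the netboot ID for a fence action, or None if unsupported."""
    return ACTION_TO_NETBOOT.get(action)
```

Any action not in the table (for example status) returns None, which matches
the script simply exiting when it has nothing to do.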
3) Technical details
As you might deduce, the classical fence mechanism of turning off the other
node is not done by actually powering off the machine but by rebooting it into
rescue mode.
Another thing worth mentioning is that the script checks whether the machine
has rebooted correctly into rescue mode, using an OVH API call that reports the
date when the server rebooted. The OVH API is also used for the main
function, which reboots the machine into rescue mode.
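The success check described above boils down to a time-window comparison. Here
is a minimal sketch of it, assuming the reboot start and end times reported by
OVH have already been parsed into datetime objects. The function name
reboot_confirmed and the sample timestamps are invented for the example.

```python
from datetime import datetime


def reboot_confirmed(reboot_start, reboot_end, before, after):
    """True if the reported reboot started after `before` and ended before `after`."""
    return reboot_start > before and reboot_end < after


# Made-up timestamps, in the same format the script parses.
fmt = '%Y-%m-%d %H:%M:%S'
before = datetime.strptime('2013-01-01 10:00:00', fmt)  # just before the reboot request
after = datetime.strptime('2013-01-01 10:03:00', fmt)   # after the fixed wait
start = datetime.strptime('2013-01-01 10:00:20', fmt)   # reboot start reported by OVH
end = datetime.strptime('2013-01-01 10:02:10', fmt)     # reboot end reported by OVH

print(reboot_confirmed(start, end, before, after))
```

Note that this comparison only makes sense if the local clock and the times
reported by OVH are in the same time zone and reasonably in sync; I have not
verified how the API behaves in that respect.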
4) How to use it
4.1) Make sure python-soappy package is installed (Debian/Ubuntu).
4.2) Save fence_ovh in /usr/sbin
4.3) Run: ccs_update_schema so that new metadata is put into cluster.rng
4.4) If needed validate your configuration:
ccs_config_validate -v -f /etc/pve/cluster.conf.new
4.5) Here's an example of how to use it in cluster.conf:
<?xml version="1.0"?>
<cluster name="ha-008-010" config_version="3">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu"
        two_node="1" expected_votes="1">
  </cman>
  <fencedevices>
    <fencedevice agent="fence_ovh" name="fence008" email="[email protected]"
                 ipaddr="ns123456" login="ab12345-ovh" passwd="MYSECRET" />
    <fencedevice agent="fence_ovh" name="fence010" email="[email protected]"
                 ipaddr="ns789012" login="ab12345-ovh" passwd="MYSECRET" />
  </fencedevices>
  <clusternodes>
    <clusternode name="server008" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="fence008" action="off"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="server010" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="fence010" action="off"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
</cluster>
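In a configuration like this, every <device> inside a node's <fence> block has
to point at a <fencedevice> defined under <fencedevices>. A quick sanity check
for that could be sketched as follows. This is only an illustration using the
Python standard library, not a replacement for ccs_config_validate; the
function name undefined_fence_devices and the deliberately broken SAMPLE
configuration are made up for the example.

```python
import xml.etree.ElementTree as ET


def undefined_fence_devices(conf_xml):
    """Return device names referenced by nodes but not defined as fencedevices."""
    root = ET.fromstring(conf_xml)
    defined = {d.get('name') for d in root.iter('fencedevice')}
    referenced = {d.get('name') for d in root.iter('device')}
    return sorted(referenced - defined)


# Hypothetical, shortened configuration: fence010 is referenced but never defined.
SAMPLE = """<cluster name="ha-008-010" config_version="3">
  <fencedevices>
    <fencedevice agent="fence_ovh" name="fence008" ipaddr="ns123456"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="server008" nodeid="1" votes="1">
      <fence><method name="1"><device name="fence008" action="off"/></method></fence>
    </clusternode>
    <clusternode name="server010" nodeid="2" votes="1">
      <fence><method name="1"><device name="fence010" action="off"/></method></fence>
    </clusternode>
  </clusternodes>
</cluster>"""

print(undefined_fence_devices(SAMPLE))
```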
Finally, I attach to this email the first version of the fence_ovh script. It
can be improved a lot. I've just realised that I left a mention of an .ini file
in the metadata: I used that file previously to supply the user and password,
whereas now they are taken directly from the cluster.conf configuration, as
with any other fence agent.
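For context, as far as I understand it, the attributes from cluster.conf reach
the agent as name=value lines (the same form shown in the "Manual call" comment
at the top of the script), and the fencing library does the parsing. The sketch
below only illustrates that input format; parse_stdin_options is a hypothetical
helper and not how the fencing library is actually implemented.

```python
def parse_stdin_options(text):
    """Parse name=value lines into a dict, skipping blanks and # comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        name, sep, value = line.partition('=')
        if sep:  # ignore lines without an '=' sign
            options[name.strip()] = value.strip()
    return options


# Sample input taken from the comment block in the attached script.
SAMPLE_INPUT = """login=ab12345-ovh
passwd=MYSECRET
email=admin@myadmin
ipaddr=ns12345
action=off
"""

print(parse_stdin_options(SAMPLE_INPUT))
```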
The original Proxmox forum thread from which I adapted secofor's original
script:
http://forum.proxmox.com/threads/11066-Proxmox-HA-Cluster-at-OVH-Fencing?p=75152#post75152
P.S.: It was not easy to develop a fence agent because there is no
documentation on how to do it. I may raise this subject in a separate email on
this mailing list.
--
Adrián Gibanel
I.T. Manager
+34 675 683 301
www.btactic.com
#!/usr/bin/python
# Copyright 2013 Adrian Gibanel Lopez (bTactic)
# Adrian Gibanel improved this script
# at 2013 to add verification of success
# and to output metadata
# Based on:
# This is a fence agent for use at OVH
# As there are no other fence devices available,
# we must use OVH's SOAP API
# Quick-and-dirty assembled by Dennis Busch, secofor GmbH,
# Germany
# This work is licensed under a
# Creative Commons Attribution-ShareAlike 3.0 Unported License.
# Manual call parameters example
#
# login=ab12345-ovh
# passwd=MYSECRET
# email=admin@myadmin
# ipaddr=ns12345
# action=off
#
# where ipaddr is your server's OVH name
import sys, re, pexpect, atexit
sys.path.append("/usr/share/fence")
from fencing import *
from SOAPpy import WSDL
import time
from datetime import datetime
OVH_RESCUE_PRO_NETBOOT_ID='28'
OVH_HARD_DISK_NETBOOT_ID='1'
STATUS_HARD_DISK_SLEEP=240 # Wait 4 minutes for the OS to boot
STATUS_RESCUE_PRO_SLEEP=150 # Wait 2 minutes 30 seconds for Rescue-Pro to come up
OVH_FENCE_DEBUG=False # True or False for debug
def netboot_reboot(nodeovh,login,passwd,email,mode):
    soap = WSDL.Proxy('https://www.ovh.com/soapi/soapi-re-1.59.wsdl')
    session = soap.login(login, passwd, 'es', 0)
    #dedicatedNetbootModifyById changes the mode of the next reboot
    result = soap.dedicatedNetbootModifyById(session, nodeovh, mode, '', email)
    #dedicatedHardRebootDo initiates a hard reboot on the given node
    soap.dedicatedHardRebootDo(session, nodeovh, 'Fencing initiated by cluster', '', 'es')
    soap.logout(session)

def reboot_status(nodeovh,login,passwd):
    soap = WSDL.Proxy('https://www.ovh.com/soapi/soapi-re-1.59.wsdl')
    session = soap.login(login, passwd, 'es', 0)
    result = soap.dedicatedHardRebootStatus(session, nodeovh)
    tmpstart = datetime.strptime(result.start,'%Y-%m-%d %H:%M:%S')
    tmpend = datetime.strptime(result.end,'%Y-%m-%d %H:%M:%S')
    result.start = tmpstart
    result.end = tmpend
    soap.logout(session)
    return result

#print stderr to file
save_stderr = sys.stderr
errlog = open("/var/log/fence_ovh_error.log","a")
sys.stderr = errlog

global all_opt
device_opt = [ "email", "ipaddr", "action" , "login" , "passwd"]
ovh_fence_opt = {
    "email" : {
        "getopt" : "Z:",
        "longopt" : "email",
        "help" : "-Z, --email=<email> email for reboot message: [email protected]",
        "required" : "1",
        "shortdesc" : "Reboot email",
        "default" : "",
        "order" : 1 },
}
all_opt.update(ovh_fence_opt)
all_opt["ipaddr"]["shortdesc"] = "OVH node name"
atexit.register(atexit_handler)
options=check_input(device_opt,process_input(device_opt))
# Not sure if I need this old notation
## Support for -n [switch]:[plug] notation that was used before
if ((options.has_key("-n")) and (-1 != options["-n"].find(":"))):
    (switch, plug) = options["-n"].split(":", 1)
    if ((switch.isdigit()) and (plug.isdigit())):
        options["-s"] = switch
        options["-n"] = plug

if (not (options.has_key("-s"))):
    options["-s"]="1"

docs = { }
docs["shortdesc"] = "Fence agent for OVH"
docs["longdesc"] = "fence_ovh is a Power Fencing agent \
which can be used within an OVH datacentre. \
Poweroff is simulated with a reboot into rescue-pro \
mode. \
/usr/local/etc/ovhsecret example: \
\
[OVH] \
Login = ab12345-ovh \
Passwd = MYSECRET \
"
docs["vendorurl"] = "http://www.ovh.net"
show_docs(options, docs)
# I use my own logfile for debugging purposes
if OVH_FENCE_DEBUG:
    logfile=open("/var/log/fence_ovh.log", "a");
    logfile.write(time.strftime("\n%d.%m.%Y %H:%M:%S \n"))
    logfile.write("Parameter:\t")
    for val in sys.argv:
        logfile.write(val + " ")
    logfile.write("\n")

action=options['-o']
email=options['-Z']
login=options['-l']
passwd=options['-p']
nodeovh=options['-a']
if nodeovh[-8:] != '.ovh.net':
    nodeovh += '.ovh.net'

# Save datetime just before changing netboot
before_netboot_reboot = datetime.now()

if action == 'off':
    netboot_reboot(nodeovh,login,passwd,email,OVH_RESCUE_PRO_NETBOOT_ID) #Reboot in Rescue-pro
elif action == 'on':
    netboot_reboot(nodeovh,login,passwd,email,OVH_HARD_DISK_NETBOOT_ID) #Reboot from HD
elif action == 'reboot':
    netboot_reboot(nodeovh,login,passwd,email,OVH_HARD_DISK_NETBOOT_ID) #Reboot from HD
else:
    # logfile only exists in debug mode, so only touch it there
    if OVH_FENCE_DEBUG:
        logfile.write("nothing to do\n")
        logfile.close()
    errlog.close()
    sys.exit()

# Give the machine time to come up before asking OVH for the reboot status
if action == 'off':
    time.sleep(STATUS_RESCUE_PRO_SLEEP) #Wait for Rescue-pro
elif action == 'on':
    time.sleep(STATUS_HARD_DISK_SLEEP) #Wait for boot from HD
elif action == 'reboot':
    time.sleep(STATUS_HARD_DISK_SLEEP) #Wait for boot from HD
else:
    if OVH_FENCE_DEBUG:
        logfile.write("No sense! Check script please!\n")
        logfile.close()
    errlog.close()
    sys.exit()

after_netboot_reboot = datetime.now()

# Verification of success: the reboot reported by OVH must have started after
# our request and finished before the end of the wait
reboot_start_end=reboot_status(nodeovh,login,passwd)
if OVH_FENCE_DEBUG:
    logfile.write("reboot_start_end.start: " +reboot_start_end.start.strftime('%Y-%m-%d %H:%M:%S')+"\n")
    logfile.write("before_netboot_reboot: " +before_netboot_reboot.strftime('%Y-%m-%d %H:%M:%S')+"\n")
    logfile.write("reboot_start_end.end: " +reboot_start_end.end.strftime('%Y-%m-%d %H:%M:%S')+"\n")
    logfile.write("after_netboot_reboot: " +after_netboot_reboot.strftime('%Y-%m-%d %H:%M:%S')+"\n")

if ((reboot_start_end.start > before_netboot_reboot) and (reboot_start_end.end < after_netboot_reboot)):
    if OVH_FENCE_DEBUG:
        logfile.write("Netboot reboot went OK.\n")
else:
    if OVH_FENCE_DEBUG:
        logfile.write("ERROR: Netboot reboot wasn't OK.\n")
        logfile.close()
    errlog.close()
    sys.exit(1)

if OVH_FENCE_DEBUG:
    logfile.close()
errlog.close()