Re: [ClusterLabs] fence_vmware_soap: fail to shutdown VMs
Thanks a lot for your reply Marek. Both fence-agents-common and fence-agents-vmware-soap are at version 4.0.11-27. I tried adding --power-timeout, but no matter how high I set it, the command always fails after about 4 seconds. If I add -v I end up with *a lot* of output (~93 MB), most of which is XML. I don't think this is the kind of output that should be expected. In any case, I looked for the name of my VM in the logs and it doesn't appear even once. Here are the first 50 lines of the logs:

##
# head -n 50 fence-vmware-log.xml
Delay 0 second(s) before logging in to the fence device
reading wsdl at: https://10.5.200.20:443/sdk/vimService.wsdl ...
opening (https://10.5.200.20:443/sdk/vimService.wsdl)
http://schemas.xmlsoap.org/wsdl/; xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/; xmlns:interface="urn:vim25" >
https://localhost/sdk/vimService; />
sax duration: 1 (ms)
warning: tns (urn:vim25Service), not mapped to prefix
importing (vim.wsdl)
reading wsdl at: https://10.5.200.20:443/sdk/vim.wsdl ...
opening (https://10.5.200.20:443/sdk/vim.wsdl)
http://schemas.xmlsoap.org/wsdl/; xmlns:mime="http://schemas.xmlsoap.org/wsdl/mime/; xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/; xmlns:vim25="urn:vim25" xmlns:xsd="http://www.w3.org/2001/XMLSchema; >
http://www.w3.org/2001/XMLSchema; xmlns:vim25="urn:vim25" xmlns:xsd="http://www.w3.org/2001/XMLSchema; xmlns:reflect="urn:reflect" elementFormDefault="qualified" >
schemaLocation="reflect-messagetypes.xsd" />
##

With -v, the error I get at the end of the logs is "Unable to connect/login to fencing device", which is odd since I can get the status of a VM without any issue... Could it be something I forgot to install on my machine (a library or something else)? I also thought about permission issues, but I am using the default root user and I can shut down VMs through vSphere with it.

Ideas about this issue are more than welcome :)

Kevin

On 07/04/2016 02:09 PM, Marek Grac wrote:

Hi,

you can try to raise the value of --power-timeout from the default (20 seconds); you can also add -v to get verbose output. As long as you have the same version of fence-agents-common and fence-agents-vmware, there should be no issues.

m,

On Fri, Jul 1, 2016 at 11:31 AM, Kevin THIERRY <kevin.thierry.cit...@gmail.com> wrote:
[...]
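For readers following along, the timeout flag discussed above goes on the command line like this (a sketch reusing the host and VM name from this thread; the 120-second value and the fence-verbose.log file name are arbitrary choices, and the verbose output is redirected to a file since -v is extremely noisy):

# Retry the reboot with a longer power timeout, capturing the verbose log for inspection.
# fence_vmware_soap -a 10.5.200.20 -l root -p "**" -z --ssl-insecure -4 \
      -n laa-billing-backup -o reboot --power-timeout 120 -v > fence-verbose.log 2>&1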
[ClusterLabs] fence_vmware_soap: fail to shutdown VMs
Hello !

I'm trying to fence my nodes using fence_vmware_soap but it fails to shut down or reboot my VMs. I can get the list of the VMs on a host or query the status of a specific VM without any problem:

# fence_vmware_soap -a 10.5.200.20 -l root -p "**" -z --ssl-insecure -4 -n laa-billing-backup -o status
/usr/lib/python2.7/site-packages/urllib3/connectionpool.py:769: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)
Status: ON

However, trying to shut down or reboot a VM fails:

# fence_vmware_soap -a 10.5.200.20 -l root -p "**" -z --ssl-insecure -4 -n laa-billing-backup -o reboot
/usr/lib/python2.7/site-packages/urllib3/connectionpool.py:769: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)
Failed: Timed out waiting to power OFF

On the ESXi host I get the following logs in /var/log/hostd.log:

[LikewiseGetDomainJoinInfo:355] QueryInformation(): ERROR_FILE_NOT_FOUND (2/0): Accepted password for user root from 10.5.200.12
2016-07-01T08:49:50.911Z info hostd[34380B70] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=47defdf1] Event 190 : User root@10.5.200.12 logged in as python-requests/2.6.0 CPython/2.7.5 Linux/3.10.0-327.18.2.el7.x86_64
2016-07-01T08:49:50.998Z info hostd[32F80B70] [Originator@6876 sub=Vimsvc.TaskManager opID=47defdf4 user=root] Task Created : haTask--vim.SearchIndex.findByUuid-2513
2016-07-01T08:49:50.999Z info hostd[32F80B70] [Originator@6876 sub=Vimsvc.TaskManager opID=47defdf4 user=root] Task Completed : haTask--vim.SearchIndex.findByUuid-2513 Status success
2016-07-01T08:49:51.009Z info hostd[32F80B70] [Originator@6876 sub=Solo.Vmomi opID=47defdf6 user=root] Activation [N5Vmomi10ActivationE:0x34603c28] : Invoke done [powerOff] on [vim.VirtualMachine:3]
2016-07-01T08:49:51.009Z info hostd[32F80B70] [Originator@6876 sub=Solo.Vmomi opID=47defdf6 user=root] Throw vim.fault.RestrictedVersion
2016-07-01T08:49:51.009Z info hostd[32F80B70] [Originator@6876 sub=Solo.Vmomi opID=47defdf6 user=root] Result:
--> (vim.fault.RestrictedVersion) {
-->    faultCause = (vmodl.MethodFault) null,
-->    msg = ""
--> }
2016-07-01T08:49:51.027Z info hostd[34380B70] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=47defdf7 user=root] Event 191 : User root@10.5.200.12 logged out (login time: Friday, 01 July, 2016 08:49:50, number of API invocations: 0, user agent: python-requests/2.6.0 CPython/2.7.5 Linux/3.10.0-327.18.2.el7.x86_64)

I am wondering if there is some kind of compatibility issue. I am using fence-agents-vmware-soap 4.0.11 on CentOS 7.2.1511 against ESXi 6.0.0 Build 2494585.

Any ideas about this issue?

Best regards,

--
Kevin THIERRY
IT System Engineer
CIT Lao Ltd. – A.T.M.
PO Box 10082
Vientiane Capital – Lao P.D.R.
Cell : +856 (0)20 2221 8623
kevin.thierry.cit...@gmail.com
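The vim.fault.RestrictedVersion fault in the hostd log above is worth a close look: to my knowledge it is what ESXi raises when the active license does not permit the requested API call, and the free ESXi hypervisor license in particular makes the SOAP API read-only, which would explain the status query succeeding while powerOff is rejected. Separately, once the agent does work from the shell, the matching Pacemaker fence device would be created along these lines (a minimal sketch only: the device name vmfence, the node-to-VM mapping, and the timeout value are hypothetical; the attribute names follow the fence-agents 4.0.x parameter list):

# Sketch: register fence_vmware_soap as a stonith device; pcmk_host_map pairs
# cluster node names with the VM names vCenter/ESXi knows them by.
# pcs stonith create vmfence fence_vmware_soap \
      ipaddr=10.5.200.20 login=root passwd="**" \
      ssl=1 ssl_insecure=1 inet4_only=1 \
      pcmk_host_map="billing-primary-sync:laa-billing-primary;billing-backup-sync:laa-billing-backup" \
      power_timeout=120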
Re: [ClusterLabs] DRBD: both nodes stuck in secondary mode
I found out why it didn't work: the fail-count for the httpd resource was at INFINITY:

# crm_simulate -sL
[...]
value="INFINITY"/>
[...]

To reset the fail-count:

# pcs resource failcount reset httpd

Now it works as expected :) Next step: configure STONITH !

Thanks again !

Kevin
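Before resetting, the fail-count can also be inspected per node, and a failure-timeout can keep transient failures from pinning a resource away forever (a sketch; the node name is the one from this thread and the 10-minute value is an arbitrary example):

# Show the accumulated fail-counts for httpd, then clear them on one node.
# pcs resource failcount show httpd
# pcs resource failcount reset httpd billing-backup-sync
# Let failures expire automatically after 10 minutes instead of accumulating.
# pcs resource update httpd meta failure-timeout=10min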
On 06/15/2016 02:18 PM, Kevin THIERRY wrote:

Hello !

I wasn't expecting such great explanations, thanks a lot Ken ! Also thank you for your example Dimitri ! It solved the issue I had !

I'm still having trouble though: when I make the primary node fail (I unplug it), the secondary node starts everything as expected except the httpd service. However, when I plug the primary node back in, all resources go back to it and everything works fine, even the httpd service. Starting the httpd server manually on the second node works fine (using systemctl). I suspect an issue with the allocation scores but I don't know how to solve it.

# grep httpd /var/log/cluster/corosync.log
Jun 15 11:58:27 [27192] laa-billing-backup pengine: warning: unpack_rsc_op_failure: Processing failed op start for httpd on billing-backup-sync: unknown error (1)
Jun 15 11:58:27 [27192] laa-billing-backup pengine: info: native_print: httpd (ocf::heartbeat:apache): Started billing-primary-sync
Jun 15 11:58:27 [27192] laa-billing-backup pengine: info: get_failcount_full: httpd has failed INFINITY times on billing-backup-sync
Jun 15 11:58:27 [27192] laa-billing-backup pengine: warning: common_apply_stickiness: Forcing httpd away from billing-backup-sync after 100 failures (max=100)
Jun 15 11:58:27 [27192] laa-billing-backup pengine: info: rsc_merge_weights: vip: Rolling back scores from httpd
Jun 15 11:58:27 [27192] laa-billing-backup pengine: info: rsc_merge_weights: drbd:1: Rolling back scores from httpd
Jun 15 11:58:27 [27192] laa-billing-backup pengine: info: rsc_merge_weights: drbd-master: Rolling back scores from httpd
Jun 15 11:58:27 [27192] laa-billing-backup pengine: info: rsc_merge_weights: fs: Rolling back scores from httpd
Jun 15 11:58:27 [27192] laa-billing-backup pengine: info: rsc_merge_weights: pgsql: Rolling back scores from httpd
Jun 15 11:58:27 [27192] laa-billing-backup pengine: info: native_color: Resource httpd cannot run anywhere
Jun 15 11:58:27 [27192] laa-billing-backup pengine: notice: LogActions: Stop httpd (billing-primary-sync)
Jun 15 11:58:27 [27193] laa-billing-backup crmd: notice: te_rsc_command: Initiating action 54: stop httpd_stop_0 on billing-primary-sync
Jun 15 11:58:29 [27188] laa-billing-backup cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='httpd']/lrm_rsc_op[@id='httpd_last_0']: @operation_key=httpd_stop_0, @operation=stop, @transition-key=54:75:0:4fdfd97f-42dd-4f04-bf86-294d300df414, @transition-magic=0:0;54:75:0:4fdfd97f-42dd-4f04-bf86-294d300df414, @call-id=166, @last-run=1465966707, @last-rc-change=1465966707, @exec-time=2118
Jun 15 11:58:29 [27193] laa-billing-backup crmd: info: match_graph_event: Action httpd_stop_0 (54) confirmed on billing-primary-sync (rc=0)

On 06/14/2016 09:30 PM, Ken Gaillot wrote:

# crm_simulate -sL

Current cluster status:
Online: [ billing-backup-sync billing-primary-sync ]

 vip (ocf::heartbeat:IPaddr2): Started billing-primary-sync
 Master/Slave Set: drbd-master [drbd]
     Masters: [ billing-primary-sync ]
     Slaves: [ billing-backup-sync ]
 fs (ocf::heartbeat:Filesystem): Started billing-primary-sync
 pgsql (ocf::heartbeat:pgsql): Started billing-primary-sync
 httpd (ocf::heartbeat:apache): Started billing-primary-sync
 Clone Set: ping-clone [ping]
     Started: [ billing-backup-sync billing-primary-sync ]

Allocation scores:
native_color: vip allocation score on billing-backup-sync: -INFINITY
native_color: vip allocation score on billing-primary-sync: 400
clone_color: drbd-master allocation score on billing-backup-sync: 0
clone_color: drbd-master allocation score on billing-primary-sync: 300
clone_color: drbd:0 allocation score on billing-backup-sync: 0
clone_color: drbd:0 allocation score on billing-primary-sync: 10100
clone_color: drbd:1 allocation score on billing-backup-sync: 10100
clone_color: drbd:1 allocation score on billing-primary-sync: 0
native_color: drbd:0 allocation score on billing-backup-sync: 0
native_color: drbd:0 allocation score on billing-primary-sync: 10100
native_color: drbd:1 allocation score on billing-backup-sync: 10100
native_color: drbd:1 allocation score on billing-primary-sync: -INFINITY
drbd:0 promotion score on billing-primary-sync: INFINITY
drbd:1 promotion score on billing-backup-sync: -INFINITY
native_color: httpd allocation score on billing-primary-sync: 100
clone_color: ping-clone allocation score on billing-backup-sync: 0
clone_color: ping-clone allocation score on billing-primary-sync: 0
clone_color: ping:0 allocation score on billing-backup-sync: 0
clone_color: ping:0 allocation score on billing-primary-sync: 100
clone_color: ping:1 allocation score on billing-backup-sync: 100
clone_color: ping:1 allocation score on billing-primary-sync: 0
native_color: ping:1 allocation score on billing-backup-sync: 100
native_color: ping:1 allocation score on billing-primary-sync: 0
native_color: ping:0 allocation score on billing-backup-sync: -INFINITY
native_color: ping:0 allocation score on billing-primary-sync: 100

Transition Summary:
# Full config

# VIP
pcs cluster cib vip_cfg
pcs -f vip_cfg resource create vip ocf:heartbeat:IPaddr2 \
    ip=10.5.200.30 cidr_netmask=24 op monitor interval=30s
pcs -f vip_cfg constraint location vip prefers billing-primary=100
pcs cluster cib-push vip_cfg

# DRBD
pcs cluster cib drbd_cfg
pcs -f drbd_cfg resource create drbd ocf:linbit:drbd \
    drbd_resource=drbd0 \
    op monitor interval=29s role="Master" \
    op monitor interval=31s role="Slave"
pcs -f drbd_cfg resource master drbd-master drbd \
    master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
pcs -f drbd_cfg constraint location drbd-master prefers billing-primary=100
pcs -f drbd_cfg constraint colocation add master drbd-master with vip
pcs -f drbd_cfg constraint order start vip then promote drbd-master
pcs cluster cib-push drbd_cfg

# FS
pcs cluster cib fs_cfg
pcs -f fs_cfg resource create fs Filesystem \
    device="/dev/drbd0" directory="/data" fstype="ext4"
pcs -f fs_cfg constraint location fs prefers billing-primary=100
pcs -f fs_cfg constraint colocation add fs with drbd-master INFINITY with-rsc-role=Master
pcs -f fs_cfg constraint order promote drbd-master then start fs
pcs cluster cib-push fs_cfg

# PGSQL
pcs cluster cib pgsql_cfg
pcs -f pgsql_cfg resource create pgsql pgsql \
    pgctl="/usr/bin/pg_ctl" \
    psql="/usr/bin/psql" \
    pgdata="/data/pgsql/data/" \
    node_list="billing-primary billing-backup" \
    restart_on_promote='true'
pcs -f pgsql_cfg constraint location pgsql prefers billing-primary=100
pcs -f pgsql_cfg constraint colocation add pgsql with fs INFINITY
pcs -f pgsql_cfg constraint order start fs then start pgsql
pcs cluster cib-push pgsql_cfg

# HTTPD
pcs cluster cib httpd_cfg
pcs -f httpd_cfg resource create httpd ocf:heartbeat:apache \
    configfile=/etc/httpd/conf/httpd.conf \
    statusurl="http://localhost/welcome" \
    op monitor interval=1min
pcs -f httpd_cfg constraint location httpd prefers billing-primary=100
pcs -f httpd_cfg constraint colocation add httpd with pgsql INFINITY
pcs -f httpd_cfg constraint order start pgsql then start httpd
pcs cluster cib-push httpd_cfg

# PING
pcs cluster cib ping_cfg
pcs -f ping_cfg resource create ping ocf:pacemaker:ping dampen=5s multiplier=1000 host_list="10.5.200.254" --clone
pcs -f ping_cfg constraint location vip rule score=-INFINITY pingd lt 1 or not_defined pingd
pcs cluster cib-push ping_cfg

--
Kevin THIERRY
IT System Engineer
CIT Lao Ltd. – A.T.M.
PO Box 10082
Vientiane Capital – Lao P.D.R.
Cell : +856 (0)20 2221 8623
kevin.thierry.cit...@gmail.com
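After pushing each staged CIB, the net effect of the constraints and scores can be double-checked with standard tools (a sketch; these commands were already used elsewhere in this thread or ship with pcs/pacemaker):

# Overall resource state, the full constraint set with IDs, and the scores the
# policy engine actually computed.
# pcs status
# pcs constraint show --full
# crm_simulate -sL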