Re: [ClusterLabs] fence_vmware_soap: fail to shutdown VMs

2016-07-04 Thread Kevin THIERRY

Thanks a lot for your reply Marek.

Both fence-agents-common and fence-agents-vmware-soap are at version 
4.0.11-27.


I tried to add --power-timeout but it doesn't matter how long I set the
power timeout, it always fails after about 4 seconds. If I add -v I end
up with *a lot* of output (~93 MB), which mostly consists of XML. I don't
think this is the kind of output that should be expected. Anyway, I tried
to look for the name of my VM in the logs but it doesn't even
appear once.
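
For anyone reproducing this, the verbose run can be captured with
something like the following (same parameters as the status/reboot
commands quoted below; the --power-timeout value is just an example):

# fence_vmware_soap -a 10.5.200.20 -l root -p "**" -z --ssl-insecure -4 \
    -n laa-billing-backup -o reboot --power-timeout 120 -v > fence-vmware-log.xml 2>&1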


Here are the first 50 lines of the logs:

##

# head -n 50 fence-vmware-log.xml
Delay 0 second(s) before logging in to the fence device
reading wsdl at: https://10.5.200.20:443/sdk/vimService.wsdl ...
opening (https://10.5.200.20:443/sdk/vimService.wsdl)


[vimService.wsdl XML elided: the list archive stripped the tags; the surviving
fragments show the urn:vim25 / SOAP WSDL namespace declarations and a
soap:address endpoint of https://localhost/sdk/vimService]

sax duration: 1 (ms)
warning: tns (urn:vim25Service), not mapped to prefix
importing (vim.wsdl)
reading wsdl at: https://10.5.200.20:443/sdk/vim.wsdl ...
opening (https://10.5.200.20:443/sdk/vim.wsdl)


[vim.wsdl XML elided: the list archive stripped the tags; the surviving
fragments show the vim25 / XML Schema namespace declarations and an import
of reflect-messagetypes.xsd]
##

With -v, the error I get at the end of the logs is: "Unable to 
connect/login to fencing device" which is weird since I can get the 
status of a VM without issue...


Could it be something I forgot to install on my machine (a library or 
something else)? I also thought about a permissions issue, but I am using 
the default root user and I can shut down VMs through vSphere with it.
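
The verbose trace appears to come from the python-suds SOAP library (the
"reading wsdl at:" and "sax duration" lines are suds debug output), so one
simple thing to rule out is a missing or broken suds install, e.g.:

# rpm -q fence-agents-vmware-soap python-suds
# python -c 'import suds; print(suds.__version__)'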


Ideas about that issue are more than welcome :)

Kevin

On 07/04/2016 02:09 PM, Marek Grac wrote:

Hi,

you can try to raise the value of --power-timeout from the default (20 
seconds); you can also add -v to get verbose output.


As long as you have the same version of fence-agents-common and 
fence-agents-vmware, there should be no issues.
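
A quick way to confirm both packages really are at the same release:

# rpm -q fence-agents-common fence-agents-vmware-soap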


m,


On Fri, Jul 1, 2016 at 11:31 AM, Kevin THIERRY 
<kevin.thierry.cit...@gmail.com> wrote:


Hello !

I'm trying to fence my nodes using fence_vmware_soap but it fails
to shut down or reboot my VMs. I can get the list of the VMs on a
host or query the status of a specific VM without problem:

# fence_vmware_soap -a 10.5.200.20 -l root -p "**" -z
--ssl-insecure -4 -n laa-billing-backup -o status
/usr/lib/python2.7/site-packages/urllib3/connectionpool.py:769:
InsecureRequestWarning:
Unverified HTTPS request is being made. Adding certificate
verification is strongly advised. See:
https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)
Status: ON

However, trying to shut down or reboot a VM fails:

# fence_vmware_soap -a 10.5.200.20 -l root -p "**" -z
--ssl-insecure -4 -n laa-billing-backup -o reboot
/usr/lib/python2.7/site-packages/urllib3/connectionpool.py:769:
InsecureRequestWarning: Unverified HTTPS request is being made.
Adding certificate verification is strongly advised. See:
https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)
Failed: Timed out waiting to power OFF

On the ESXi I get the following logs in /var/log/hostd.log:

[LikewiseGetDomainJoinInfo:355] QueryInformation():
ERROR_FILE_NOT_FOUND (2/0):
Accepted password for user root from 10.5.200.12
2016-07-01T08:49:50.911Z info hostd[34380B70] [Originator@6876
sub=Vimsvc.ha-eventmgr opID=47defdf1] Event 190 : User
root@10.5.200.12 logged in as
python-requests/2.6.0 CPython/2.7.5 Linux/3.10.0-327.18.2.el7.x86_64
2016-07-01T08:49:50.998Z info hostd[32F80B70] [Originator@6876
sub=Vimsvc.TaskManager opID=47defdf4 user=root] Task Created :
haTask--vim.SearchIndex.findByUuid-2513
2016-07-01T08:49:50.999Z info hostd[32F80B70] [Originator@6876
sub=Vimsvc.TaskManager opID=47defdf4 user=root] Task Completed :
haTask--vim.SearchIndex.findByUuid-2513 Status success
2016-07-01T08:49:51.009Z info hostd[32F80B70] [Originator@6876
sub=Solo.Vmomi opID=47defdf6 user=root] Activation
[N5Vmomi10ActivationE:0x34603c28] : Invoke done [powerOff] on
[vim.VirtualMachine:3]
2016-07-01T08:49:

[ClusterLabs] fence_vmware_soap: fail to shutdown VMs

2016-07-01 Thread Kevin THIERRY

Hello !

I'm trying to fence my nodes using fence_vmware_soap but it fails to 
shut down or reboot my VMs. I can get the list of the VMs on a host or 
query the status of a specific VM without problem:


# fence_vmware_soap -a 10.5.200.20 -l root -p "**" -z --ssl-insecure 
-4 -n laa-billing-backup -o status
/usr/lib/python2.7/site-packages/urllib3/connectionpool.py:769: 
InsecureRequestWarning:
Unverified HTTPS request is being made. Adding certificate verification 
is strongly advised. See: 
https://urllib3.readthedocs.org/en/latest/security.html

  InsecureRequestWarning)
Status: ON

However, trying to shut down or reboot a VM fails:

# fence_vmware_soap -a 10.5.200.20 -l root -p "**" -z --ssl-insecure 
-4 -n laa-billing-backup -o reboot
/usr/lib/python2.7/site-packages/urllib3/connectionpool.py:769: 
InsecureRequestWarning: Unverified HTTPS request is being made. Adding 
certificate verification is strongly advised. See: 
https://urllib3.readthedocs.org/en/latest/security.html

  InsecureRequestWarning)
Failed: Timed out waiting to power OFF

On the ESXi I get the following logs in /var/log/hostd.log:

[LikewiseGetDomainJoinInfo:355] QueryInformation(): ERROR_FILE_NOT_FOUND 
(2/0):

Accepted password for user root from 10.5.200.12
2016-07-01T08:49:50.911Z info hostd[34380B70] [Originator@6876 
sub=Vimsvc.ha-eventmgr opID=47defdf1] Event 190 : User root@10.5.200.12 
logged in as python-requests/2.6.0 CPython/2.7.5 
Linux/3.10.0-327.18.2.el7.x86_64
2016-07-01T08:49:50.998Z info hostd[32F80B70] [Originator@6876 
sub=Vimsvc.TaskManager opID=47defdf4 user=root] Task Created : 
haTask--vim.SearchIndex.findByUuid-2513
2016-07-01T08:49:50.999Z info hostd[32F80B70] [Originator@6876 
sub=Vimsvc.TaskManager opID=47defdf4 user=root] Task Completed : 
haTask--vim.SearchIndex.findByUuid-2513 Status success
2016-07-01T08:49:51.009Z info hostd[32F80B70] [Originator@6876 
sub=Solo.Vmomi opID=47defdf6 user=root] Activation 
[N5Vmomi10ActivationE:0x34603c28] : Invoke done [powerOff] on 
[vim.VirtualMachine:3]
2016-07-01T08:49:51.009Z info hostd[32F80B70] [Originator@6876 
sub=Solo.Vmomi opID=47defdf6 user=root] Throw vim.fault.RestrictedVersion
2016-07-01T08:49:51.009Z info hostd[32F80B70] [Originator@6876 
sub=Solo.Vmomi opID=47defdf6 user=root] Result:

--> (vim.fault.RestrictedVersion) {
-->    faultCause = (vmodl.MethodFault) null,
-->    msg = ""
--> }
2016-07-01T08:49:51.027Z info hostd[34380B70] [Originator@6876 
sub=Vimsvc.ha-eventmgr opID=47defdf7 user=root] Event 191 : User 
root@10.5.200.12 logged out (login time: Friday, 01 July, 2016 08:49:50, 
number of API invocations: 0, user agent: python-requests/2.6.0 
CPython/2.7.5 Linux/3.10.0-327.18.2.el7.x86_64)



I am wondering if there is some kind of compatibility issue. I am using 
fence-agents-vmware-soap 4.0.11 on CentOS 7.2.1511 and ESXi 6.0.0 Build 
2494585.

Any ideas about that issue?

Best regards,

--
Kevin THIERRY
IT System Engineer

CIT Lao Ltd. – A.T.M.
PO Box 10082
Vientiane Capital – Lao P.D.R.
Cell : +856 (0)20 2221 8623
kevin.thierry.cit...@gmail.com

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] DRBD: both nodes stuck in secondary mode

2016-06-15 Thread Kevin THIERRY
I found out why it didn't work: the fail-count for resource httpd was at 
INFINITY:


# crm_simulate -sL
[...]
value="INFINITY"/>

[...]

To reset the fail-count:
# pcs resource failcount reset httpd
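
To double-check that the counter is really back to zero (standard pcs and
crm_mon options):
# pcs resource failcount show httpd
# crm_mon -1 --failcounts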

Now it works as expected :)

Next step: configure STONITH !
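
A minimal sketch of that step with fence_vmware_soap (the address and
credentials are the ones from the fencing thread above; the pcmk_host_map
pairs are placeholders that have to match the actual VM names in vSphere):

# pcs stonith create fence-vmware fence_vmware_soap \
    ipaddr=10.5.200.20 login=root passwd="**" ssl=1 ssl_insecure=1 \
    pcmk_host_map="billing-primary-sync:laa-billing-primary;billing-backup-sync:laa-billing-backup" \
    op monitor interval=60s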

Thanks again !
Kevin

On 06/15/2016 02:18 PM, Kevin THIERRY wrote:

Hello !

I wasn't expecting such great explanations, thanks a lot Ken ! Also 
thank you for your example Dimitri ! It solved the issue I had !


I'm still having trouble though: when I make the primary node fail (I 
unplug it), the secondary node starts everything as expected except the 
httpd service. However, when I plug the primary node back in, all 
resources go back to it and everything works fine, even the httpd 
service.


Starting the httpd server manually on the second node works fine 
(using systemctl). I suspect an issue with the allocation scores but I 
don't know how to solve it.
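
One way to dig into that is to look at the fail counts and scores directly,
with the same read-only tools that appear elsewhere in this thread:

# crm_simulate -sL | grep -i httpd
# pcs resource failcount show httpd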



# grep httpd /var/log/cluster/corosync.log


Jun 15 11:58:27 [27192] laa-billing-backup pengine:  warning: 
unpack_rsc_op_failure: Processing failed op start for httpd on 
billing-backup-sync: unknown error (1)
Jun 15 11:58:27 [27192] laa-billing-backup pengine:     info: 
native_print: httpd (ocf::heartbeat:apache): Started 
billing-primary-sync
Jun 15 11:58:27 [27192] laa-billing-backup pengine:     info: 
get_failcount_full: httpd has failed INFINITY times on 
billing-backup-sync
Jun 15 11:58:27 [27192] laa-billing-backup pengine:  warning: 
common_apply_stickiness: Forcing httpd away from 
billing-backup-sync after 100 failures (max=100)
Jun 15 11:58:27 [27192] laa-billing-backup pengine:     info: 
rsc_merge_weights: vip: Rolling back scores from httpd
Jun 15 11:58:27 [27192] laa-billing-backup pengine:     info: 
rsc_merge_weights: drbd:1: Rolling back scores from httpd
Jun 15 11:58:27 [27192] laa-billing-backup pengine:     info: 
rsc_merge_weights: drbd-master: Rolling back scores from httpd
Jun 15 11:58:27 [27192] laa-billing-backup pengine:     info: 
rsc_merge_weights: fs: Rolling back scores from httpd
Jun 15 11:58:27 [27192] laa-billing-backup pengine:     info: 
rsc_merge_weights: pgsql: Rolling back scores from httpd
Jun 15 11:58:27 [27192] laa-billing-backup pengine:     info: 
native_color: Resource httpd cannot run anywhere
Jun 15 11:58:27 [27192] laa-billing-backup pengine:   notice: 
LogActions: Stop httpd (billing-primary-sync)
Jun 15 11:58:27 [27193] laa-billing-backup    crmd:   notice: 
te_rsc_command: Initiating action 54: stop httpd_stop_0 on 
billing-primary-sync
Jun 15 11:58:29 [27188] laa-billing-backup     cib:     info: 
cib_perform_op: + 
/cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='httpd']/lrm_rsc_op[@id='httpd_last_0']: 
@operation_key=httpd_stop_0, @operation=stop, 
@transition-key=54:75:0:4fdfd97f-42dd-4f04-bf86-294d300df414, 
@transition-magic=0:0;54:75:0:4fdfd97f-42dd-4f04-bf86-294d300df414, 
@call-id=166, @last-run=1465966707, @last-rc-change=1465966707, 
@exec-time=2118
Jun 15 11:58:29 [27193] laa-billing-backup    crmd:     info: 
match_graph_event: Action httpd_stop_0 (54) confirmed on 
billing-primary-sync (rc=0)

On 06/14/2016 09:30 PM, Ken Gaillot wrote:


# crm_simulate -sL


Current cluster status:
Online: [ billing-backup-sync billing-primary-sync ]

 vip (ocf::heartbeat:IPaddr2): Started billing-primary-sync
 Master/Slave Set: drbd-master [drbd]
     Masters: [ billing-primary-sync ]
     Slaves: [ billing-backup-sync ]
 fs (ocf::heartbeat:Filesystem): Started billing-primary-sync
 pgsql (ocf::heartbeat:pgsql): Started billing-primary-sync
 httpd (ocf::heartbeat:apache): Started billing-primary-sync
 Clone Set: ping-clone [ping]
     Started: [ billing-backup-sync billing-primary-sync ]

Allocation scores:
native_color: vip allocation score on billing-backup-sync: -INFINITY
native_color: vip allocation score on billing-primary-sync: 400
clone_color: drbd-master allocation score on billing-backup-sync: 0
clone_color: drbd-master allocation score on billing-primary-sync: 300
clone_color: drbd:0 allocation score on billing-backup-sync: 0
clone_color: drbd:0 allocation score on billing-primary-sync: 10100
clone_color: drbd:1 allocation score on billing-backup-sync: 10100
clone_color: drbd:1 allocation score on billing-primary-sync: 0
native_color: drbd:0 allocation score on billing-backup-sync: 0
native_color: drbd:0 allocation score on billing-primary-sync: 10100
native_color: drbd:1 allocation score on billing-backup-sync: 10100
native_color: drbd:1 allocation score on billing-primary-sync: -INFINITY
drbd:0 promotion score on billing-primary-sync: INFINITY
drbd:1 promotion score on billing-backup-sync: [...]
[...]
native_color: httpd allocation score on billing-backup-sync: -INFINITY
native_color: httpd allocation score on billing-primary-sync: 100
clone_color: ping-clone allocation score on billing-backup-sync: 0
clone_color: ping-clone allocation score on billing-primary-sync: 0
clone_color: ping:0 allocation score on billing-backup-sync: 0
clone_color: ping:0 allocation score on billing-primary-sync: 100
clone_color: ping:1 allocation score on billing-backup-sync: 100
clone_color: ping:1 allocation score on billing-primary-sync: 0
native_color: ping:1 allocation score on billing-backup-sync: 100
native_color: ping:1 allocation score on billing-primary-sync: 0
native_color: ping:0 allocation score on billing-backup-sync: -INFINITY
native_color: ping:0 allocation score on billing-primary-sync: 100

Transition Summary:


# Full config


# VIP
pcs cluster cib vip_cfg
pcs -f vip_cfg resource create vip ocf:heartbeat:IPaddr2 \
ip=10.5.200.30 cidr_netmask=24 op monitor interval=30s
pcs -f vip_cfg constraint location vip prefers billing-primary=100
pcs cluster cib-push vip_cfg

# DRBD
pcs cluster cib drbd_cfg
pcs -f drbd_cfg resource create drbd ocf:linbit:drbd \
drbd_resource=drbd0 \
op monitor interval=29s role="Master" \
op monitor interval=31s role="Slave"
pcs -f drbd_cfg resource master drbd-master drbd \
master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
pcs -f drbd_cfg constraint location drbd-master prefers billing-primary=100
pcs -f drbd_cfg constraint colocation add master drbd-master with vip
pcs -f drbd_cfg constraint order start vip then promote drbd-master
pcs cluster cib-push drbd_cfg

# FS
pcs cluster cib fs_cfg
pcs -f fs_cfg resource create fs Filesystem \
device="/dev/drbd0" directory="/data" fstype="ext4"
pcs -f fs_cfg constraint location fs prefers billing-primary=100
pcs -f fs_cfg constraint colocation add fs with drbd-master INFINITY 
with-rsc-role=Master

pcs -f fs_cfg constraint order promote drbd-master then start fs
pcs cluster cib-push fs_cfg

# PGSQL
pcs cluster cib pgsql_cfg
pcs -f pgsql_cfg resource create pgsql pgsql \
pgctl="/usr/bin/pg_ctl" \
psql="/usr/bin/psql" \
pgdata="/data/pgsql/data/" \
node_list="billing-primary billing-backup" \
restart_on_promote='true'
pcs -f pgsql_cfg constraint location pgsql prefers billing-primary=100
pcs -f pgsql_cfg constraint colocation add pgsql with fs INFINITY
pcs -f pgsql_cfg constraint order start fs then start pgsql
pcs cluster cib-push pgsql_cfg

# HTTPD
pcs cluster cib httpd_cfg
pcs -f httpd_cfg resource create httpd ocf:heartbeat:apache  \
configfile=/etc/httpd/conf/httpd.conf \
statusurl="http://localhost/welcome" \
op monitor interval=1min
pcs -f httpd_cfg constraint location httpd prefers billing-primary=100
pcs -f httpd_cfg constraint colocation add httpd with pgsql INFINITY
pcs -f httpd_cfg constraint order start pgsql then start httpd
pcs cluster cib-push httpd_cfg

# PING
pcs cluster cib ping_cfg
pcs -f ping_cfg resource create ping ocf:pacemaker:ping dampen=5s 
multiplier=1000 host_list="10.5.200.254" --clone
pcs -f ping_cfg constraint location vip rule score=-INFINITY pingd lt 1 
or not_defined pingd

pcs cluster cib-push ping_cfg
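
# Sanity checks after the last cib-push (read-only; not part of the setup
# commands above):
pcs constraint show --full
pcs status
crm_simulate -sL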

--
Kevin THIERRY
IT System Engineer

CIT Lao Ltd. – A.T.M.
PO Box 10082
Vientiane Capital – Lao P.D.R.
Cell : +856 (0)20 2221 8623
kevin.thierry.cit...@gmail.com


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org