[ClusterLabs] Antw: question about equal resource distribution

2017-02-16 Thread Ulrich Windl
>>> Ilia Sokolinski wrote on 17.02.2017 at 07:30 in
message <28de945e-894f-41b0-b191-53ce90542...@clearskydata.com>:
> Suppose I have a N node cluster where N > 2 running m*N resources. Resources

> don’t have preferred nodes, but since resources take RAM and CPU it is 
> important to distribute them equally among the nodes.
> Will pacemaker do the equal distribution, e.g. m resources per node?

Yes, per primitive, but it does not take any resource usage into account (unless
you tell it to do so).

> If a node fails, will pacemaker redistribute the resources equally too, e.g.

> m * N/(N-1) per node?

If you have a 3-node cluster and one node fails, the dead resources will be
distributed equally among the remaining nodes (as said above).

> 
> I don’t see any settings controlling this behavior in the documentation, but
> perhaps, pacemaker tries to be “fair” by default.

"Utilization" is a good keyword to search for. Apart from that, there's a German
saying "Probieren geht über studieren", meaning "trying is preferable to
studying". Sometimes, at least ;-)
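
A minimal sketch of what that looks like in crm shell syntax (the node names,
the example VirtualDomain resource and all of the numbers are placeholders,
not taken from this thread):

property placement-strategy=balanced
node node1 utilization cpu=8 memory=16384
node node2 utilization cpu=8 memory=16384
primitive myvm ocf:heartbeat:VirtualDomain \
        params config=/etc/libvirt/qemu/myvm.xml \
        utilization cpu=1 memory=2048

With a placement-strategy other than "default", pacemaker weighs these
utilization values when it places (and re-places) resources.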

Regards,
Ulrich

> 
> Thanks 
> 
> Ilia Sokolinski
> ___
> Users mailing list: Users@clusterlabs.org 
> http://lists.clusterlabs.org/mailman/listinfo/users 
> 
> Project Home: http://www.clusterlabs.org 
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
> Bugs: http://bugs.clusterlabs.org 




___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Antw: Re: Disabled resource is hard logging

2017-02-16 Thread Ulrich Windl
>>> Oscar Segarra wrote on 16.02.2017 at 13:55 in
message
:
> Hi Klaus,
> 
> Thanks a lot, I will try to delete the stop monitor.
> 
> Nevertheless, I have 6 domains configured exactly the same way... Is there any
> reason why just this domain shows this behaviour?

Some years ago I was playing with NPIV, and it worked perfectly for one VM and
for several VMs. However, when multiple VMs were started or stopped at the same
time (thus NPIV being added/removed), I had "interesting" failures due to
concurrency, even a kernel lockup (which has since been fixed). So most likely
"something is not correct".
I know it doesn't help you the way you would like, but that's how life is.

Regards,
Ulrich

> 
> Thanks a lot.
> 
> 2017-02-16 11:12 GMT+01:00 Klaus Wenninger :
> 
>> On 02/16/2017 11:02 AM, Oscar Segarra wrote:
>> > Hi Kaluss
>> >
>> > Which is your proposal to fix this behavior?
>>
>> First you can try to remove the monitor op for role=Stopped.
>> The startup probing will probably still fail, but in that case
>> the behaviour is different.
>> Startup probing can be disabled globally via the cluster property
>> enable-startup-probes, which defaults to true.
>> But be aware that the cluster then wouldn't be able to react
>> properly if services are already up when pacemaker is starting.
>> It should be possible to disable the probing on a per-resource
>> or per-node basis as well, iirc, but I can't tell you off the top
>> of my head how that worked - there was a discussion a few weeks ago
>> on the list.
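
A possible way to apply both of the above, assuming pcs 0.9 syntax; the
interval is only a placeholder and has to match the monitor op actually
configured on the resource (which "pcs resource show vm-vdicone01" prints):

pcs resource op remove vm-vdicone01 monitor role=Stopped interval=10s
pcs property set enable-startup-probes=false
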
>>
>> Regards,
>> Klaus
>>
>> >
>> > Thanks a lot!
>> >
>> >
>> > On 16 Feb 2017 at 10:57 a.m., "Klaus Wenninger" > > > wrote:
>> >
>> > On 02/16/2017 09:05 AM, Oscar Segarra wrote:
>> > > Hi,
>> > >
>> > > In my environment I have deployed 5 VirtualDomains as one can
>> > see below:
>> > > [root@vdicnode01 ~]# pcs status
>> > > Cluster name: vdic-cluster
>> > > Stack: corosync
>> > > Current DC: vdicnode01-priv (version 1.1.15-11.el7_3.2-e174ec8) -
>> > > partition with quorum
>> > > Last updated: Thu Feb 16 09:02:53 2017  Last change: Thu
>> Feb
>> > > 16 08:20:53 2017 by root via crm_attribute on vdicnode02-priv
>> > >
>> > > 2 nodes and 14 resources configured: 5 resources DISABLED and 0
>> > > BLOCKED from being started due to failures
>> > >
>> > > Online: [ vdicnode01-priv vdicnode02-priv ]
>> > >
>> > > Full list of resources:
>> > >
>> > >  nfs-vdic-mgmt-vm-vip   (ocf::heartbeat:IPaddr):Started
>> > > vdicnode01-priv
>> > >  Clone Set: nfs_setup-clone [nfs_setup]
>> > >  Started: [ vdicnode01-priv vdicnode02-priv ]
>> > >  Clone Set: nfs-mon-clone [nfs-mon]
>> > >  Started: [ vdicnode01-priv vdicnode02-priv ]
>> > >  Clone Set: nfs-grace-clone [nfs-grace]
>> > >  Started: [ vdicnode01-priv vdicnode02-priv ]
>> > >  vm-vdicone01   (ocf::heartbeat:VirtualDomain): FAILED (disabled)[
>> > > vdicnode02-priv vdicnode01-priv ]
>> > >  vm-vdicsunstone01  (ocf::heartbeat:VirtualDomain): FAILED
>> > > vdicnode01-priv (disabled)
>> > >  vm-vdicdb01(ocf::heartbeat:VirtualDomain): FAILED (disabled)[
>> > > vdicnode02-priv vdicnode01-priv ]
>> > >  vm-vdicudsserver   (ocf::heartbeat:VirtualDomain): FAILED
>> > > (disabled)[ vdicnode02-priv vdicnode01-priv ]
>> > >  vm-vdicudstuneler  (ocf::heartbeat:VirtualDomain): FAILED
>> > > vdicnode01-priv (disabled)
>> > >  Clone Set: nfs-vdic-images-vip-clone [nfs-vdic-images-vip]
>> > >  Stopped: [ vdicnode01-priv vdicnode02-priv ]
>> > >
>> > > Failed Actions:
>> > > * vm-vdicone01_monitor_2 on vdicnode02-priv 'not installed'
>> (5):
>> > > call=2322, status=complete, exitreason='Configuration file
>> > > /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is not
>> > readable.',
>> > > last-rc-change='Thu Feb 16 09:02:07 2017', queued=0ms,
>> exec=21ms
>> > > * vm-vdicsunstone01_monitor_2 on vdicnode02-priv 'not
>> installed'
>> > > (5): call=2310, status=complete, exitreason='Configuration file
>> > > /mnt/nfs-vdic-mgmt-vm/vdicsunstone01.xml does not exist or is not
>> > > readable.',
>> > > last-rc-change='Thu Feb 16 09:02:07 2017', queued=0ms,
>> exec=37ms
>> > > * vm-vdicdb01_monitor_2 on vdicnode02-priv 'not installed'
(5):
>> > > call=2320, status=complete, exitreason='Configuration file
>> > > /mnt/nfs-vdic-mgmt-vm/vdicdb01.xml does not exist or is not
>> > readable.',
>> > > last-rc-change='Thu Feb 16 09:02:07 2017', queued=0ms,
>> exec=35ms
>> > > * vm-vdicudsserver_monitor_2 on vdicnode02-priv 'not
installed'
>> > > (5): call=2321, status=complete, exitreason='Configuration file
>> > > 

[ClusterLabs] question about equal resource distribution

2017-02-16 Thread Ilia Sokolinski
Suppose I have a N node cluster where N > 2 running m*N resources. Resources 
don’t have preferred nodes, but since resources take RAM and CPU it is 
important to distribute them equally among the nodes.
Will pacemaker do the equal distribution, e.g. m resources per node?
If a node fails, will pacemaker redistribute the resources equally too, e.g. m 
* N/(N-1) per node?

I don’t see any settings controlling this behavior in the documentation, but 
perhaps, pacemaker tries to be “fair” by default.

Thanks 

Ilia Sokolinski
___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Reply: Re: Reply: Re: Reply: Re: clone resource not get restarted on fail

2017-02-16 Thread he.hailong5
Adding "sleep 5" before the return in the stop function fixed the issue, so I
suspect there must be a concurrency bug somewhere in the code. Just FYI.



Original mail

From: <kgail...@redhat.com>
To: 何海龙10164561
Cc: <users@clusterlabs.org>
Date: 2017-02-15 23:22
Subject: Re: Reply: Re: Reply: Re: [ClusterLabs] clone resource not get restarted on fail





On 02/15/2017 03:57 AM, he.hailo...@zte.com.cn wrote:
> I just tried using colocation, it dosen't work.
> 
> 
> I failed the node paas-controller-3, but sdclient_vip didn't get moved:

The colocation would work, but the problem you're having with router and
apigateway is preventing it from getting that far. In other words,
router and apigateway are still running on the node (they have not been
successfully stopped), so the colocation is still valid.

I suspect that the return codes from your custom resource agents may be
the issue. Make sure that your agents conform to these guidelines:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf

In particular, "start" should not return until a monitor operation would
return success, "stop" should not return until a monitor would return
"not running", and "monitor" should return "not running" if called on a
host where the service hasn't started yet. Be sure you are returning the
proper OCF_* codes according to the table in the link above.

If the documentation is unclear, please ask here about anything you are
unsure of.
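
A bare-bones sketch of an agent following those three rules (shell; the
pidfile path, start command and function names are placeholders, not your
actual agents):

. ${OCF_ROOT}/lib/heartbeat/ocf-shellfuncs   # provides $OCF_SUCCESS, $OCF_NOT_RUNNING, ...

myapp_monitor() {
    # must cleanly report "not running" on a node where the service never started
    [ -f /var/run/myapp.pid ] || return $OCF_NOT_RUNNING
    kill -0 "$(cat /var/run/myapp.pid)" 2>/dev/null && return $OCF_SUCCESS
    return $OCF_NOT_RUNNING
}

myapp_start() {
    myapp_monitor && return $OCF_SUCCESS          # already running
    /usr/local/bin/myapp --daemon                 # placeholder start command
    while ! myapp_monitor; do sleep 1; done       # only return once monitor succeeds
    return $OCF_SUCCESS
}

myapp_stop() {
    if myapp_monitor; then
        kill "$(cat /var/run/myapp.pid)"
        while myapp_monitor; do sleep 1; done     # only return once really stopped
    fi
    return $OCF_SUCCESS                           # stopping a stopped service is success
}
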

> 
> Online: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
> 
> 
>  router_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-1 
> 
>  sdclient_vip   (ocf::heartbeat:IPaddr2):   Started paas-controller-3 
> 
>  apigateway_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-2 
> 
>  Clone Set: sdclient_rep [sdclient]
> 
>  Started: [ paas-controller-1 paas-controller-2 ]
> 
>  Stopped: [ paas-controller-3 ]
> 
>  Clone Set: router_rep [router]
> 
>  router (ocf::heartbeat:router):Started
> paas-controller-3 FAILED 
> 
>  Started: [ paas-controller-1 paas-controller-2 ]
> 
>  Clone Set: apigateway_rep [apigateway]
> 
>  apigateway (ocf::heartbeat:apigateway):Started
> paas-controller-3 FAILED 
> 
>  Started: [ paas-controller-1 paas-controller-2 ]
> 
> 
> here is the configuration:
> 
> >crm configure show
> 
> node $id="336855579" paas-controller-1
> 
> node $id="336855580" paas-controller-2
> 
> node $id="336855581" paas-controller-3
> 
> primitive apigateway ocf:heartbeat:apigateway \
> 
> op monitor interval="2s" timeout="20s" on-fail="restart" \
> 
> op stop interval="0" timeout="200s" on-fail="restart" \
> 
> op start interval="0" timeout="h" on-fail="restart"
> 
> primitive apigateway_vip ocf:heartbeat:IPaddr2 \
> 
> params ip="20.20.2.7" cidr_netmask="24" \
> 
> op start interval="0" timeout="20" \
> 
> op stop interval="0" timeout="20" \
> 
> op monitor timeout="20s" interval="2s" depth="0"
> 
> primitive router ocf:heartbeat:router \
> 
> op monitor interval="2s" timeout="20s" on-fail="restart" \
> 
> op stop interval="0" timeout="200s" on-fail="restart" \
> 
> op start interval="0" timeout="h" on-fail="restart"
> 
> primitive router_vip ocf:heartbeat:IPaddr2 \
> 
> params ip="10.10.1.7" cidr_netmask="24" \
> 
> op start interval="0" timeout="20" \
> 
> op stop interval="0" timeout="20" \
> 
> op monitor timeout="20s" interval="2s" depth="0"
> 
> primitive sdclient ocf:heartbeat:sdclient \
> 
> op monitor interval="2s" timeout="20s" on-fail="restart" \
> 
> op stop interval="0" timeout="200s" on-fail="restart" \
> 
> op start interval="0" timeout="h" on-fail="restart"
> 
> primitive sdclient_vip ocf:heartbeat:IPaddr2 \
> 
> params ip="10.10.1.8" cidr_netmask="24" \
> 
> op start interval="0" timeout="20" \
> 
> op stop interval="0" timeout="20" \
> 
> op monitor timeout="20s" interval="2s" depth="0"
> 
> clone apigateway_rep apigateway
> 
> clone router_rep router
> 
> clone sdclient_rep sdclient
> 
> colocation apigateway_colo +inf: apigateway_vip apigateway_rep:Started
> 
> colocation router_colo +inf: router_vip router_rep:Started
> 
> colocation sdclient_colo +inf: sdclient_vip sdclient_rep:Started
> 
> property $id="cib-bootstrap-options" \
> 
> dc-version="1.1.10-42f2063" \
> 
> cluster-infrastructure="corosync" \
> 
> stonith-enabled="false" \
> 
> no-quorum-policy="stop" \
> 
> start-failure-is-fatal="false" \
> 
> last-lrm-refresh="1486981647"
> 
> op_defaults $id="op_defaults-options" \
> 
> on-fail="restart"
> 
> 
> 
> 
> 
> Original mail
> *From:* 何海龙10164561
> *To:* <kgail...@redhat.com>
> *Cc:* <users@clusterlabs.org>
> *Date:* 2017-02-15 10:54
> *Subject:* *Reply: Re: Reply: Re: [ClusterLabs] clone resource not get
> restarted on fail*
> 
> 
> Is 

[ClusterLabs] Ansible modules for basic operation with pacemaker cluster with 'pcs'

2017-02-16 Thread Ondrej Famera

Hi Everyone,

I have developed several Ansible modules for interacting with a pacemaker
cluster using the 'pcs' utility. The modules cover enough to create a cluster,
authorize nodes and add/delete/update resources in it (idempotency
included - for example, resources are updated only if they differ).


  pcs-modules-2
  https://galaxy.ansible.com/OndrejHome/pcs-modules-2/

Further, I have also created an Ansible role for setting up a pacemaker
cluster on CentOS/RHEL 6/7, including fencing setup for fence_xvm (on which
I mostly test it) and the ability to specify your own fencing devices
if desired.


  ha-cluster-pacemaker
  https://galaxy.ansible.com/OndrejHome/ha-cluster-pacemaker/

The role was showcased at this year's DevConf 2017 in the form of a workshop.
Links to the materials and recording can be found in the blog post below.


  https://www.famera.cz/blog/computers/devconf-2017.html

Enjoy, and if you hit any issues feel free to open an issue on GitHub.
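
For reference, grabbing them from Galaxy should just be the usual
ansible-galaxy call (names taken from the URLs above):

ansible-galaxy install OndrejHome.pcs-modules-2
ansible-galaxy install OndrejHome.ha-cluster-pacemaker
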

--
Ondrej Faměra

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] I question whether STONITH is working.

2017-02-16 Thread durwin
Klaus Wenninger  wrote on 02/16/2017 10:43:19 AM:

> From: Klaus Wenninger 
> To: dur...@mgtsciences.com, Cluster Labs - All topics related to 
> open-source clustering welcomed 
> Cc: kgail...@redhat.com
> Date: 02/16/2017 10:43 AM
> Subject: Re: [ClusterLabs] I question whether STONITH is working.
> 
> On 02/16/2017 05:42 PM, dur...@mgtsciences.com wrote:
> Klaus Wenninger  wrote on 02/16/2017 03:27:07 AM:
> 
> > From: Klaus Wenninger  
> > To: kgail...@redhat.com, Cluster Labs - All topics related to open-
> > source clustering welcomed  
> > Date: 02/16/2017 03:27 AM 
> > Subject: Re: [ClusterLabs] I question whether STONITH is working. 
> > 
> > On 02/15/2017 10:30 PM, Ken Gaillot wrote:
> > > On 02/15/2017 12:17 PM, dur...@mgtsciences.com wrote:
> > >> I have 2 Fedora VMs (node1, and node2) running on a Windows 10 
machine
> > >> using Virtualbox.
> > >>
> > >> I began with this.
> > >> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/
> > Clusters_from_Scratch/
> > >>
> > >>
> > >> When it came to fencing, I refered to this.
> > >> http://www.linux-ha.org/wiki/SBD_Fencing
> > >>
> > >> To the file /etc/sysconfig/sbd I added these lines.
> > >> SBD_OPTS="-W"
> > >> SBD_DEVICE="/dev/sdb1"
> > >> I added 'modprobe softdog' to rc.local
> > >>
> > >> After getting sbd working, I resumed with Clusters from Scratch, 
chapter
> > >> 8.3.
> > >> I executed these commands *only* one node1.  Am I suppose to run 
any of
> > >> these commands on other nodes? 'Clusters from Scratch' does not 
specify.
> > > Configuration commands only need to be run once. The cluster
> > > synchronizes all changes across the cluster.
> > >
> > >> pcs cluster cib stonith_cfg
> > >> pcs -f stonith_cfg stonith create sbd-fence fence_sbd
> > >> devices="/dev/sdb1" port="node2"
> > > The above command creates a fence device configured to kill node2 -- 
but
> > > it doesn't tell the cluster which nodes the device can be used to 
kill.
> > > Thus, even if you try to fence node1, it will use this device, and 
node2
> > > will be shot.
> > >
> > > The pcmk_host_list parameter specifies which nodes the device can 
kill.
> > > If not specified, the device will be used to kill any node. So, just 
add
> > > pcmk_host_list=node2 here.
> > >
> > > You'll need to configure a separate device to fence node1.
> > >
> > > I haven't used fence_sbd, so I don't know if there's a way to 
configure
> > > it as one device that can kill both nodes.
> > 
> > fence_sbd should return a proper dynamic-list.
> > So without ports and host-list it should just work fine.
> > Not even a host-map should be needed. Or actually it is not
> > supported because if sbd is using different node-naming than
> > pacemaker, pacemaker-watcher within sbd is gonna fail. 
> 
> I am not clear on what you are conveying.  On the command 
> 'pcs -f stonith_cfg stonith create' I do not need the port= option?
> 
> e.g. 'pcs stonith create FenceSBD fence_sbd devices="/dev/vdb"'
> should do the whole trick.

Thank you.  Since I already executed this command, executing it again
without device= says the device already exists.  What is the correct way to
remove the current device so I can create it again without device=?

Durwin
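
One likely sequence, assuming pcs 0.9 syntax and the resource name used
earlier in this thread (adjust the id if the device was created under a
different name):

pcs stonith delete sbd-fence
pcs stonith create sbd-fence fence_sbd devices="/dev/sdb1"
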

> 
> 
> Ken stated I need an sbd device for each node in the cluster 
> (needing fencing). 
> I assume each node is a possible failure and would need fencing. 
> So what *is* a slot?  SBD device allocates 255 slots in each device. 
> These slots are not to keep track of the nodes?
> 
> There is a slot for each node - and if the sbd-instance doesn't find
> one matching
> its own name it creates one (paints one of the 255 that is unused 
> with its own name).
> The slots are used to send messages to the sbd-instances on the nodes.

> 
> 
> Regarding fence_sbd returning dynamic-list.  The command 
> 'sbd -d /dev/sdb1 list' returns every node in the cluster. 
> Is this the list you are referring to?
> 
> Yes and no. fence_sbd - fence-agent is using the same command to create 
that
> list when it is asked by pacemaker which nodes it is able to fence.
> So you don't have to hardcode that, although you can of course using a
> host-map if you don't want sbd-fencing to be used for certain nodes 
because
> you might have a better fencing device (can be solved using 
fencing-levels
> as well).

> 
> 
> Thank you, 
> 
> Durwin 
> 
> > 
> > >
> > >> pcs -f stonith_cfg property set stonith-enabled=true
> > >> pcs cluster cib-push stonith_cfg
> > >>
> > >> I then tried this command from node1.
> > >> stonith_admin --reboot node2
> > >>
> > >> Node2 did not reboot or even shutdown. the command 'sbd -d 
/dev/sdb1
> > >> list' showed node2 as off, but I was still logged into it (cluster
> > >> status on node2 showed not running).
> > >>
> > >> I rebooted and ran this command on node 2 and started cluster.
> > >> sbd -d /dev/sdb1 message node2 clear
> > 

Re: [ClusterLabs] Disabled resource is hard logging

2017-02-16 Thread Klaus Wenninger
On 02/16/2017 05:12 PM, Oscar Segarra wrote:
> Sorry, In the other node I get exactly the same log entries:
>
> VirtualDomain(vm-vdicone01)[125890]:2017/02/16_17:11:45 ERROR:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist
> or is not readable.
> VirtualDomain(vm-vdicone01)[125912]:2017/02/16_17:11:45 INFO:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
> resource considered stopped.
> VirtualDomain(vm-vdicone01)[125933]:2017/02/16_17:11:45 ERROR:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist
> or is not readable.
> VirtualDomain(vm-vdicone01)[125955]:2017/02/16_17:11:45 INFO:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
> resource considered stopped.
> VirtualDomain(vm-vdicone01)[125976]:2017/02/16_17:11:45 ERROR:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist
> or is not readable.
> VirtualDomain(vm-vdicone01)[125998]:2017/02/16_17:11:45 INFO:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
> resource considered stopped.
> VirtualDomain(vm-vdicone01)[126019]:2017/02/16_17:11:45 ERROR:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist
> or is not readable.
> VirtualDomain(vm-vdicone01)[126041]:2017/02/16_17:11:46 INFO:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
> resource considered stopped.
> VirtualDomain(vm-vdicone01)[126062]:2017/02/16_17:11:46 ERROR:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist
> or is not readable.
> VirtualDomain(vm-vdicone01)[126084]:2017/02/16_17:11:46 INFO:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
> resource considered stopped.
> VirtualDomain(vm-vdicone01)[126105]:2017/02/16_17:11:46 ERROR:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist
> or is not readable.
> VirtualDomain(vm-vdicone01)[126127]:2017/02/16_17:11:46 INFO:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
> resource considered stopped.
> VirtualDomain(vm-vdicone01)[126148]:2017/02/16_17:11:46 ERROR:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist
> or is not readable.
> VirtualDomain(vm-vdicone01)[126170]:2017/02/16_17:11:46 INFO:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
> resource considered stopped.
> VirtualDomain(vm-vdicone01)[126191]:2017/02/16_17:11:46 ERROR:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist
> or is not readable.
> VirtualDomain(vm-vdicone01)[126213]:2017/02/16_17:11:46 INFO:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
> resource considered stopped.
> VirtualDomain(vm-vdicone01)[126234]:2017/02/16_17:11:46 ERROR:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist
> or is not readable.
> VirtualDomain(vm-vdicone01)[126256]:2017/02/16_17:11:46 INFO:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
> resource considered stopped.
> VirtualDomain(vm-vdicone01)[126278]:2017/02/16_17:11:46 ERROR:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist
> or is not readable.
> VirtualDomain(vm-vdicone01)[126300]:2017/02/16_17:11:46 INFO:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
> resource considered stopped.
> VirtualDomain(vm-vdicone01)[126321]:2017/02/16_17:11:46 ERROR:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist
> or is not readable.
> VirtualDomain(vm-vdicone01)[126343]:2017/02/16_17:11:46 INFO:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
> resource considered stopped.
> VirtualDomain(vm-vdicone01)[126364]:2017/02/16_17:11:46 ERROR:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist
> or is not readable.
> VirtualDomain(vm-vdicone01)[126386]:2017/02/16_17:11:46 INFO:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
> resource considered stopped.
> VirtualDomain(vm-vdicone01)[126407]:2017/02/16_17:11:46 ERROR:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist
> or is not readable.
> VirtualDomain(vm-vdicone01)[126429]:2017/02/16_17:11:46 INFO:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
> resource considered stopped.
> VirtualDomain(vm-vdicone01)[126450]:2017/02/16_17:11:46 ERROR:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist
> or is not readable.
> VirtualDomain(vm-vdicone01)[126472]:2017/02/16_17:11:46 INFO:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
> resource considered stopped.
> VirtualDomain(vm-vdicone01)[126493]:2017/02/16_17:11:46 ERROR:
> Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist
> or is not readable.
> VirtualDomain(vm-vdicone01)[126515]:2017/02/16_17:11:46 

Re: [ClusterLabs] I question whether STONITH is working.

2017-02-16 Thread Klaus Wenninger
On 02/16/2017 05:42 PM, dur...@mgtsciences.com wrote:
> Klaus Wenninger  wrote on 02/16/2017 03:27:07 AM:
>
> > From: Klaus Wenninger 
> > To: kgail...@redhat.com, Cluster Labs - All topics related to open-
> > source clustering welcomed 
> > Date: 02/16/2017 03:27 AM
> > Subject: Re: [ClusterLabs] I question whether STONITH is working.
> >
> > On 02/15/2017 10:30 PM, Ken Gaillot wrote:
> > > On 02/15/2017 12:17 PM, dur...@mgtsciences.com wrote:
> > >> I have 2 Fedora VMs (node1, and node2) running on a Windows 10
> machine
> > >> using Virtualbox.
> > >>
> > >> I began with this.
> > >> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/
> > Clusters_from_Scratch/
> > >>
> > >>
> > >> When it came to fencing, I refered to this.
> > >> http://www.linux-ha.org/wiki/SBD_Fencing
> > >>
> > >> To the file /etc/sysconfig/sbd I added these lines.
> > >> SBD_OPTS="-W"
> > >> SBD_DEVICE="/dev/sdb1"
> > >> I added 'modprobe softdog' to rc.local
> > >>
> > >> After getting sbd working, I resumed with Clusters from Scratch,
> chapter
> > >> 8.3.
> > >> I executed these commands *only* one node1.  Am I suppose to run
> any of
> > >> these commands on other nodes? 'Clusters from Scratch' does not
> specify.
> > > Configuration commands only need to be run once. The cluster
> > > synchronizes all changes across the cluster.
> > >
> > >> pcs cluster cib stonith_cfg
> > >> pcs -f stonith_cfg stonith create sbd-fence fence_sbd
> > >> devices="/dev/sdb1" port="node2"
> > > The above command creates a fence device configured to kill node2
> -- but
> > > it doesn't tell the cluster which nodes the device can be used to
> kill.
> > > Thus, even if you try to fence node1, it will use this device, and
> node2
> > > will be shot.
> > >
> > > The pcmk_host_list parameter specifies which nodes the device can
> kill.
> > > If not specified, the device will be used to kill any node. So,
> just add
> > > pcmk_host_list=node2 here.
> > >
> > > You'll need to configure a separate device to fence node1.
> > >
> > > I haven't used fence_sbd, so I don't know if there's a way to
> configure
> > > it as one device that can kill both nodes.
> >
> > fence_sbd should return a proper dynamic-list.
> > So without ports and host-list it should just work fine.
> > Not even a host-map should be needed. Or actually it is not
> > supported because if sbd is using different node-naming than
> > pacemaker, pacemaker-watcher within sbd is gonna fail.
>
> I am not clear on what you are conveying.  On the command
> 'pcs -f stonith_cfg stonith create' I do not need the port= option?

e.g. 'pcs stonith create FenceSBD fence_sbd devices="/dev/vdb"'
should do the whole trick.

>
>
> Ken stated I need an sbd device for each node in the cluster (needing
> fencing).
> I assume each node is a possible failure and would need fencing.
> So what *is* a slot?  SBD device allocates 255 slots in each device.
> These slots are not to keep track of the nodes?

There is a slot for each node - and if the sbd-instance doesn't find one
matching
its own name it creates one (paints one of the 255 that is unused with
its own name).
The slots are used to send messages to the sbd-instances on the nodes.

>
>
> Regarding fence_sbd returning dynamic-list.  The command
> 'sbd -d /dev/sdb1 list' returns every node in the cluster.
> Is this the list you are referring to?

Yes and no. The fence_sbd fence agent uses the same command to create that
list when it is asked by pacemaker which nodes it is able to fence.
So you don't have to hardcode that, although you can of course use a
host-map if you don't want sbd fencing to be used for certain nodes because
you might have a better fencing device (this can also be solved using
fencing levels).
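
For completeness, fencing levels in pcs look roughly like this (the device
names are placeholders; level 1 is tried first, level 2 only if level 1 fails):

pcs stonith level add 1 node1 ipmi-node1
pcs stonith level add 2 node1 FenceSBD
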

>
>
> Thank you,
>
> Durwin
>
> >
> > >
> > >> pcs -f stonith_cfg property set stonith-enabled=true
> > >> pcs cluster cib-push stonith_cfg
> > >>
> > >> I then tried this command from node1.
> > >> stonith_admin --reboot node2
> > >>
> > >> Node2 did not reboot or even shutdown. the command 'sbd -d /dev/sdb1
> > >> list' showed node2 as off, but I was still logged into it (cluster
> > >> status on node2 showed not running).
> > >>
> > >> I rebooted and ran this command on node 2 and started cluster.
> > >> sbd -d /dev/sdb1 message node2 clear
> > >>
> > >> If I ran this command on node2, node2 rebooted.
> > >> stonith_admin --reboot node1
> > >>
> > >> What have I missed or done wrong?
> > >>
> > >>
> > >> Thank you,
> > >>
> > >> Durwin F. De La Rue
> > >> Management Sciences, Inc.
> > >> 6022 Constitution Ave. NE
> > >> Albuquerque, NM  87110
> > >> Phone (505) 255-8611
> > >
> > > ___
> > > Users mailing list: Users@clusterlabs.org
> > > http://lists.clusterlabs.org/mailman/listinfo/users
> > >
> > > Project Home: http://www.clusterlabs.org 
> > > Getting started:
> 

Re: [ClusterLabs] I question whether STONITH is working.

2017-02-16 Thread durwin
Klaus Wenninger  wrote on 02/16/2017 03:27:07 AM:

> From: Klaus Wenninger 
> To: kgail...@redhat.com, Cluster Labs - All topics related to open-
> source clustering welcomed 
> Date: 02/16/2017 03:27 AM
> Subject: Re: [ClusterLabs] I question whether STONITH is working.
> 
> On 02/15/2017 10:30 PM, Ken Gaillot wrote:
> > On 02/15/2017 12:17 PM, dur...@mgtsciences.com wrote:
> >> I have 2 Fedora VMs (node1, and node2) running on a Windows 10 
machine
> >> using Virtualbox.
> >>
> >> I began with this.
> >> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/
> Clusters_from_Scratch/
> >>
> >>
> >> When it came to fencing, I refered to this.
> >> http://www.linux-ha.org/wiki/SBD_Fencing
> >>
> >> To the file /etc/sysconfig/sbd I added these lines.
> >> SBD_OPTS="-W"
> >> SBD_DEVICE="/dev/sdb1"
> >> I added 'modprobe softdog' to rc.local
> >>
> >> After getting sbd working, I resumed with Clusters from Scratch, 
chapter
> >> 8.3.
> >> I executed these commands *only* one node1.  Am I suppose to run any 
of
> >> these commands on other nodes? 'Clusters from Scratch' does not 
specify.
> > Configuration commands only need to be run once. The cluster
> > synchronizes all changes across the cluster.
> >
> >> pcs cluster cib stonith_cfg
> >> pcs -f stonith_cfg stonith create sbd-fence fence_sbd
> >> devices="/dev/sdb1" port="node2"
> > The above command creates a fence device configured to kill node2 -- 
but
> > it doesn't tell the cluster which nodes the device can be used to 
kill.
> > Thus, even if you try to fence node1, it will use this device, and 
node2
> > will be shot.
> >
> > The pcmk_host_list parameter specifies which nodes the device can 
kill.
> > If not specified, the device will be used to kill any node. So, just 
add
> > pcmk_host_list=node2 here.
> >
> > You'll need to configure a separate device to fence node1.
> >
> > I haven't used fence_sbd, so I don't know if there's a way to 
configure
> > it as one device that can kill both nodes.
> 
> fence_sbd should return a proper dynamic-list.
> So without ports and host-list it should just work fine.
> Not even a host-map should be needed. Or actually it is not
> supported because if sbd is using different node-naming than
> pacemaker, pacemaker-watcher within sbd is gonna fail.

I am not clear on what you are conveying.  On the command
'pcs -f stonith_cfg stonith create' I do not need the port= option?

Ken stated I need an sbd device for each node in the cluster (needing 
fencing).
I assume each node is a possible failure and would need fencing.
So what *is* a slot?  SBD device allocates 255 slots in each device.
These slots are not to keep track of the nodes?

Regarding fence_sbd returning dynamic-list.  The command
'sbd -d /dev/sdb1 list' returns every node in the cluster.
Is this the list you are referring to?

Thank you,

Durwin

> 
> >
> >> pcs -f stonith_cfg property set stonith-enabled=true
> >> pcs cluster cib-push stonith_cfg
> >>
> >> I then tried this command from node1.
> >> stonith_admin --reboot node2
> >>
> >> Node2 did not reboot or even shutdown. the command 'sbd -d /dev/sdb1
> >> list' showed node2 as off, but I was still logged into it (cluster
> >> status on node2 showed not running).
> >>
> >> I rebooted and ran this command on node 2 and started cluster.
> >> sbd -d /dev/sdb1 message node2 clear
> >>
> >> If I ran this command on node2, node2 rebooted.
> >> stonith_admin --reboot node1
> >>
> >> What have I missed or done wrong?
> >>
> >>
> >> Thank you,
> >>
> >> Durwin F. De La Rue
> >> Management Sciences, Inc.
> >> 6022 Constitution Ave. NE
> >> Albuquerque, NM  87110
> >> Phone (505) 255-8611
> >
> > ___
> > Users mailing list: Users@clusterlabs.org
> > http://lists.clusterlabs.org/mailman/listinfo/users
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: 
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> 
> 
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



This email message and any attachments are for the sole use of the 
intended recipient(s) and may contain proprietary and/or confidential 
information which may be privileged or otherwise protected from 
disclosure. Any unauthorized review, use, disclosure or distribution is 
prohibited. If you are not the intended recipient(s), please contact the 
sender by reply email and destroy the original message and any copies of 
the message as well as any attachments to the original message.
___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users


Re: [ClusterLabs] Disabled resource is hard logging

2017-02-16 Thread Oscar Segarra
Sorry, on the other node I get exactly the same log entries:

VirtualDomain(vm-vdicone01)[125890]:2017/02/16_17:11:45 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicone01)[125912]:2017/02/16_17:11:45 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[125933]:2017/02/16_17:11:45 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicone01)[125955]:2017/02/16_17:11:45 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[125976]:2017/02/16_17:11:45 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicone01)[125998]:2017/02/16_17:11:45 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[126019]:2017/02/16_17:11:45 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicone01)[126041]:2017/02/16_17:11:46 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[126062]:2017/02/16_17:11:46 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicone01)[126084]:2017/02/16_17:11:46 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[126105]:2017/02/16_17:11:46 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicone01)[126127]:2017/02/16_17:11:46 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[126148]:2017/02/16_17:11:46 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicone01)[126170]:2017/02/16_17:11:46 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[126191]:2017/02/16_17:11:46 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicone01)[126213]:2017/02/16_17:11:46 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[126234]:2017/02/16_17:11:46 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicone01)[126256]:2017/02/16_17:11:46 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[126278]:2017/02/16_17:11:46 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicone01)[126300]:2017/02/16_17:11:46 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[126321]:2017/02/16_17:11:46 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicone01)[126343]:2017/02/16_17:11:46 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[126364]:2017/02/16_17:11:46 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicone01)[126386]:2017/02/16_17:11:46 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[126407]:2017/02/16_17:11:46 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicone01)[126429]:2017/02/16_17:11:46 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[126450]:2017/02/16_17:11:46 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicone01)[126472]:2017/02/16_17:11:46 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[126493]:2017/02/16_17:11:46 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicone01)[126515]:2017/02/16_17:11:46 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[126536]:2017/02/16_17:11:46 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml 

Re: [ClusterLabs] Disabled resource is hard logging

2017-02-16 Thread Oscar Segarra
Hi Klaus,

I have deleted the op stop:

pcs resource op remove vm-vdicone01 stop interval=0s timeout=90
pcs resource op remove vm-vdicdb01 stop interval=0s timeout=90
pcs resource op remove vm-vdicsunstone01 stop interval=0s timeout=90
pcs resource op remove vm-vdicudsserver stop interval=0s timeout=90
pcs resource op remove vm-vdicudstuneler stop interval=0s timeout=90

But the log keeps growing (now all VirtualDomains appear):

VirtualDomain(vm-vdicone01)[72872]: 2017/02/16_16:43:39 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicsunstone01)[72873]:2017/02/16_16:43:39 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicsunstone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[72914]: 2017/02/16_16:43:39 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicsunstone01)[72917]:2017/02/16_16:43:39 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicsunstone01.xml does not exist
or is not readable.
VirtualDomain(vm-vdicsunstone01)[72959]:2017/02/16_16:43:39 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicsunstone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[72958]: 2017/02/16_16:43:39 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[73000]: 2017/02/16_16:43:39 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicsunstone01)[73001]:2017/02/16_16:43:39 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicsunstone01.xml does not exist
or is not readable.
VirtualDomain(vm-vdicone01)[73044]: 2017/02/16_16:43:39 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicsunstone01)[73045]:2017/02/16_16:43:39 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicsunstone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[73086]: 2017/02/16_16:43:39 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicsunstone01)[73089]:2017/02/16_16:43:39 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicsunstone01.xml does not exist
or is not readable.
VirtualDomain(vm-vdicone01)[73130]: 2017/02/16_16:43:39 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicsunstone01)[73133]:2017/02/16_16:43:39 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicsunstone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[73172]: 2017/02/16_16:43:39 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicsunstone01)[73175]:2017/02/16_16:43:39 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicsunstone01.xml does not exist
or is not readable.
VirtualDomain(vm-vdicsunstone01)[73217]:2017/02/16_16:43:39 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicsunstone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[73216]: 2017/02/16_16:43:39 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicsunstone01)[73259]:2017/02/16_16:43:39 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicsunstone01.xml does not exist
or is not readable.
VirtualDomain(vm-vdicone01)[73258]: 2017/02/16_16:43:39 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicsunstone01)[73303]:2017/02/16_16:43:39 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicsunstone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[73302]: 2017/02/16_16:43:39 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicsunstone01)[73344]:2017/02/16_16:43:39 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicsunstone01.xml does not exist
or is not readable.
VirtualDomain(vm-vdicone01)[73345]: 2017/02/16_16:43:39 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicone01)[73388]: 2017/02/16_16:43:40 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicsunstone01)[73389]:2017/02/16_16:43:40 INFO:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicsunstone01.xml not readable,
resource considered stopped.
VirtualDomain(vm-vdicone01)[73430]: 2017/02/16_16:43:40 ERROR:
Configuration file /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is
not readable.
VirtualDomain(vm-vdicsunstone01)[73433]:

Re: [ClusterLabs] MySQL Cluster: Strange behaviour when forcing movement of resources

2017-02-16 Thread Ken Gaillot
On 02/16/2017 02:26 AM, Félix Díaz de Rada wrote:
> 
> Hi all,
> 
> We are currently setting up a MySQL cluster (Master-Slave) over this
> platform:
> - Two nodes, on RHEL 7.0
> - pacemaker-1.1.10-29.el7.x86_64
> - corosync-2.3.3-2.el7.x86_64
> - pcs-0.9.115-32.el7.x86_64
> There is a IP address resource to be used as a "virtual IP".
> 
> This is configuration of cluster:
> 
> Cluster Name: webmobbdprep
> Corosync Nodes:
>  webmob1bdprep-ges webmob2bdprep-ges
> Pacemaker Nodes:
>  webmob1bdprep-ges webmob2bdprep-ges
> 
> Resources:
>  Group: G_MySQL_M
>   Meta Attrs: priority=100
>   Resource: MySQL_M (class=ocf provider=heartbeat type=mysql_m)
>Attributes:
> binary=/opt/mysql/mysql-5.7.17-linux-glibc2.5-x86_64/bin/mysqld_safe
> config=/data/webmob_prep/webmob_prep.cnf datadir=/data/webmob_prep
> log=/data/webmob_prep/webmob_prep.err
> pid=/data/webmob_prep/webmob_rep.pid
> socket=/data/webmob_prep/webmob_prep.sock user=mysql group=mysql
> test_table=replica.pacemaker_test test_user=root
>Meta Attrs: resource-stickiness=1000
>Operations: promote interval=0s timeout=120 (MySQL_M-promote-timeout-120)
>demote interval=0s timeout=120 (MySQL_M-demote-timeout-120)
>start interval=0s timeout=120s on-fail=restart
> (MySQL_M-start-timeout-120s-on-fail-restart)
>stop interval=0s timeout=120s (MySQL_M-stop-timeout-120s)
>monitor interval=60s timeout=30s OCF_CHECK_LEVEL=1
> (MySQL_M-monitor-interval-60s-timeout-30s)
>   Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>Attributes: ip=172.18.64.44 nic=ens160:1 cidr_netmask=32
>Meta Attrs: target-role=Started migration-threshold=3
> failure-timeout=60s
>Operations: start interval=0s timeout=20s (ClusterIP-start-timeout-20s)
>stop interval=0s timeout=20s (ClusterIP-stop-timeout-20s)
>monitor interval=60s (ClusterIP-monitor-interval-60s)
>  Resource: MySQL_S (class=ocf provider=heartbeat type=mysql_s)
>   Attributes:
> binary=/opt/mysql/mysql-5.7.17-linux-glibc2.5-x86_64/bin/mysqld_safe
> config=/data/webmob_prep/webmob_prep.cnf datadir=/data/webmob_prep
> log=/data/webmob_prep/webmob_prep.err
> pid=/data/webmob_prep/webmob_rep.pid
> socket=/data/webmob_prep/webmob_prep.sock user=mysql group=mysql
> test_table=replica.pacemaker_test test_user=root
>   Meta Attrs: resource-stickiness=0
>   Operations: promote interval=0s timeout=120 (MySQL_S-promote-timeout-120)
>   demote interval=0s timeout=120 (MySQL_S-demote-timeout-120)
>   start interval=0s timeout=120s on-fail=restart
> (MySQL_S-start-timeout-120s-on-fail-restart)
>   stop interval=0s timeout=120s (MySQL_S-stop-timeout-120s)
>   monitor interval=60s timeout=30s OCF_CHECK_LEVEL=1
> (MySQL_S-monitor-interval-60s-timeout-30s)
> 
> Stonith Devices:
> Fencing Levels:
> 
> Location Constraints:
> Ordering Constraints:
>   start MySQL_M then start ClusterIP (Mandatory)
> (id:order-MySQL_M-ClusterIP-mandatory)
>   start G_MySQL_M then start MySQL_S (Mandatory)
> (id:order-G_MySQL_M-MySQL_S-mandatory)
> Colocation Constraints:
>   G_MySQL_M with MySQL_S (-100) (id:colocation-G_MySQL_M-MySQL_S-INFINITY)
> 
> Cluster Properties:
>  cluster-infrastructure: corosync
>  dc-version: 1.1.10-29.el7-368c726
>  last-lrm-refresh: 1487148812
>  no-quorum-policy: ignore
>  stonith-enabled: false
> 
> Pacemaker works as expected in most situations, but there is one
> scenario that is really not understandable to us. I will try to describe it:
> 
> a - Master resource (and Cluster IP address) are active on node 1 and
> Slave resource is active on node 2.
> b - We force movement of Master resource to node 2.
> c - Pacemaker stops all resources: Master, Slave and Cluster IP.
> d - Master resource and Cluster IP are started on node 2 (this is OK),
> but Slave also tries to start (??). It fails (logically, because Master
> resource has been started on the same node), it logs an "unknown error"
> and its state is marked as "failed". This is a capture of 'pcs status'
> at that point:
> 
> OFFLINE: [ webmob1bdprep-ges ]
> Online: [ webmob2bdprep-ges ]
> 
> Full list of resources:
> 
> Resource Group: G_MySQL_M
> MySQL_M (ocf::heartbeat:mysql_m): Started webmob2bdprep-ges
> ClusterIP (ocf::heartbeat:IPaddr2): Started webmob2bdprep-ges
> MySQL_S (ocf::heartbeat:mysql_s): FAILED webmob2bdprep-ges
> 
> Failed actions:
> MySQL_M_monitor_6 on webmob2bdprep-ges 'master' (8): call=62,
> status=complete, last-rc-change='Wed Feb 15 11:54:08 2017', queued=0ms,
> exec=0ms
> MySQL_S_start_0 on webmob2bdprep-ges 'unknown error' (1): call=78,
> status=complete, last-rc-change='Wed Feb 15 11:54:17 2017', queued=40ms,
> exec=0ms
> 
> PCSD Status:
> webmob1bdprep-ges: Offline
> webmob2bdprep-ges: Online
> 
> e - Pacemaker moves Slave resource to node 1 and starts it. Now we have
> both resources started again, Master on node 2 and Slave on node 1.
> f - One 

Re: [ClusterLabs] Antw: Re: Antw: Re: ocf:lvm2:VolumeGroup Probe Issue

2017-02-16 Thread Eric Ren

Hi,


On 02/16/2017 08:16 PM, Ulrich Windl wrote:

[snip]

Any other advice? Is ocf:heartbeat:LVM or ocf:lvm2:VolumeGroup the
more popular RA for managing LVM VG's? Any comments from other users
on experiences using either (good, bad)?

I had a little bit experience on "ocf:heartbeat:LVM". Each volume group
needs an
independent resource agent of it. Something like:

You mean "an independent resource instance (primitive)"?

Yes, I meant it for "ocf:heartbeat:LVM". And, I cannot find
"OCF:lvm2:VolumeGroup" on
SLES.

One RA should be good for all VGs ;-)

Oh, really? if so, why "params volgrpname" is required?

You are still mixing class (RA) and object (primitive)! IMHO RA is the script 
(class) like ocf:heartbeat:LVM, while the primitive (object) is a configuration 
based on the RA. So you'd have multiple objects (primitives) of one class (RA).
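
In other words, one RA (class) can back any number of primitives (objects),
e.g. one per volume group - a sketch along the lines of the example further
down, with vg1/vg2 as placeholder VG names:

primitive vg1 ocf:heartbeat:LVM \
        params volgrpname=vg1 exclusive=true \
        op monitor interval=60 timeout=240
primitive vg2 ocf:heartbeat:LVM \
        params volgrpname=vg2 exclusive=true \
        op monitor interval=60 timeout=240
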


Aha, my bad. It's like the concepts of "class" and "object" in an
object-oriented language.

Thanks for pointing it out:)

Eric


Regards,
Ulrich


"""
crm(live)configure# ra info ocf:heartbeat:LVM
...
Parameters (*: required, []: default):

volgrpname* (string): Volume group name
  The name of volume group.
"""

And I failed to show "OCF:lvm2:VolumeGroup":
"""
crm(live)configure# ra info ocf:lvm2:
ocf:lvm2:clvmd ocf:lvm2:cmirrord
"""

Am I missing something?

Thanks for your input:)
Eric

"""
primitive vg1 LVM \
   params volgrpname=vg1 exclusive=true \
   op start timeout=100 interval=0 \
   op stop timeout=40 interval=0 \
   op monitor interval=60 timeout=240
"""

And, "dlm" and "clvm" resource agents are grouped and then cloned like:
"""
group base-group dlm clvm
clone base-clone base-group \
   meta target-role=Started interleave=true
"""

Then, put an "order" constraint like:
"""
order base_first_vg1 inf: base-clone vg1
"""

Can "ocf:lvm2:VolumeGroup" follow the same pattern?

Thanks,
Eric

Both appear to achieve the
same function, just a bit differently.


Thanks,

Marc

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org




___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org








___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-16 Thread Christine Caulfield
On 16/02/17 09:31, cys wrote:
> The attachment includes coredump and logs just before corosync went wrong.
> 
> The packages we use:
> corosync-2.3.4-7.el7_2.1.x86_64
> corosynclib-2.3.4-7.el7_2.1.x86_64
> libqb-0.17.1-2.el7.1.x86_64
> 
> But they are not available any more at mirror.centos.org. If you can't find 
> them anywhere, I can send you the RPMs.
> The debuginfo packages can be downloaded from 
> http://debuginfo.centos.org/7/x86_64/.
> 

Can you send me the RPMs please? I tried the RHEL ones with the same
version number but they don't work (it was worth a try!)

Thanks
Chrissie


> Unfortunately corosync was restarted yesterday, and I can't get  the blackbox 
> dump covering the day the incident occurred.
> 
> At 2017-02-16 16:00:05, "Christine Caulfield"  wrote:
>> On 16/02/17 03:51, cys wrote:
>>> At 2017-02-15 23:13:08, "Christine Caulfield"  wrote:

 Yes, it seems that some corosync SEGVs trigger this obscure bug in
 libqb. I've chased a few possible causes and none have been fruitful.

 If you get this then corosync has crashed, and this other bug is masking
 the actual diagnostics - I know, helpful :/

 It's on my list

 Chrissie

>>>
>>> Thanks.
>>> I think you have noticed that my_service_list[3] is invalid.
>>> About the SEGV, do you need additional information? coredump or logs?
>>>
>>
>> A blackbox dump and (if possible) coredump would be very useful if you
>> can get them. thank you.
>>
>> Chrissie


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: ocf:lvm2:VolumeGroup Probe Issue

2017-02-16 Thread Eric Ren

Hi Ulrich!

On 02/16/2017 03:31 PM, Ulrich Windl wrote:

Eric Ren wrote on 16.02.2017 at 04:50 in message:

Hi,

On 11/09/2016 12:37 AM, Marc Smith wrote:

Hi,

First, I realize ocf:lvm2:VolumeGroup comes from the LVM2 package and
not resource-agents, but I'm hoping someone on this list is familiar
with this RA and can provide some insight.

In my cluster configuration, I'm using ocf:lvm2:VolumeGroup to manage
my LVM VG's, and I'm using the cluster to manage DLM and CLVM. I have
my constraints in place and everything seems to be working mostly,
except I'm hitting a glitch with ocf:lvm2:VolumeGroup and the initial
probe operation.

On startup, a probe operation (monitor) is issued for all of the
resources, but ocf:lvm2:VolumeGroup is returning OCF_ERR_GENERIC in
VolumeGroup_status() (via VolumeGroup_monitor()) since clvmd hasn't
started yet... this line in VolumeGroup_status() is the trouble:

VGOUT=`vgdisplay -v $OCF_RESKEY_volgrpname 2>&1` || exit $OCF_ERR_GENERIC

When clvmd is not running, 'vgdisplay -v name' will always return
something like this:

--snip--
connect() failed on local socket: No such file or directory
Internal cluster locking initialisation failed.
WARNING: Falling back to local file-based locking.
Volume Groups with the clustered attribute will be inaccessible.
  VG name on command line not found in list of VGs: biggie
Volume group "biggie" not found
Cannot process volume group biggie
--snip--

And exits with a status of 5. So, my question is, do I patch the RA?
Or is there some cluster constraint I can add so a probe/monitor
operation isn't performed for the VolumeGroup resource until CLVM has
been started?
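
If patching the RA is the route taken, one possible sketch is to treat a VG
that is not visible as "not running" during the initial probe only (this
assumes the agent can use ocf_is_probe from the resource-agents
ocf-shellfuncs; it is not the shipped code):

VGOUT=`vgdisplay -v $OCF_RESKEY_volgrpname 2>&1`
if [ $? -ne 0 ]; then
    if ocf_is_probe; then
        exit $OCF_NOT_RUNNING    # probe before clvmd is up: report stopped
    fi
    exit $OCF_ERR_GENERIC        # a real monitor failure stays an error
fi
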

Any other advice? Is ocf:heartbeat:LVM or ocf:lvm2:VolumeGroup the
more popular RA for managing LVM VG's? Any comments from other users
on experiences using either (good, bad)?

I had a little bit experience on "ocf:heartbeat:LVM". Each volume group
needs an
independent resource agent of it. Something like:

You mean "an independent resource instance (primitive)"?

Yes, I meant it for "ocf:heartbeat:LVM". And, I cannot find 
"OCF:lvm2:VolumeGroup" on
SLES.

One RA should be good for all VGs ;-)

Oh, really? if so, why "params volgrpname" is required?

"""
crm(live)configure# ra info ocf:heartbeat:LVM
...
Parameters (*: required, []: default):

volgrpname* (string): Volume group name
The name of volume group.
"""

And I failed to show "OCF:lvm2:VolumeGroup":
"""
crm(live)configure# ra info ocf:lvm2:
ocf:lvm2:clvmd ocf:lvm2:cmirrord
"""

Am I missing something?

Thanks for your input:)
Eric



"""
primitive vg1 LVM \
  params volgrpname=vg1 exclusive=true \
  op start timeout=100 interval=0 \
  op stop timeout=40 interval=0 \
  op monitor interval=60 timeout=240
"""

And, "dlm" and "clvm" resource agents are grouped and then cloned like:
"""
group base-group dlm clvm
clone base-clone base-group \
  meta target-role=Started interleave=true
"""

Then, put an "order" constraint like:
"""
order base_first_vg1 inf: base-clone vg1
"""

Can "ocf:lvm2:VolumeGroup" follow the same pattern?

Thanks,
Eric

Both appear to achieve the
same function, just a bit differently.


Thanks,

Marc

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org





Re: [ClusterLabs] I question whether STONITH is working.

2017-02-16 Thread Klaus Wenninger
On 02/15/2017 10:30 PM, Ken Gaillot wrote:
> On 02/15/2017 12:17 PM, dur...@mgtsciences.com wrote:
>> I have 2 Fedora VMs (node1, and node2) running on a Windows 10 machine
>> using Virtualbox.
>>
>> I began with this.
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/
>>
>>
>> When it came to fencing, I refered to this.
>> http://www.linux-ha.org/wiki/SBD_Fencing
>>
>> To the file /etc/sysconfig/sbd I added these lines.
>> SBD_OPTS="-W"
>> SBD_DEVICE="/dev/sdb1"
>> I added 'modprobe softdog' to rc.local
>>
>> After getting sbd working, I resumed with Clusters from Scratch, chapter
>> 8.3.
>> I executed these commands *only* on node1.  Am I supposed to run any of
>> these commands on other nodes? 'Clusters from Scratch' does not specify.
> Configuration commands only need to be run once. The cluster
> synchronizes all changes across the cluster.
>
>> pcs cluster cib stonith_cfg
>> pcs -f stonith_cfg stonith create sbd-fence fence_sbd
>> devices="/dev/sdb1" port="node2"
> The above command creates a fence device configured to kill node2 -- but
> it doesn't tell the cluster which nodes the device can be used to kill.
> Thus, even if you try to fence node1, it will use this device, and node2
> will be shot.
>
> The pcmk_host_list parameter specifies which nodes the device can kill.
> If not specified, the device will be used to kill any node. So, just add
> pcmk_host_list=node2 here.
>
> You'll need to configure a separate device to fence node1.
>
> I haven't used fence_sbd, so I don't know if there's a way to configure
> it as one device that can kill both nodes.

fence_sbd should return a proper dynamic list, so without ports and a
host list it should just work fine. Not even a host map should be
needed; in fact, one is not supported, because if sbd uses different
node naming than Pacemaker, the Pacemaker watcher within sbd will fail.
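
For reference, a minimal sketch of the two approaches discussed above; the device names are hypothetical and the commands are untested against a real fence_sbd setup:

"""
# Option 1: one fence_sbd device per node, scoped with pcmk_host_list
# (as Ken describes above)
pcs -f stonith_cfg stonith create sbd-fence-node1 fence_sbd \
    devices="/dev/sdb1" pcmk_host_list="node1"
pcs -f stonith_cfg stonith create sbd-fence-node2 fence_sbd \
    devices="/dev/sdb1" pcmk_host_list="node2"

# Option 2: a single device, relying on fence_sbd's dynamic node list
pcs -f stonith_cfg stonith create sbd-fence fence_sbd devices="/dev/sdb1"
"""

Either way, stonith-enabled still has to be set and the CIB pushed, as in the original commands.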

>
>> pcs -f stonith_cfg property set stonith-enabled=true
>> pcs cluster cib-push stonith_cfg
>>
>> I then tried this command from node1.
>> stonith_admin --reboot node2
>>
>> Node2 did not reboot or even shut down. The command 'sbd -d /dev/sdb1
>> list' showed node2 as off, but I was still logged into it (cluster
>> status on node2 showed not running).
>>
>> I rebooted and ran this command on node 2 and started cluster.
>> sbd -d /dev/sdb1 message node2 clear
>>
>> If I ran this command on node2, node2 rebooted.
>> stonith_admin --reboot node1
>>
>> What have I missed or done wrong?
>>
>>
>> Thank you,
>>
>> Durwin F. De La Rue
>> Management Sciences, Inc.
>> 6022 Constitution Ave. NE
>> Albuquerque, NM  87110
>> Phone (505) 255-8611
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Disabled resource is hard logging

2017-02-16 Thread Klaus Wenninger
On 02/16/2017 11:02 AM, Oscar Segarra wrote:
> Hi Klaus
>
> What is your proposal to fix this behavior?

First you can try to remove the monitor op for role=Stopped.
The startup probing will probably still fail, but the behaviour
on that failure is different.
Startup probing can be disabled globally via the cluster property
enable-startup-probes, which defaults to true.
But be aware that the cluster then wouldn't be able to react
properly if services are already up when Pacemaker is starting.
It should be possible to disable the probing on a per-resource
or per-node basis as well, IIRC, but I can't recall offhand how
that worked; there was a discussion about it on the list a few
weeks ago.
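
A minimal sketch of those two options in pcs syntax, assuming the role=Stopped monitor is defined with interval=20s as shown in the resource definition elsewhere in this thread (adjust the resource name and interval to match your configuration):

"""
# Remove the monitor op that runs while the resource is stopped
pcs resource op remove vm-vdicone01 monitor interval=20s role=Stopped

# Or disable startup probing cluster-wide (with the caveat above)
pcs property set enable-startup-probes=false
"""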

Regards,
Klaus

>
> Thanks a lot! 
>
>
> On Feb 16, 2017, 10:57 AM, "Klaus Wenninger"  wrote:
>
> On 02/16/2017 09:05 AM, Oscar Segarra wrote:
> > Hi,
> >
> > In my environment I have deployed 5 VirtualDomains as one can
> see below:
> > [root@vdicnode01 ~]# pcs status
> > Cluster name: vdic-cluster
> > Stack: corosync
> > Current DC: vdicnode01-priv (version 1.1.15-11.el7_3.2-e174ec8) -
> > partition with quorum
> > Last updated: Thu Feb 16 09:02:53 2017  Last change: Thu Feb
> > 16 08:20:53 2017 by root via crm_attribute on vdicnode02-priv
> >
> > 2 nodes and 14 resources configured: 5 resources DISABLED and 0
> > BLOCKED from being started due to failures
> >
> > Online: [ vdicnode01-priv vdicnode02-priv ]
> >
> > Full list of resources:
> >
> >  nfs-vdic-mgmt-vm-vip   (ocf::heartbeat:IPaddr):Started
> > vdicnode01-priv
> >  Clone Set: nfs_setup-clone [nfs_setup]
> >  Started: [ vdicnode01-priv vdicnode02-priv ]
> >  Clone Set: nfs-mon-clone [nfs-mon]
> >  Started: [ vdicnode01-priv vdicnode02-priv ]
> >  Clone Set: nfs-grace-clone [nfs-grace]
> >  Started: [ vdicnode01-priv vdicnode02-priv ]
> >  vm-vdicone01   (ocf::heartbeat:VirtualDomain): FAILED (disabled)[
> > vdicnode02-priv vdicnode01-priv ]
> >  vm-vdicsunstone01  (ocf::heartbeat:VirtualDomain): FAILED
> > vdicnode01-priv (disabled)
> >  vm-vdicdb01(ocf::heartbeat:VirtualDomain): FAILED (disabled)[
> > vdicnode02-priv vdicnode01-priv ]
> >  vm-vdicudsserver   (ocf::heartbeat:VirtualDomain): FAILED
> > (disabled)[ vdicnode02-priv vdicnode01-priv ]
> >  vm-vdicudstuneler  (ocf::heartbeat:VirtualDomain): FAILED
> > vdicnode01-priv (disabled)
> >  Clone Set: nfs-vdic-images-vip-clone [nfs-vdic-images-vip]
> >  Stopped: [ vdicnode01-priv vdicnode02-priv ]
> >
> > Failed Actions:
> > * vm-vdicone01_monitor_2 on vdicnode02-priv 'not installed' (5):
> > call=2322, status=complete, exitreason='Configuration file
> > /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is not
> readable.',
> > last-rc-change='Thu Feb 16 09:02:07 2017', queued=0ms, exec=21ms
> > * vm-vdicsunstone01_monitor_2 on vdicnode02-priv 'not installed'
> > (5): call=2310, status=complete, exitreason='Configuration file
> > /mnt/nfs-vdic-mgmt-vm/vdicsunstone01.xml does not exist or is not
> > readable.',
> > last-rc-change='Thu Feb 16 09:02:07 2017', queued=0ms, exec=37ms
> > * vm-vdicdb01_monitor_2 on vdicnode02-priv 'not installed' (5):
> > call=2320, status=complete, exitreason='Configuration file
> > /mnt/nfs-vdic-mgmt-vm/vdicdb01.xml does not exist or is not
> readable.',
> > last-rc-change='Thu Feb 16 09:02:07 2017', queued=0ms, exec=35ms
> > * vm-vdicudsserver_monitor_2 on vdicnode02-priv 'not installed'
> > (5): call=2321, status=complete, exitreason='Configuration file
> > /mnt/nfs-vdic-mgmt-vm/vdicudsserver.xml does not exist or is not
> > readable.',
> > last-rc-change='Thu Feb 16 09:02:07 2017', queued=0ms, exec=42ms
> > * vm-vdicudstuneler_monitor_2 on vdicnode01-priv 'not installed'
> > (5): call=1987183, status=complete, exitreason='Configuration file
> > /mnt/nfs-vdic-mgmt-vm/vdicudstuneler.xml does not exist or is not
> > readable.',
> > last-rc-change='Thu Feb 16 04:00:25 2017', queued=0ms, exec=30ms
> > * vm-vdicdb01_monitor_2 on vdicnode01-priv 'not installed' (5):
> > call=2550049, status=complete, exitreason='Configuration file
> > /mnt/nfs-vdic-mgmt-vm/vdicdb01.xml does not exist or is not
> readable.',
> > last-rc-change='Thu Feb 16 08:13:37 2017', queued=0ms, exec=44ms
> > * nfs-mon_monitor_1 on vdicnode01-priv 'unknown error' (1):
> > call=1984009, status=Timed Out, exitreason='none',
> > last-rc-change='Thu Feb 16 04:24:30 2017', queued=0ms, exec=0ms
> > * vm-vdicsunstone01_monitor_2 on vdicnode01-priv 'not installed'
> > (5): call=2552050, status=complete, exitreason='Configuration file
> > 

Re: [ClusterLabs] Disabled resource is hard logging

2017-02-16 Thread Oscar Segarra
Hi Klaus

What is your proposal to fix this behavior?

Thanks a lot!


On Feb 16, 2017, 10:57 AM, "Klaus Wenninger"  wrote:

On 02/16/2017 09:05 AM, Oscar Segarra wrote:
> Hi,
>
> In my environment I have deployed 5 VirtualDomains as one can see below:
> [root@vdicnode01 ~]# pcs status
> Cluster name: vdic-cluster
> Stack: corosync
> Current DC: vdicnode01-priv (version 1.1.15-11.el7_3.2-e174ec8) -
> partition with quorum
> Last updated: Thu Feb 16 09:02:53 2017  Last change: Thu Feb
> 16 08:20:53 2017 by root via crm_attribute on vdicnode02-priv
>
> 2 nodes and 14 resources configured: 5 resources DISABLED and 0
> BLOCKED from being started due to failures
>
> Online: [ vdicnode01-priv vdicnode02-priv ]
>
> Full list of resources:
>
>  nfs-vdic-mgmt-vm-vip   (ocf::heartbeat:IPaddr):Started
> vdicnode01-priv
>  Clone Set: nfs_setup-clone [nfs_setup]
>  Started: [ vdicnode01-priv vdicnode02-priv ]
>  Clone Set: nfs-mon-clone [nfs-mon]
>  Started: [ vdicnode01-priv vdicnode02-priv ]
>  Clone Set: nfs-grace-clone [nfs-grace]
>  Started: [ vdicnode01-priv vdicnode02-priv ]
>  vm-vdicone01   (ocf::heartbeat:VirtualDomain): FAILED (disabled)[
> vdicnode02-priv vdicnode01-priv ]
>  vm-vdicsunstone01  (ocf::heartbeat:VirtualDomain): FAILED
> vdicnode01-priv (disabled)
>  vm-vdicdb01(ocf::heartbeat:VirtualDomain): FAILED (disabled)[
> vdicnode02-priv vdicnode01-priv ]
>  vm-vdicudsserver   (ocf::heartbeat:VirtualDomain): FAILED
> (disabled)[ vdicnode02-priv vdicnode01-priv ]
>  vm-vdicudstuneler  (ocf::heartbeat:VirtualDomain): FAILED
> vdicnode01-priv (disabled)
>  Clone Set: nfs-vdic-images-vip-clone [nfs-vdic-images-vip]
>  Stopped: [ vdicnode01-priv vdicnode02-priv ]
>
> Failed Actions:
> * vm-vdicone01_monitor_2 on vdicnode02-priv 'not installed' (5):
> call=2322, status=complete, exitreason='Configuration file
> /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is not readable.',
> last-rc-change='Thu Feb 16 09:02:07 2017', queued=0ms, exec=21ms
> * vm-vdicsunstone01_monitor_2 on vdicnode02-priv 'not installed'
> (5): call=2310, status=complete, exitreason='Configuration file
> /mnt/nfs-vdic-mgmt-vm/vdicsunstone01.xml does not exist or is not
> readable.',
> last-rc-change='Thu Feb 16 09:02:07 2017', queued=0ms, exec=37ms
> * vm-vdicdb01_monitor_2 on vdicnode02-priv 'not installed' (5):
> call=2320, status=complete, exitreason='Configuration file
> /mnt/nfs-vdic-mgmt-vm/vdicdb01.xml does not exist or is not readable.',
> last-rc-change='Thu Feb 16 09:02:07 2017', queued=0ms, exec=35ms
> * vm-vdicudsserver_monitor_2 on vdicnode02-priv 'not installed'
> (5): call=2321, status=complete, exitreason='Configuration file
> /mnt/nfs-vdic-mgmt-vm/vdicudsserver.xml does not exist or is not
> readable.',
> last-rc-change='Thu Feb 16 09:02:07 2017', queued=0ms, exec=42ms
> * vm-vdicudstuneler_monitor_2 on vdicnode01-priv 'not installed'
> (5): call=1987183, status=complete, exitreason='Configuration file
> /mnt/nfs-vdic-mgmt-vm/vdicudstuneler.xml does not exist or is not
> readable.',
> last-rc-change='Thu Feb 16 04:00:25 2017', queued=0ms, exec=30ms
> * vm-vdicdb01_monitor_2 on vdicnode01-priv 'not installed' (5):
> call=2550049, status=complete, exitreason='Configuration file
> /mnt/nfs-vdic-mgmt-vm/vdicdb01.xml does not exist or is not readable.',
> last-rc-change='Thu Feb 16 08:13:37 2017', queued=0ms, exec=44ms
> * nfs-mon_monitor_1 on vdicnode01-priv 'unknown error' (1):
> call=1984009, status=Timed Out, exitreason='none',
> last-rc-change='Thu Feb 16 04:24:30 2017', queued=0ms, exec=0ms
> * vm-vdicsunstone01_monitor_2 on vdicnode01-priv 'not installed'
> (5): call=2552050, status=complete, exitreason='Configuration file
> /mnt/nfs-vdic-mgmt-vm/vdicsunstone01.xml does not exist or is not
> readable.',
> last-rc-change='Thu Feb 16 08:14:07 2017', queued=0ms, exec=22ms
> * vm-vdicone01_monitor_2 on vdicnode01-priv 'not installed' (5):
> call=2620052, status=complete, exitreason='Configuration file
> /mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is not readable.',
> last-rc-change='Thu Feb 16 09:02:53 2017', queued=0ms, exec=45ms
> * vm-vdicudsserver_monitor_2 on vdicnode01-priv 'not installed'
> (5): call=2550052, status=complete, exitreason='Configuration file
> /mnt/nfs-vdic-mgmt-vm/vdicudsserver.xml does not exist or is not
> readable.',
> last-rc-change='Thu Feb 16 08:13:37 2017', queued=0ms, exec=48ms
>
>
> All VirtualDomain resources are configured the same:
>
> [root@vdicnode01 cluster]# pcs resource show vm-vdicone01
>  Resource: vm-vdicone01 (class=ocf provider=heartbeat type=VirtualDomain)
>   Attributes: hypervisor=qemu:///system
> config=/mnt/nfs-vdic-mgmt-vm/vdicone01.xml
> migration_network_suffix=tcp:// migration_transport=ssh
>   Meta Attrs: allow-migrate=true target-role=Stopped
>   Utilization: cpu=1 hv_memory=512
>   Operations: 

[ClusterLabs] MySQL Cluster: Strange behaviour when forcing movement of resources

2017-02-16 Thread Félix Díaz de Rada


Hi all,

We are currently setting up a MySQL cluster (Master-Slave) on this
platform:

- Two nodes, on RHEL 7.0
- pacemaker-1.1.10-29.el7.x86_64
- corosync-2.3.3-2.el7.x86_64
- pcs-0.9.115-32.el7.x86_64
There is an IP address resource to be used as a "virtual IP".

This is the configuration of the cluster:

Cluster Name: webmobbdprep
Corosync Nodes:
 webmob1bdprep-ges webmob2bdprep-ges
Pacemaker Nodes:
 webmob1bdprep-ges webmob2bdprep-ges

Resources:
 Group: G_MySQL_M
  Meta Attrs: priority=100
  Resource: MySQL_M (class=ocf provider=heartbeat type=mysql_m)
   Attributes: 
binary=/opt/mysql/mysql-5.7.17-linux-glibc2.5-x86_64/bin/mysqld_safe 
config=/data/webmob_prep/webmob_prep.cnf datadir=/data/webmob_prep 
log=/data/webmob_prep/webmob_prep.err 
pid=/data/webmob_prep/webmob_rep.pid 
socket=/data/webmob_prep/webmob_prep.sock user=mysql group=mysql 
test_table=replica.pacemaker_test test_user=root

   Meta Attrs: resource-stickiness=1000
   Operations: promote interval=0s timeout=120 
(MySQL_M-promote-timeout-120)

   demote interval=0s timeout=120 (MySQL_M-demote-timeout-120)
   start interval=0s timeout=120s on-fail=restart 
(MySQL_M-start-timeout-120s-on-fail-restart)

   stop interval=0s timeout=120s (MySQL_M-stop-timeout-120s)
   monitor interval=60s timeout=30s OCF_CHECK_LEVEL=1 
(MySQL_M-monitor-interval-60s-timeout-30s)

  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=172.18.64.44 nic=ens160:1 cidr_netmask=32
   Meta Attrs: target-role=Started migration-threshold=3 
failure-timeout=60s

   Operations: start interval=0s timeout=20s (ClusterIP-start-timeout-20s)
   stop interval=0s timeout=20s (ClusterIP-stop-timeout-20s)
   monitor interval=60s (ClusterIP-monitor-interval-60s)
 Resource: MySQL_S (class=ocf provider=heartbeat type=mysql_s)
  Attributes: 
binary=/opt/mysql/mysql-5.7.17-linux-glibc2.5-x86_64/bin/mysqld_safe 
config=/data/webmob_prep/webmob_prep.cnf datadir=/data/webmob_prep 
log=/data/webmob_prep/webmob_prep.err 
pid=/data/webmob_prep/webmob_rep.pid 
socket=/data/webmob_prep/webmob_prep.sock user=mysql group=mysql 
test_table=replica.pacemaker_test test_user=root

  Meta Attrs: resource-stickiness=0
  Operations: promote interval=0s timeout=120 (MySQL_S-promote-timeout-120)
  demote interval=0s timeout=120 (MySQL_S-demote-timeout-120)
  start interval=0s timeout=120s on-fail=restart 
(MySQL_S-start-timeout-120s-on-fail-restart)

  stop interval=0s timeout=120s (MySQL_S-stop-timeout-120s)
  monitor interval=60s timeout=30s OCF_CHECK_LEVEL=1 
(MySQL_S-monitor-interval-60s-timeout-30s)


Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
  start MySQL_M then start ClusterIP (Mandatory) 
(id:order-MySQL_M-ClusterIP-mandatory)
  start G_MySQL_M then start MySQL_S (Mandatory) 
(id:order-G_MySQL_M-MySQL_S-mandatory)

Colocation Constraints:
  G_MySQL_M with MySQL_S (-100) (id:colocation-G_MySQL_M-MySQL_S-INFINITY)

Cluster Properties:
 cluster-infrastructure: corosync
 dc-version: 1.1.10-29.el7-368c726
 last-lrm-refresh: 1487148812
 no-quorum-policy: ignore
 stonith-enabled: false

Pacemaker works as expected in most situations, but there is one
scenario that is really not understandable to us. I will try to describe it:


a - Master resource (and Cluster IP address) are active on node 1 and 
Slave resource is active on node 2.

b - We force movement of Master resource to node 2.
c - Pacemaker stops all resources: Master, Slave and Cluster IP.
d - Master resource and Cluster IP are started on node 2 (this is OK),
but the Slave also tries to start (??). It fails (logically, because the
Master resource has been started on the same node), logs an "unknown
error", and its state is marked as "failed". This is a capture of
'pcs status' at that point:


OFFLINE: [ webmob1bdprep-ges ]
Online: [ webmob2bdprep-ges ]

Full list of resources:

Resource Group: G_MySQL_M
MySQL_M (ocf::heartbeat:mysql_m): Started webmob2bdprep-ges
ClusterIP (ocf::heartbeat:IPaddr2): Started webmob2bdprep-ges
MySQL_S (ocf::heartbeat:mysql_s): FAILED webmob2bdprep-ges

Failed actions:
MySQL_M_monitor_6 on webmob2bdprep-ges 'master' (8): call=62, 
status=complete, last-rc-change='Wed Feb 15 11:54:08 2017', queued=0ms, 
exec=0ms
MySQL_S_start_0 on webmob2bdprep-ges 'unknown error' (1): call=78, 
status=complete, last-rc-change='Wed Feb 15 11:54:17 2017', queued=40ms, 
exec=0ms


PCSD Status:
webmob1bdprep-ges: Offline
webmob2bdprep-ges: Online

e - Pacemaker moves Slave resource to node 1 and starts it. Now we have 
both resources started again, Master on node 2 and Slave on node 1.

f - One minute later, Pacemaker restarts both resources (???).
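
(For reference, a forced move like the one in step b is typically issued with something along these lines; the resource and node names are just the ones from the configuration above, and the exact command actually used may have differed:)

"""
pcs resource move G_MySQL_M webmob2bdprep-ges
# this creates a temporary cli- location constraint, visible with
# 'pcs constraint --full' and removable with 'pcs constraint remove <id>'
"""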

So we are wondering:
- After the migration of the Master resource, why does Pacemaker try to
start the Slave resource on the same node where the Master resource has
just been started? Why is

[ClusterLabs] Disabled resource is hard logging

2017-02-16 Thread Oscar Segarra
Hi,

In my environment I have deployed 5 VirtualDomains as one can see below:
[root@vdicnode01 ~]# pcs status
Cluster name: vdic-cluster
Stack: corosync
Current DC: vdicnode01-priv (version 1.1.15-11.el7_3.2-e174ec8) - partition
with quorum
Last updated: Thu Feb 16 09:02:53 2017  Last change: Thu Feb 16
08:20:53 2017 by root via crm_attribute on vdicnode02-priv

2 nodes and 14 resources configured: 5 resources DISABLED and 0 BLOCKED
from being started due to failures

Online: [ vdicnode01-priv vdicnode02-priv ]

Full list of resources:

 nfs-vdic-mgmt-vm-vip   (ocf::heartbeat:IPaddr):Started
vdicnode01-priv
 Clone Set: nfs_setup-clone [nfs_setup]
 Started: [ vdicnode01-priv vdicnode02-priv ]
 Clone Set: nfs-mon-clone [nfs-mon]
 Started: [ vdicnode01-priv vdicnode02-priv ]
 Clone Set: nfs-grace-clone [nfs-grace]
 Started: [ vdicnode01-priv vdicnode02-priv ]
 vm-vdicone01   (ocf::heartbeat:VirtualDomain): FAILED (disabled)[
vdicnode02-priv vdicnode01-priv ]
 vm-vdicsunstone01  (ocf::heartbeat:VirtualDomain): FAILED
vdicnode01-priv (disabled)
 vm-vdicdb01(ocf::heartbeat:VirtualDomain): FAILED (disabled)[
vdicnode02-priv vdicnode01-priv ]
 vm-vdicudsserver   (ocf::heartbeat:VirtualDomain): FAILED (disabled)[
vdicnode02-priv vdicnode01-priv ]
 vm-vdicudstuneler  (ocf::heartbeat:VirtualDomain): FAILED
vdicnode01-priv (disabled)
 Clone Set: nfs-vdic-images-vip-clone [nfs-vdic-images-vip]
 Stopped: [ vdicnode01-priv vdicnode02-priv ]

Failed Actions:
* vm-vdicone01_monitor_2 on vdicnode02-priv 'not installed' (5):
call=2322, status=complete, exitreason='Configuration file
/mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is not readable.',
last-rc-change='Thu Feb 16 09:02:07 2017', queued=0ms, exec=21ms
* vm-vdicsunstone01_monitor_2 on vdicnode02-priv 'not installed' (5):
call=2310, status=complete, exitreason='Configuration file
/mnt/nfs-vdic-mgmt-vm/vdicsunstone01.xml does not exist or is not
readable.',
last-rc-change='Thu Feb 16 09:02:07 2017', queued=0ms, exec=37ms
* vm-vdicdb01_monitor_2 on vdicnode02-priv 'not installed' (5):
call=2320, status=complete, exitreason='Configuration file
/mnt/nfs-vdic-mgmt-vm/vdicdb01.xml does not exist or is not readable.',
last-rc-change='Thu Feb 16 09:02:07 2017', queued=0ms, exec=35ms
* vm-vdicudsserver_monitor_2 on vdicnode02-priv 'not installed' (5):
call=2321, status=complete, exitreason='Configuration file
/mnt/nfs-vdic-mgmt-vm/vdicudsserver.xml does not exist or is not readable.',
last-rc-change='Thu Feb 16 09:02:07 2017', queued=0ms, exec=42ms
* vm-vdicudstuneler_monitor_2 on vdicnode01-priv 'not installed' (5):
call=1987183, status=complete, exitreason='Configuration file
/mnt/nfs-vdic-mgmt-vm/vdicudstuneler.xml does not exist or is not
readable.',
last-rc-change='Thu Feb 16 04:00:25 2017', queued=0ms, exec=30ms
* vm-vdicdb01_monitor_2 on vdicnode01-priv 'not installed' (5):
call=2550049, status=complete, exitreason='Configuration file
/mnt/nfs-vdic-mgmt-vm/vdicdb01.xml does not exist or is not readable.',
last-rc-change='Thu Feb 16 08:13:37 2017', queued=0ms, exec=44ms
* nfs-mon_monitor_1 on vdicnode01-priv 'unknown error' (1):
call=1984009, status=Timed Out, exitreason='none',
last-rc-change='Thu Feb 16 04:24:30 2017', queued=0ms, exec=0ms
* vm-vdicsunstone01_monitor_2 on vdicnode01-priv 'not installed' (5):
call=2552050, status=complete, exitreason='Configuration file
/mnt/nfs-vdic-mgmt-vm/vdicsunstone01.xml does not exist or is not
readable.',
last-rc-change='Thu Feb 16 08:14:07 2017', queued=0ms, exec=22ms
* vm-vdicone01_monitor_2 on vdicnode01-priv 'not installed' (5):
call=2620052, status=complete, exitreason='Configuration file
/mnt/nfs-vdic-mgmt-vm/vdicone01.xml does not exist or is not readable.',
last-rc-change='Thu Feb 16 09:02:53 2017', queued=0ms, exec=45ms
* vm-vdicudsserver_monitor_2 on vdicnode01-priv 'not installed' (5):
call=2550052, status=complete, exitreason='Configuration file
/mnt/nfs-vdic-mgmt-vm/vdicudsserver.xml does not exist or is not readable.',
last-rc-change='Thu Feb 16 08:13:37 2017', queued=0ms, exec=48ms


All VirtualDomain resources are configured the same:

[root@vdicnode01 cluster]# pcs resource show vm-vdicone01
 Resource: vm-vdicone01 (class=ocf provider=heartbeat type=VirtualDomain)
  Attributes: hypervisor=qemu:///system
config=/mnt/nfs-vdic-mgmt-vm/vdicone01.xml migration_network_suffix=tcp://
migration_transport=ssh
  Meta Attrs: allow-migrate=true target-role=Stopped
  Utilization: cpu=1 hv_memory=512
  Operations: start interval=0s timeout=90 (vm-vdicone01-start-interval-0s)
  stop interval=0s timeout=90 (vm-vdicone01-stop-interval-0s)
  monitor interval=20s role=Stopped
(vm-vdicone01-monitor-interval-20s)
  monitor interval=30s (vm-vdicone01-monitor-interval-30s)
[root@vdicnode01 cluster]# pcs resource show vm-vdicdb01
 Resource: vm-vdicdb01 (class=ocf 

Re: [ClusterLabs] corosync dead loop in segfault handler

2017-02-16 Thread Christine Caulfield
On 16/02/17 03:51, cys wrote:
> At 2017-02-15 23:13:08, "Christine Caulfield"  wrote:
>>
>> Yes, it seems that some corosync SEGVs trigger this obscure bug in
>> libqb. I've chased a few possible causes and none have been fruitful.
>>
>> If you get this then corosync has crashed, and this other bug is masking
>> the actual diagnostics - I know, helpful :/
>>
>> It's on my list
>>
>> Chrissie
>>
> 
> Thanks.
> I think you have noticed that my_service_list[3] is invalid.
> Regarding the SEGV, do you need additional information, such as a coredump or logs?
> 

A blackbox dump and (if possible) a coredump would be very useful if you
can get them. Thank you.
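
For anyone unfamiliar with collecting these, a rough sketch follows; paths and tooling vary by distribution, so treat it as a hint rather than an exact recipe:

"""
# Blackbox: dump the libqb flight-recorder data corosync writes on crash
# (corosync-blackbox wraps qb-blackbox over /var/lib/corosync/fdata*)
corosync-blackbox > blackbox.txt

# Coredump: on systemd-based systems with coredumpctl available
coredumpctl list corosync
coredumpctl dump corosync -o corosync.core
"""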

Chrissie

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org