Re: [ClusterLabs] Cluster node getting stopped from other node (resending mail)

2015-07-01 Thread Ken Gaillot
On 06/30/2015 11:30 PM, Arjun Pandey wrote:
 Hi
 
 I am running a 2 node cluster with this config on centos 6.5/6.6
 
 Master/Slave Set: foo-master [foo]
 Masters: [ messi ]
 Stopped: [ronaldo ]
  eth1-CP(ocf::pw:IPaddr):   Started messi
  eth2-UP(ocf::pw:IPaddr):   Started messi
  eth3-UPCP  (ocf::pw:IPaddr):   Started messi
 
 where I have a multi-state resource foo being run in master/slave mode, and
 the IPaddr RA is just a modified IPAddr2 RA. Additionally I have a
 colocation constraint for the IP addresses to be colocated with the master.
 
 Sometimes when I set up the cluster, I find that one of the nodes (the
 second node that joins) gets stopped, and I find this log.
 
 2015-06-01T13:55:46.153941+05:30 ronaldo pacemaker: Starting Pacemaker
 Cluster Manager
 2015-06-01T13:55:46.233639+05:30 ronaldo attrd[25988]:   notice:
 attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
 2015-06-01T13:55:46.234162+05:30 ronaldo crmd[25990]:   notice:
 do_state_transition: State transition S_PENDING -> S_NOT_DC [
 input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
 2015-06-01T13:55:46.234701+05:30 ronaldo attrd[25988]:   notice:
 attrd_local_callback: Sending full refresh (origin=crmd)
 2015-06-01T13:55:46.234708+05:30 ronaldo attrd[25988]:   notice:
 attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
 *** This looks to be the likely reason ***
 2015-06-01T13:55:46.254310+05:30 ronaldo crmd[25990]:    error:
 handle_request: We didn't ask to be shut down, yet our DC is telling us too.

Hi Arjun,

I'd check the other node's logs at this time, to see why it requested
the shutdown.
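
For example, something like this on messi (log path assumed; adjust to wherever
corosync/pacemaker log on your systems) should show what the DC decided around
that timestamp:

    grep -E 'crmd|pengine' /var/log/messages | grep '2015-06-01T13:55'

The pengine/crmd messages on the DC usually say which transition requested the
shutdown and why.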

 2015-06-01T13:55:46.254577+05:30 ronaldo crmd[25990]:   notice:
 do_state_transition: State transition S_NOT_DC -> S_STOPPING [ input=I_STOP
 cause=C_HA_MESSAGE origin=route_message ]
 2015-06-01T13:55:46.255134+05:30 ronaldo crmd[25990]:   notice:
 lrm_state_verify_stopped: Stopped 0 recurring operations at shutdown...
 waiting (2 ops remaining)
 
 Based on the logs, pacemaker on the active node was stopping the secondary cloud
 every time it joins the cluster. This issue seems similar to
 http://pacemaker.oss.clusterlabs.narkive.com/rVvN8May/node-sends-shutdown-request-to-other-node-error
 
 Packages used :-
 pacemaker-1.1.12-4.el6.x86_64
 pacemaker-libs-1.1.12-4.el6.x86_64
 pacemaker-cli-1.1.12-4.el6.x86_64
 pacemaker-cluster-libs-1.1.12-4.el6.x86_64
 pacemaker-debuginfo-1.1.10-14.el6.x86_64
 pcsc-lite-libs-1.5.2-13.el6_4.x86_64
 pcs-0.9.90-2.el6.centos.2.noarch
 pcsc-lite-1.5.2-13.el6_4.x86_64
 pcsc-lite-openct-0.6.19-4.el6.x86_64
 corosync-1.4.1-17.el6.x86_64
 corosynclib-1.4.1-17.el6.x86_64


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Resource stop when another resource run on that node

2015-07-01 Thread Ken Gaillot
On 07/01/2015 01:18 AM, John Gogu wrote:
 Hello,
 this is what I have set up, but it is not working 100%:
 
 Online: [ node01hb0 node02hb0 ]
 Full list of resources:
  IP1_Vir(ocf::heartbeat:IPaddr):Started node01hb0
  IP2_Vir(ocf::heartbeat:IPaddr):Started node02hb0
 
 
  default-resource-stickiness: 2000
 
 
 ​Location Constraints:
   Resource: IP1_Vir
 Enabled on: node01hb0 (score:1000)
 
   Resource: IP2_Vir
 Disabled on: node01hb0 (score:-INFINITY)
 
 Colocation Constraints:
   IP2_Vir with IP1_Vir (score:-INFINITY)
 
 When I move the resource IP1_Vir manually from node01hb0 to node02hb0, all is
 fine; IP2_Vir is stopped.

That's what you asked it to do. :)

The -INFINITY constraint for IP2_Vir on node01hb0 means that IP2_Vir can
*never* run on that node. The -INFINITY constraint for IP2_Vir with
IP1_Vir means that IP2_Vir can *never* run on the same node as IP1_Vir.
So if IP1_Vir is on node02hb0, then IP2_Vir has nowhere to run.

If you want either node to be able to take over either IP when
necessary, you don't want any -INFINITY constraints. You can use a score
other than -INFINITY to give a preference instead of a requirement.

For example, if you want the IPs to run on different nodes whenever
possible, you could have a colocation constraint IP2_Vir with IP1_Vir
score -3000. Having the score more negative than the resource stickiness
means that when a failed node comes back up, one of the IPs will move to
it. If you don't want that, use a score less than your stickiness, such
as -100.
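
For example, with pcs that could look roughly like this (the constraint id is a
placeholder; take the real one from the output of the first command):

    pcs constraint --full
    pcs constraint remove <id-of-the-IP2_Vir-with-IP1_Vir-colocation>
    pcs constraint colocation add IP2_Vir with IP1_Vir -3000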

You probably don't want any location constraints, unless there's a
reason each IP should be on a specific node in normal operation.

 When I crash node node01hb0 / stop pacemaker, both resources are stopped.

This likely depends on your quorum and fencing configuration, and what
versions of software you're using.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker failover failure

2015-07-01 Thread Ken Gaillot
On 07/01/2015 08:57 AM, alex austin wrote:
 I have now configured stonith-enabled=true. What device should I use for
 fencing, given the fact that it's a virtual machine but I don't have access
 to its configuration? Would fence_pcmk do? If so, what parameters should I
 configure for it to work properly?

No, fence_pcmk is not for using in pacemaker, but for using in RHEL6's
CMAN to redirect its fencing requests to pacemaker.

For a virtual machine, ideally you'd use fence_virtd running on the
physical host, but I'm guessing from your comment that you can't do
that. Does whoever provides your VM also provide an API for controlling
it (starting/stopping/rebooting)?

Regarding your original problem, it sounds like the surviving node
doesn't have quorum. What version of corosync are you using? If you're
using corosync 2, you need two_node: 1 in corosync.conf, in addition
to configuring fencing in pacemaker.
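
For reference, in corosync 2 that is a quorum section along these lines (sketch
only; keep the rest of your generated corosync.conf as-is):

    quorum {
        provider: corosync_votequorum
        two_node: 1
    }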

 This is my new config:
 
 
 node dcwbpvmuas004.edc.nam.gm.com \
 
 attributes standby=off
 
 node dcwbpvmuas005.edc.nam.gm.com \
 
 attributes standby=off
 
 primitive ClusterIP IPaddr2 \
 
 params ip=198.208.86.242 cidr_netmask=23 \
 
 op monitor interval=1s timeout=20s \
 
 op start interval=0 timeout=20s \
 
 op stop interval=0 timeout=20s \
 
 meta is-managed=true target-role=Started resource-stickiness=500
 
 primitive pcmk-fencing stonith:fence_pcmk \
 
 params pcmk_host_list=dcwbpvmuas004.edc.nam.gm.com
 dcwbpvmuas005.edc.nam.gm.com \
 
 op monitor interval=10s \
 
 meta target-role=Started
 
 primitive redis redis \
 
 meta target-role=Master is-managed=true \
 
 op monitor interval=1s role=Master timeout=5s on-fail=restart
 
 ms redis_clone redis \
 
 meta notify=true is-managed=true ordered=false interleave=false
 globally-unique=false target-role=Master migration-threshold=1
 
 colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master
 
 colocation ip-on-redis inf: ClusterIP redis_clone:Master
 
 colocation pcmk-fencing-on-redis inf: pcmk-fencing redis_clone:Master
 
 property cib-bootstrap-options: \
 
 dc-version=1.1.11-97629de \
 
 cluster-infrastructure=classic openais (with plugin) \
 
 expected-quorum-votes=2 \
 
 stonith-enabled=true
 
 property redis_replication: \
 
 redis_REPL_INFO=dcwbpvmuas005.edc.nam.gm.com
 
 On Wed, Jul 1, 2015 at 2:53 PM, Nekrasov, Alexander 
 alexander.nekra...@emc.com wrote:
 
 stonith-enabled=false

 this might be the issue. The way peer node death is resolved, the
 surviving node must call STONITH on the peer. If it’s disabled it might not
 be able to resolve the event



 Alex



 *From:* alex austin [mailto:alexixa...@gmail.com]
 *Sent:* Wednesday, July 01, 2015 9:51 AM
 *To:* Users@clusterlabs.org
 *Subject:* Re: [ClusterLabs] Pacemaker failover failure



 So I noticed that if I kill redis on one node, it starts on the other, no
 problem, but if I actually kill pacemaker itself on one node, the other
 doesn't sense it so it doesn't fail over.







 On Wed, Jul 1, 2015 at 12:42 PM, alex austin alexixa...@gmail.com wrote:

 Hi all,



 I have configured a virtual ip and redis in master-slave with corosync
 pacemaker. If redis fails, then the failover is successful, and redis gets
 promoted on the other node. However if pacemaker itself fails on the active
 node, the failover is not performed. Is there anything I missed in the
 configuration?



 Here's my configuration (i have hashed the ip address out):



 node host1.com

 node host2.com

 primitive ClusterIP IPaddr2 \

 params ip=xxx.xxx.xxx.xxx cidr_netmask=23 \

 op monitor interval=1s timeout=20s \

 op start interval=0 timeout=20s \

 op stop interval=0 timeout=20s \

 meta is-managed=true target-role=Started resource-stickiness=500

 primitive redis redis \

 meta target-role=Master is-managed=true \

 op monitor interval=1s role=Master timeout=5s on-fail=restart

 ms redis_clone redis \

 meta notify=true is-managed=true ordered=false interleave=false
 globally-unique=false target-role=Master migration-threshold=1

 colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master

 colocation ip-on-redis inf: ClusterIP redis_clone:Master

 property cib-bootstrap-options: \

 dc-version=1.1.11-97629de \

 cluster-infrastructure=classic openais (with plugin) \

 expected-quorum-votes=2 \

 stonith-enabled=false

 property redis_replication: \

 redis_REPL_INFO=host.com



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker failover failure

2015-07-01 Thread Ken Gaillot
On 07/01/2015 09:39 AM, alex austin wrote:
 This is what crm_mon shows
 
 
 Last updated: Wed Jul  1 10:35:40 2015
 
 Last change: Wed Jul  1 09:52:46 2015
 
 Stack: classic openais (with plugin)
 
 Current DC: host2 - partition with quorum
 
 Version: 1.1.11-97629de
 
 2 Nodes configured, 2 expected votes
 
 4 Resources configured
 
 
 
 Online: [ host1 host2 ]
 
 
 ClusterIP (ocf::heartbeat:IPaddr2): Started host2
 
  Master/Slave Set: redis_clone [redis]
 
  Masters: [ host2 ]
 
  Slaves: [ host1 ]
 
 pcmk-fencing(stonith:fence_pcmk):   Started host2
 
 On Wed, Jul 1, 2015 at 3:37 PM, alex austin alexixa...@gmail.com wrote:
 
 I am running version 1.4.7 of corosync

If you can't upgrade to corosync 2 (which has many improvements), you'll
need to set the no-quorum-policy=ignore cluster option.
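
For example (rough sketch, either tool works):

    crm configure property no-quorum-policy=ignore
    # or, with pcs:
    pcs property set no-quorum-policy=ignore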

Proper fencing is necessary to avoid a split-brain situation, which can
corrupt your data.

 On Wed, Jul 1, 2015 at 3:25 PM, Ken Gaillot kgail...@redhat.com wrote:

 On 07/01/2015 08:57 AM, alex austin wrote:
 I have now configured stonith-enabled=true. What device should I use for
 fencing given the fact that it's a virtual machine but I don't have
 access
 to its configuration. would fence_pcmk do? if so, what parameters
 should I
 configure for it to work properly?

 No, fence_pcmk is not for using in pacemaker, but for using in RHEL6's
 CMAN to redirect its fencing requests to pacemaker.

 For a virtual machine, ideally you'd use fence_virtd running on the
 physical host, but I'm guessing from your comment that you can't do
 that. Does whoever provides your VM also provide an API for controlling
 it (starting/stopping/rebooting)?

 Regarding your original problem, it sounds like the surviving node
 doesn't have quorum. What version of corosync are you using? If you're
 using corosync 2, you need two_node: 1 in corosync.conf, in addition
 to configuring fencing in pacemaker.

 This is my new config:


 node dcwbpvmuas004.edc.nam.gm.com \

 attributes standby=off

 node dcwbpvmuas005.edc.nam.gm.com \

 attributes standby=off

 primitive ClusterIP IPaddr2 \

 params ip=198.208.86.242 cidr_netmask=23 \

 op monitor interval=1s timeout=20s \

 op start interval=0 timeout=20s \

 op stop interval=0 timeout=20s \

 meta is-managed=true target-role=Started resource-stickiness=500

 primitive pcmk-fencing stonith:fence_pcmk \

 params pcmk_host_list=dcwbpvmuas004.edc.nam.gm.com
 dcwbpvmuas005.edc.nam.gm.com \

 op monitor interval=10s \

 meta target-role=Started

 primitive redis redis \

 meta target-role=Master is-managed=true \

 op monitor interval=1s role=Master timeout=5s on-fail=restart

 ms redis_clone redis \

 meta notify=true is-managed=true ordered=false interleave=false
 globally-unique=false target-role=Master migration-threshold=1

 colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master

 colocation ip-on-redis inf: ClusterIP redis_clone:Master

 colocation pcmk-fencing-on-redis inf: pcmk-fencing redis_clone:Master

 property cib-bootstrap-options: \

 dc-version=1.1.11-97629de \

 cluster-infrastructure=classic openais (with plugin) \

 expected-quorum-votes=2 \

 stonith-enabled=true

 property redis_replication: \

 redis_REPL_INFO=dcwbpvmuas005.edc.nam.gm.com

 On Wed, Jul 1, 2015 at 2:53 PM, Nekrasov, Alexander 
 alexander.nekra...@emc.com wrote:

 stonith-enabled=false

 this might be the issue. The way peer node death is resolved, the
 surviving node must call STONITH on the peer. If it’s disabled it
 might not
 be able to resolve the event



 Alex



 *From:* alex austin [mailto:alexixa...@gmail.com]
 *Sent:* Wednesday, July 01, 2015 9:51 AM
 *To:* Users@clusterlabs.org
 *Subject:* Re: [ClusterLabs] Pacemaker failover failure



 So I noticed that if I kill redis on one node, it starts on the other,
 no
 problem, but if I actually kill pacemaker itself on one node, the other
 doesn't sense it so it doesn't fail over.







 On Wed, Jul 1, 2015 at 12:42 PM, alex austin alexixa...@gmail.com
 wrote:

 Hi all,



 I have configured a virtual ip and redis in master-slave with corosync
 pacemaker. If redis fails, then the failover is successful, and redis
 gets
 promoted on the other node. However if pacemaker itself fails on the
 active
 node, the failover is not performed. Is there anything I missed in the
 configuration?



 Here's my configuration (i have hashed the ip address out):



 node host1.com

 node host2.com

 primitive ClusterIP IPaddr2 \

 params ip=xxx.xxx.xxx.xxx cidr_netmask=23 \

 op monitor interval=1s timeout=20s \

 op start interval=0 timeout=20s \

 op stop interval=0 timeout=20s \

 meta is-managed=true target-role=Started resource-stickiness=500

 primitive redis redis \

 meta target-role=Master is-managed=true \

 op monitor interval=1s role=Master timeout=5s on-fail=restart

 ms redis_clone redis \

 meta

Re: [ClusterLabs] cib state is now lost

2015-08-12 Thread Ken Gaillot
On 08/12/2015 05:29 AM, David Neudorfer wrote:
 Thanks Ken,
 
 We're currently using Pacemaker 1.1.11 and at the moment its not an option
 to upgrade.
 I've spun up and down these boxes on AWS and even tried different sizes. I
 think a recent upgrade broke this deploy.

What OS distribution/version are you using?

If you have the option of switching from corosync 1+plugin to either
corosync 1+CMAN or corosync 2, that should avoid the issue, and put you
in a better supported position going forward. The plugin code has known
memory issues when nodes come and go, and the effects can be unpredictable.

 This is the output from dmesg:
 
 cib[16656] general protection ip:7f45391e9545 sp:7ffddf16c8b8 error:0 in
 libc-2.12.so[7f45390be000+18a000]
 cib[16659] general protection ip:7fa36fa89545 sp:7ffe28416288 error:0 in
 libc-2.12.so[7fa36f95e000+18a000]
 cib[16663] general protection ip:7fa3defce545 sp:7ffeb5b29c58 error:0 in
 libc-2.12.so[7fa3deea3000+18a000]
 cib[1] general protection ip:7fa1cefe4545 sp:7ffcc4b9c778 error:0 in
 libc-2.12.so[7fa1ceeb9000+18a000]
 cib[16669] general protection ip:7f4b3900f545 sp:7ffdcd65aaf8 error:0 in
 libc-2.12.so[7f4b38ee4000+18a000]
 cib[16672] general protection ip:7fc38be2b545 sp:7fffbc7e1598 error:0 in
 libc-2.12.so[7fc38bd0+18a000]
 cib[16675] general protection ip:7f9c6890c545 sp:7ffca09539f8 error:0 in
 libc-2.12.so[7f9c687e1000+18a000]
 cib[16678] general protection ip:7f1c636ad545 sp:7ffc677d2008 error:0 in
 libc-2.12.so[7f1c63582000+18a000]
 cib[16681] general protection ip:7fed0b47e545 sp:7ffd051f0618 error:0 in
 libc-2.12.so[7fed0b353000+18a000]
 cib[16684] general protection ip:7f2ee87cd545 sp:7fff8d9ae288 error:0 in
 libc-2.12.so[7f2ee86a2000+18a000]
 cib[16687] general protection ip:7f41c3789545 sp:7fff9f005848 error:0 in
 libc-2.12.so[7f41c365e000+18a000]
 
 
 
 On Mon, Aug 10, 2015 at 9:54 AM, Ken Gaillot kgail...@redhat.com wrote:
 
 On 08/09/2015 02:27 PM, David Neudorfer wrote:
 Where can I dig deeper to figure out why cib keeps terminating? selinux
 and
 iptables are both disabled and I've have debug enabled. Google hasn't
 been
 able to help me thus far.

 Aug 09 18:54:29 [12526] ip-172-20-16-5cib:debug:
 get_local_nodeid: Local nodeid is 84939948
 Aug 09 18:54:29 [12526] ip-172-20-16-5cib: info:
 plugin_get_details:   Server details: id=84939948 uname=ip-172-20-16-5
 cname=pcmk
 Aug 09 18:54:29 [12526] ip-172-20-16-5cib: info:
 crm_get_peer: Created entry
 c1f204b2-c994-48d9-81b6-87e1a7fc1ee7/0xa2c460 for node
 ip-172-20-16-5/84939948 (1 total)
 Aug 09 18:54:29 [12526] ip-172-20-16-5cib: info:
 crm_get_peer: Node 84939948 is now known as ip-172-20-16-5
 Aug 09 18:54:29 [12526] ip-172-20-16-5cib: info:
 crm_get_peer: Node 84939948 has uuid ip-172-20-16-5
 Aug 09 18:54:29 [12526] ip-172-20-16-5cib: info:
 crm_update_peer_proc: init_cs_connection_classic: Node
 ip-172-20-16-5[84939948] - unknown is now online
 Aug 09 18:54:29 [12526] ip-172-20-16-5cib: info:
 init_cs_connection_once:  Connection to 'classic openais (with
 plugin)': established
 Aug 09 18:54:29 [12526] ip-172-20-16-5cib:   notice:
 get_node_name:Defaulting to uname -n for the local classic
 openais
 (with plugin) node name
 Aug 09 18:54:29 [12526] ip-172-20-16-5cib: info:
 qb_ipcs_us_publish:   server name: cib_ro
 Aug 09 18:54:29 [12526] ip-172-20-16-5cib: info:
 qb_ipcs_us_publish:   server name: cib_rw
 Aug 09 18:54:29 [12526] ip-172-20-16-5cib: info:
 qb_ipcs_us_publish:   server name: cib_shm
 Aug 09 18:54:29 [12526] ip-172-20-16-5cib: info: cib_init:
   Starting cib mainloop
 Aug 09 18:54:29 [12526] ip-172-20-16-5cib:   notice:
 plugin_handle_membership: Membership 104: quorum acquired
 Aug 09 18:54:29 [12526] ip-172-20-16-5cib: info:
 crm_update_peer_proc: plugin_handle_membership: Node
 ip-172-20-16-5[84939948] - unknown is now member
 Aug 09 18:54:29 [12526] ip-172-20-16-5cib:   notice:
 crm_update_peer_state:cib_peer_update_callback: Node
 ip-172-20-16-5[84939948] - state is now lost (was (null))
 Aug 09 18:54:29 [12526] ip-172-20-16-5cib:   notice:
 crm_reap_dead_member: Removing ip-172-20-16-5/84939948 from the
 membership list
 Aug 09 18:54:29 [12526] ip-172-20-16-5cib:   notice:
 reap_crm_member:  Purged 1 peers with id=84939948 and/or uname=(null)
 from the membership cache
 Aug 09 18:54:29 [12526] ip-172-20-16-5cib:   notice:
 crm_update_peer_state:plugin_handle_membership: Node
 ��[2077843320]
 - state is now member (was member)
 Aug 09 18:54:29 [12526] ip-172-20-16-5cib: info:
 crm_update_peer:  plugin_handle_membership: Node ��: id=2077843320
 state=r(0) ip(172.20.16.5)  addr=r(0) ip(172.20.16.5)  (new) votes=1
 (new) born=104 seen=104 proc

Re: [ClusterLabs] CRM managing ADSL connection; failure not handled

2015-08-24 Thread Ken Gaillot
On 08/24/2015 04:52 AM, Andrei Borzenkov wrote:
 On 24.08.2015 12:35, Tom Yates wrote:
 I've got a failover firewall pair where the external interface is ADSL;
 that is, PPPoE.  I've defined the service thus:

 primitive ExternalIP lsb:hb-adsl-helper \
  op monitor interval=60s

 and in addition written a noddy script /etc/init.d/hb-adsl-helper, thus:

 #!/bin/bash
 RETVAL=0
 start() {
  /sbin/pppoe-start
 }
 stop() {
  /sbin/pppoe-stop
 }
 case $1 in
start)
  start
  ;;
stop)
  stop
  ;;
status)
   /sbin/ifconfig ppp0 > /dev/null && exit 0
  exit 1
  ;;
*)
   echo $"Usage: $0 {start|stop|status}"
  exit 3
 esac
 exit $?

Pacemaker expects that LSB agents follow the LSB spec for return codes,
and won't be able to behave properly if they don't:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-lsb


However it's just as easy to write an OCF agent, which gives you more
flexibility (accepting parameters, etc.):

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf
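
As a rough illustration only (not a complete agent), a status/stop pair that
follows the LSB exit codes might look something like this:

    status() {
        # LSB status: 0 = running, 3 = not running
        /sbin/ifconfig ppp0 > /dev/null 2>&1 && return 0
        return 3
    }

    stop() {
        /sbin/pppoe-stop
        # report failure only if the link is actually still up
        /sbin/ifconfig ppp0 > /dev/null 2>&1 && return 1
        return 0
    }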

 The problem is that sometimes the ADSL connection falls over, as they
 do, eg:

 Aug 20 11:42:10 positron pppd[2469]: LCP terminated by peer
 Aug 20 11:42:10 positron pppd[2469]: Connect time 8619.4 minutes.
 Aug 20 11:42:10 positron pppd[2469]: Sent 1342528799 bytes, received
 164420300 bytes.
 Aug 20 11:42:13 positron pppd[2469]: Connection terminated.
 Aug 20 11:42:13 positron pppd[2469]: Modem hangup
 Aug 20 11:42:13 positron pppoe[2470]: read (asyncReadFromPPP): Session
 1735: Input/output error
 Aug 20 11:42:13 positron pppoe[2470]: Sent PADT
 Aug 20 11:42:13 positron pppd[2469]: Exit.
 Aug 20 11:42:13 positron pppoe-connect: PPPoE connection lost;
 attempting re-connection.

 CRMd then logs a bunch of stuff, followed by

 Aug 20 11:42:18 positron lrmd: [1760]: info: rsc:ExternalIP:8: stop
 Aug 20 11:42:18 positron lrmd: [28357]: WARN: For LSB init script, no
 additional parameters are needed.
 [...]
 Aug 20 11:42:18 positron pppoe-stop: Killing pppd
 Aug 20 11:42:18 positron pppoe-stop: Killing pppoe-connect
 Aug 20 11:42:18 positron lrmd: [1760]: WARN: Managed ExternalIP:stop
 process 28357 exited with return code 1.


 At this point, the PPPoE connection is down, and stays down.  CRMd
 doesn't fail the group which contains both internal and external
 interfaces over to the other node, but nor does it try to restart the
 service.  I'm fairly sure this is because I've done something
 boneheaded, but I can't get my bone head around what it might be.

 Any light anyone can shed is much appreciated.


 
 If the stop operation failed, the resource state is undefined; pacemaker won't do
 anything with this resource. Either make sure the script returns success
 when appropriate, or the only option is to have the node where the
 resource was active fenced.
 
 
 ___
 Users mailing list: Users@clusterlabs.org
 http://clusterlabs.org/mailman/listinfo/users
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] starting of resources

2015-08-11 Thread Ken Gaillot
On 08/11/2015 02:12 AM, Vijay Partha wrote:
 After you start pacemaker and then type pcs status, we get the output that
 the nodes are online and the list of resources is empty. We then add
 resources to the nodes. Now what I want is: after starting pacemaker, can I
 get some resources to be started without adding the resources by making use
 of pcs? If there are archives for this list, could you help me out by
 sending the link?

You only need to add resources once. pcs status takes a little time to
show them when a cluster first starts up; just wait a while and type
pcs status again. The resources themselves will be started as soon as
the cluster determines they safely can be.

 On Tue, Aug 11, 2015 at 12:39 PM, Andrei Borzenkov arvidj...@gmail.com
 wrote:
 
 On Tue, Aug 11, 2015 at 9:44 AM, Vijay Partha vijaysarath...@gmail.com
 wrote:
 Hi,

 Can we statically add resources to the nodes. I mean before the
 pacemaker is
 started can we add resources to the nodes like you dont require to make
 use
 of pcs resource create. Is this possible?


 You better explain what you are trying to achieve. Otherwise exactly
 this question was discussed just recently, search archives of this
 list.



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Delayed first monitoring

2015-08-12 Thread Ken Gaillot
On 08/12/2015 10:45 AM, Miloš Kozák wrote:
 Thank you for your answer, but:
 
 1) This sounds OK, but in other words it means the first check cannot
 be delayed.
 
 2) Start of the init script? I use the LSB scripts from the distribution, so
 there is no way to change them (I can change them, but with package
 upgrades the changes will be lost). This is quite a typical approach; how can I
 do HA for Atlassian, for example? Jira takes 5 minutes to load.

I think your situation involves multiple issues which are worth
separating for clarity:

1. As Alexander mentioned, Pacemaker will do a monitor BEFORE trying to
start a service, to make sure it's not already running. So these don't
need any delay and are expected to fail.

2. Resource agents MUST NOT return success for start until the service
is fully up and running, so the next monitor should succeed, again
without needing any delay. If that's not the case, it's a bug in the agent.

3. It's generally better to use OCF resource agents whenever available,
as they have better integration with pacemaker than lsb/systemd/upstart.
In this case, take a look at ocf:heartbeat:apache.

4. You can configure the timeout used with each action (stop, start,
monitor, restart) on a given resource. The default is 20 seconds. For
example, if a start action is expected to take 5 minutes, you would
define a start operation on the resource with timeout=300s. How you do
that depends on your management tool (pcs, crmsh, or cibadmin).
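
For example, with pcs (using the httpd resource from your example; exact syntax
varies slightly between pcs versions):

    pcs resource op add httpd start interval=0s timeout=300s
    # or set it when creating the resource:
    # pcs resource create httpd lsb:httpd op start timeout=300s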

Bottom line, you should never need a delay on the monitor, instead set
appropriate timeouts for each action, and make sure that the agent does
not return from start until the service is fully up.

 On 12.8.2015 at 16:14, Nekrasov, Alexander wrote:
 1. Pacemaker will/may call a monitor before starting a resource, in
 which case it expects a NOT_RUNNING response. It's just checking
 assumptions at that point.

 2. A resource::start must only return when resource::monitor is
 successful. Basically the logic of a start() must follow this:

  start() {
      start_daemon
      while ! monitor ; do
          sleep 1    # poll until the service's own monitor succeeds
      done
      return $OCF_SUCCESS
  }

 -Original Message-
 From: Miloš Kozák [mailto:milos.ko...@lejmr.com]
 Sent: Wednesday, August 12, 2015 10:03 AM
 To: users@clusterlabs.org
 Subject: [ClusterLabs] Delayed first monitoring

 Hi,

 I have set up CoroSync+CMAN+Pacemaker on CentOS 6.5 in order to
 provide high availability for opennebula. However, I am facing a
 strange problem which arises from my lack of knowledge.

 In the log I can see that when I create a resource based on an init
 script, typically:

 pcs resource create httpd lsb:httpd

 The httpd daemon gets started, but monitor is initiated at the same time
 and the resource is identified as not running. This behaviour makes
 sense since we realize that the daemon starting takes some time. In this
 particular case, I get error code 2, which means that the process is running
 but the environment is not locked. The effect of this is that the httpd resource
 gets restarted.

 My workaround is an extra sleep in the status function of the init script, but
 I don't like this solution at all! Do you have an idea how to tackle this
 problem in a proper way? I expected an op attribute which would specify a
 delay between service start and the first monitoring, but I could not find
 it.

 Thank you, Milos


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] apache services

2015-08-05 Thread Ken Gaillot
On 08/05/2015 04:05 AM, Vijay Partha wrote:
 Hi,
 
 I need to run the apache service on both nodes in a cluster. httpd is
 listening on port 80 on the first node and on port 81 on the
 second. I am not able to add these instances separately; rather, both of them
 are starting on the same node1. Even if I move the service I get an error:
 WebSite1_start_0 on node2 'unknown error' (1): call=27, status=complete,
 last-rc-change='Wed Aug  5 11:02:47 2015', queued=1ms, exec=3146ms.
 
 Please help me out.

You have two separate issues:

1. Both instances are starting on the same node; and

2. Moving an instance produces an error.

For #1, the answer is colocation constraints (which are distinct from
location constraints and ordering constraints). Colocation constraints
say that two resources should be kept together (if the score is
positive) or kept apart (if the score is negative).

For #2, pacemaker is asking the resource agent to perform an action, and
the resource agent is saying it can't. Look at the logs to try to find
the error reported by the resource agent. You can also try running the
resource agent manually.
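
For example, assuming your two apache resources are named WebSite1 and WebSite2
(WebSite1 comes from your error message; WebSite2 is a guess), an anti-colocation
in pcs would be roughly:

    # keep the two instances on different nodes; a finite negative score such
    # as -1000 would instead let them share a node when only one node is up
    pcs constraint colocation add WebSite2 with WebSite1 -INFINITY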

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Running pacemaker 1.1.13 with legacy plugin or heartbeat

2015-08-05 Thread Ken Gaillot
FYI to anyone running the legacy plugin or heartbeat as pacemaker's
communication layer:

Use-after-free memory issues can cause segfault crashes in the cib when
using pacemaker 1.1.13 with the legacy plugin. Heartbeat is likely to be
affected as well.

Clusters using CMAN or corosync 2 as the communication layer are not
affected.

If switching to CMAN or corosync 2 isn't an option for you, I strongly
recommend using a vendor that supports your communication layer, as they
are more likely to do thorough testing and provide fixes.

If anyone wants a targeted patch, I can provide one, but I would
recommend instead using the upstream master branch as of at least commit
0f8059e. That branch includes an overhaul of the affected code area, as
well as other bug fixes.
-- 
Ken Gaillot kgail...@redhat.com

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Corosync GitHub vs. dev list

2015-08-25 Thread Ken Gaillot
On 08/25/2015 05:20 AM, Ferenc Wagner wrote:
 Hi,
 
 Since Corosync is hosted on GitHub, I wonder if it's enough to submit
 pull requests/issues/patch comments there to get the developers'
 attention, or should I also post to develop...@clusterlabs.org?

GitHub is good for patches, and when you want to reach just the corosync
developers. They'll get the usual github notifications.

The list is good for discussion, and reaches a broader audience
(developers of other cluster components, and advanced users who write
code for their clusters).

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Cluster monitoring

2015-10-21 Thread Ken Gaillot
On 10/21/2015 08:24 AM, Michael Schwartzkopff wrote:
> On Wednesday, 21 October 2015, 18:50:15, Arjun Pandey wrote:
>> Hi folks
>> 
>> I had a question on monitoring of cluster events. Based on the 
>> documentation it seems that cluster monitor is the only method
>> of monitoring the cluster events. Also since it seems to poll
>> based on the interval configured it might miss some events. Is
>> that the case ?
> 
> No. The cluster is event-based, so it won't miss any event. If you
> use the cluster's tools, they see the events. If you monitor the
> events you won't miss any either.

FYI, Pacemaker 1.1.14 will have built-in handling of notification
scripts, without needing a ClusterMon resource. These will be
event-driven. Andrew Beekhof did a recent blog post about it:
http://blog.clusterlabs.org/blog/2015/reliable-notifications/

Pacemaker's monitors are polling, at the interval specified when
configuring the monitor operation. Pacemaker relies on the resource
agent to return status for monitors, so technically it's up to the
resource agent whether it can "miss" brief outages that occur between
polls. All the ones I've looked at would miss them, but generally
that's considered acceptable if the service is once again fully
working when the monitor runs (because it implies it recovered itself).

Some people use an external monitoring system (nagios, icinga, zabbix,
etc.) in addition to Pacemaker's monitors. They can complement each
other, as the external system can check system parameters outside
Pacemaker's view and can alert administrators for some early warning
signs before a resource gets to the point of needing recovery. Of
course such monitoring systems are also polling at configured intervals.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] difference between OCF return codes for monitor action

2015-10-21 Thread Ken Gaillot
On 10/21/2015 07:44 AM, Kostiantyn Ponomarenko wrote:
> Hi,
> 
> What is the difference between "OCF_ERR_GENERIC" and "OCF_NOT_RUNNING"
> return codes in "monitor" action from the Pacemaker's point of view?
> 
> I was looking here
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html
> , but I still don't see the difference clearly.
> 
> Thank you,
> Kostya

OCF_ERR_GENERIC is a "soft" error, so if any operation returns that,
Pacemaker will try to recover the resource by restarting it or moving it
to a new location.

OCF_NOT_RUNNING is a state (not necessarily an error). When first
placing a resource, Pacemaker will (by default) run monitors for it on
all hosts, to make sure it's not already running somewhere. So in that
case (which is usually where you see this), it's not an error, but a
confirmation of the expected state. On the other hand, if Pacemaker gets
this when a resource is expected to be up, it will consider it an error
and try to recover. The only difference in that case is that Pacemaker
will not try to stop the resource because it's already stopped.
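
As a rough sketch of how an agent's monitor typically maps states to these codes
(assuming the usual ocf-shellfuncs variables and a hypothetical pidfile):

    monitor() {
        if [ ! -e /var/run/myapp.pid ]; then
            return $OCF_NOT_RUNNING   # 7: cleanly stopped, not an error here
        elif kill -0 "$(cat /var/run/myapp.pid)" 2>/dev/null; then
            return $OCF_SUCCESS       # 0: running as expected
        else
            return $OCF_ERR_GENERIC   # 1: stale pidfile, let Pacemaker recover
        fi
    }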

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] VIP monitoring failing with Timed Out error

2015-10-28 Thread Ken Gaillot
On 10/28/2015 03:51 AM, Pritam Kharat wrote:
> Hi All,
> 
> I am facing one issue in my two node HA. When I stop pacemaker on the ACTIVE
> node, it takes more time to stop, and by this time the migration of the VIP
> and other resources to the STANDBY node fails. (I have seen the same issue in
> the ACTIVE node reboot case also.)

I assume STANDBY in this case is just a description of the node's
purpose, and does not mean that you placed the node in pacemaker's
standby mode. If the node really is in standby mode, it can't run any
resources.

> Last change: Wed Oct 28 02:52:57 2015 via cibadmin on node-1
> Stack: corosync
> Current DC: node-1 (1) - partition with quorum
> Version: 1.1.10-42f2063
> 2 Nodes configured
> 2 Resources configured
> 
> 
> Online: [ node-1 node-2 ]
> 
> Full list of resources:
> 
>  resource (upstart:resource): Stopped
>  vip (ocf::heartbeat:IPaddr2): Started node-2 (unmanaged) FAILED
> 
> Migration summary:
> * Node node-1:
> * Node node-2:
> 
> Failed actions:
> vip_stop_0 (node=node-2, call=-1, rc=1, status=Timed Out,
> last-rc-change=Wed Oct 28 03:05:24 2015
> , queued=0ms, exec=0ms
> ): unknown error
> 
> The VIP monitor is failing over here with a Timed Out error. What is the general
> reason for a timeout? I have kept default-action-timeout=180secs, which
> should be enough for monitoring.

180s should be far more than enough, so something must be going wrong.
Notice that it is the stop operation on the active node that is failing.
Normally in such a case, pacemaker would fence that node to be sure that
it is safe to bring it up elsewhere, but you have disabled stonith.

Fencing is important in failure recovery such as this, so it would be a
good idea to try to get it implemented.

> I have added an order property -> start the other resources only when vip
> is started.
> Any clue to solve this problem? Most of the time this VIP monitoring is
> failing with a Timed Out error.

The "stop" in "vip_stop_0" means that the stop operation is what failed.
Have you seen timeouts on any other operations?

Look through the logs around the time of the failure, and try to see if
there are any indications as to why the stop failed.

If you can set aside some time for testing or have a test cluster that
exhibits the same issue, you can try unmanaging the resource in
pacemaker, then:

1. Try adding/removing the IP via normal system commands, and make sure
that works.

2. Try running the resource agent manually (with any verbose option) to
start/stop/monitor the IP to see if you can reproduce the problem and
get more messages.
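
A sketch of what that might look like for your vip resource (IP/netmask below
are placeholders; use your real parameters, and use the crm equivalent if you
don't have pcs):

    pcs resource unmanage vip
    export OCF_ROOT=/usr/lib/ocf
    export OCF_RESKEY_ip=192.168.1.10 OCF_RESKEY_cidr_netmask=24
    # note: stop/start here really remove/add the address on this node
    bash -x /usr/lib/ocf/resource.d/heartbeat/IPaddr2 monitor; echo "rc=$?"
    bash -x /usr/lib/ocf/resource.d/heartbeat/IPaddr2 stop;    echo "rc=$?"
    pcs resource manage vip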

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Reply: Re: crm_mon memory leak

2015-11-09 Thread Ken Gaillot
On 11/09/2015 07:11 AM, Karthikeyan Ramasamy wrote:
> Hi Ken,
>   The script now exits properly with 'exit 0'.  But it still it creates 
> hanging processes, as listed below.
> 
> root 13405 1  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 13566 13405  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 13623 13566  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 13758 13566  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 13784 13623  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14146 13566  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14167 13623  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14193 13784  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14284 13758  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14381 13784  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14469 14284  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14589 13405  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14837 14381  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14860 13566  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14977 14589  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 19816 14167  0 13:43 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 19845 19816  0 13:43 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> 
> From the above it looks like one crm_mon process spawns further crm_mon processes, 
> and they keep accumulating.
> 
> Can you please let us know if there is anything else we have to check, or 
> whether there could still be issues with the script?
> 
> Thanks,
> Karthik.

That's odd. The ClusterMon resource should spawn crm_mon only once, when
it starts. Does the cluster report any failures for the ClusterMon resource?

I doubt it is the issue in this case, but ClusterMon resources should
not be cloned or duplicated, because it does not monitor the health of
one node but of the entire cluster.

> -Original Message-
> From: Ken Gaillot [mailto:kgail...@redhat.com] 
> Sent: 02 November 2015 21:21
> To: Karthikeyan Ramasamy; users@clusterlabs.org
> Subject: Re: பதில்: Re: [ClusterLabs] crm_mon memory leak
> 
> On 11/02/2015 09:39 AM, Kar

Re: [ClusterLabs] Loadbalancing using Pacemaker

2015-11-09 Thread Ken Gaillot
On 11/08/2015 11:29 AM, didier tanti wrote:
> Thank You Michael,
> In fact I spent some more time looking at the documentation, and indeed Pacemaker 
> is only used for resource control and management. To have my HA solution I 
> will need to use Corosync directly as well. The OpenAIS API is pretty well 
> described and I am starting to understand what must be done (basically link 
> my binaries with corosync and use the messaging and other APIs to have an 
> accurate state of remote objects/services). 
> As for the Virtual IP, I believe it makes more sense to use it in the case of 
> Active/Standby services. In my case, the B services being both active, I would need 
> to implement the load balancing within service A (using the OpenAIS/Corosync API 
> to be updated on service B state changes and how to reach the service B I 
> have elected through round robin). For those specific components I don't 
> foresee the need for a Virtual IP. However I may use a VIP for my service A and 
> other components!

Hi,

You may also want to look at Pacemaker's notification capability to call
an external script for node/resource changes. The latest upstream code
(which will be part of 1.1.14) has some new built-in notification
functionality, which is event-driven.

For prior versions, you can use the ClusterMon resource with -E in the
extra_options to achieve the same effect, but that is polling at regular
intervals rather than truly event-driven:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231308442928
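
A ClusterMon resource with an external handler might be configured roughly like
this with pcs (script path hypothetical, and note it should not be cloned):

    pcs resource create cluster-notify ocf:pacemaker:ClusterMon \
        extra_options="-E /usr/local/bin/cluster-notify.sh" \
        op monitor interval=15s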

> Thanks, 
> 
> 
>  On Sunday, 8 November 2015 at 16:24, Michael Schwartzkopff 
> wrote:
>
> 
>  On Saturday, 7 November 2015, 09:40:47, didier tanti wrote:
>> Hello, I am new to Pacemaker and have a question concerning how to have my
>> cluster services aware of the state and location of the other services in
>> the cluster.  Example:
>> Service A is running on Host X. Service B1 is running on Host X. Service B2
>> is running on Host Y. Which API would allow my Service A to send IPC messages
>> to services B1 and B2 in a round robin manner? (For example, how would Service
>> A be aware of which B is up and active (B1, B2 or both), and how would A
>> even be able to know on which host B1 or B2 is running?) It looks
>> very basic, but I cannot find information on this on clusterlabs.org. Is
>> there a basic tutorial that would explain how to achieve this? (I guess I
>> would need to link my service binaries with some pacemaker/corosync libs
>> and use some API?) Thanks for helping out,
> 
> Hi,
> 
> this task is beyond the ability of pacemaker. Your application has to know 
> how 
> to handle that.
> 
> The best solution would be to use virtual IP addresses for services B1 and B2. 
> Make sure that the IP addresses run together with the services. Now your 
> service A only has to talk to the IP addresses, no matter on which host they 
> run.
> 
> Pacemaker could take care that they run on different hosts if possible.
> 
> Kind regards,
> 
> Michael Schwartzkopff


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] move service basing on both connection status and hostname

2015-11-09 Thread Ken Gaillot
On 11/09/2015 10:02 AM, Stefano Sasso wrote:
> Hi Guys,
>   I am having some troubles with the location constraint.
> 
> In particular, what I want to achieve is to run my service on a host; if
> the IP interconnection fails I want to migrate it to another host, but on
> IP connectivity restoration the resource should move back to the primary
> node.
> 
> So, I have this configuration:
> 
> primitive vfy_ether ocf:pacemaker:l2check \
>> params nic_list="eth1 eth2" debug="false" dampen="1s" \
>> op monitor interval="2s"
>> clone ck_ether vfy_ether
>> location cli-ethercheck MCluster \
>> rule $id="cli-prefer-rule-ethercheck" -inf: not_defined l2ckd or
>> l2ckd lt 2
>> location cli-prefer-masterIP MCluster \
>> rule $id="cli-prefer-rule-masterIP" 50: #uname eq GHA-MO-1
> 
> 
> when the connectivity fails on the primary node, the resource is correctly
> moved to the secondary one.
> But, on IP connectivity restoration, the resource stays on the secondary
> node (and does not move to the primary one).
> 
> How can I solve that?
> Any hint? :-)
> 
> thanks,
>   stefano

Mostly likely, you have a default resource-stickiness set. That tells
Pacemaker to keep services where they are if possible. You can either
delete the stickiness setting or make sure it has a lower score than
your location preference.
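
With crmsh (matching the syntax of the configuration above), checking and
lowering the default stickiness might look like:

    crm configure show | grep -i stickiness
    # make it smaller than the location score (50 in cli-prefer-rule-masterIP),
    # or remove it entirely:
    crm configure rsc_defaults resource-stickiness=0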

Alternatively, are you sure l2ckd is no longer < 2 after connectivity is restored?

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Howto use ocf:heartbeat:nginx check level > 0

2015-11-09 Thread Ken Gaillot
On 11/08/2015 04:46 AM, user.clusterlabs@siimnet.dk wrote:
> 
>> On 8. nov. 2015, at 10.26, user.clusterlabs@siimnet.dk wrote:
>>
>> Setting up my first pacemaker cluster, I'm trying to grasp how to make 
>> ocf:heartbeat:nginx monitor with check levels > 0.
>>
>> Got this so far:
>>
>> [root@afnA ~]# pcs resource
>>  Resource Group: afnGroup
>>  afnVIP (ocf::heartbeat:IPaddr2):   Started afnA 
>>  afnNGinx   (ocf::heartbeat:nginx): Started afnA 
>>
>> [root@afnA ~]# pcs resource show afnNGinx
>>  Resource: afnNGinx (class=ocf provider=heartbeat type=nginx)
>>   Attributes: configfile=/opt/imail/nginx/conf/nginx.conf port=8080 
>> httpd=/opt/imail/nginx/sbin/nginx options="-p /opt/imail/nginx" 
>> status10url=/ping status10regex=".+ is alive\." 
>>   Operations: start interval=0s timeout=60s (afnNGinx-start-interval-0s)
>>   stop interval=0s timeout=60s (afnNGinx-stop-interval-0s)
>>   monitor interval=10s timeout=20s 
>> (afnNGinx-monitor-interval-10s)
>>   monitor interval=60s timeout=20s 
>> (afnNGinx-monitor-interval-60s)
>> [root@afnA ~]# 
>>
>> but I can't verify that the pacemaker RA ever calls http://localhost:8080/ping.
>> Why not?
>>
>> Any pointers to info source(s) for better understanding RA configuration and 
>> maybe especially check levels?
> 
> Found this: 
> http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-operation-monitor-multiple.html
>  
> 
> 
> This seemed to work much better:
> 
> [root@afnA ~]# pcs resource show afnNGinx
>  Resource: afnNGinx (class=ocf provider=heartbeat type=nginx)
>   Attributes: configfile=/opt/imail/nginx/conf/nginx.conf port=8080 
> httpd=/opt/imail/nginx/sbin/nginx options="-p /opt/imail/nginx" 
> status10url=http://localhost:8080/ping status10regex="mss[0-9] is alive\." 
>   Meta Attrs: target-role=Started 
>   Operations: start interval=0s timeout=60s (afnNGinx-start-interval-0s)
>   stop interval=0s timeout=60s (afnNGinx-stop-interval-0s)
>   monitor interval=10s timeout=10s (afnNGinx-monitor-interval-10s)
>   monitor interval=120s timeout=30s OCF_CHECK_LEVEL=10 
> (afnNGinx-monitor-interval-120s)
> 
> =>
> 
> 127.0.0.1 - - [08/Nov/2015:11:34:25 +0100] "GET /ping HTTP/1.1" 200 16 "-" 
> "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.19.1 Basic ECC 
> zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
> 127.0.0.1 - - [08/Nov/2015:11:36:25 +0100] "GET /ping HTTP/1.1" 200 16 "-" 
> "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.19.1 Basic ECC 
> zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
> 127.0.0.1 - - [08/Nov/2015:11:38:25 +0100] "GET /ping HTTP/1.1" 200 16 "-" 
> "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.19.1 Basic ECC 
> zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
> 127.0.0.1 - - [08/Nov/2015:11:40:25 +0100] "GET /ping HTTP/1.1" 200 16 "-" 
> "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.19.1 Basic ECC 
> zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
> 
> 
> [root@afnA]# pcs --version
> 0.9.139
> 
> https://www.mankier.com/8/pcs seems to 
> indicate a debug-monitor command, but my pcs version doesn't seem to support 
> this; might it only be in a later version? Also, I can't seem to find ocf-tester 
> in the CentOS 6 repository; where might I find an ocf-tester rpm?
> 
> /Steffen

Support for debug-monitor was added to pcs upstream in June of this
year; not sure what version that corresponds to.

ocf-tester is part of the upstream resource-agents package
(https://github.com/ClusterLabs/resource-agents). As I understand it, it
is not included in the RHEL/CentOS releases because it has SuSE-specific
code. It could be made more portable but it's not a high priority.
Patches welcome of course :)
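
If your pcs does have debug-monitor, the invocation would be along the lines of:

    pcs resource debug-monitor afnNGinx --full

and for exercising the depth-10 check specifically, you can always call the
agent by hand (add your other OCF_RESKEY_* parameters as needed):

    OCF_ROOT=/usr/lib/ocf OCF_CHECK_LEVEL=10 \
    OCF_RESKEY_configfile=/opt/imail/nginx/conf/nginx.conf \
    OCF_RESKEY_status10url=http://localhost:8080/ping \
    OCF_RESKEY_status10regex="mss[0-9] is alive\." \
        /usr/lib/ocf/resource.d/heartbeat/nginx monitor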


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Reply: Re: crm_mon memory leak

2015-11-02 Thread Ken Gaillot
On 11/02/2015 09:39 AM, Karthikeyan Ramasamy wrote:
> Yes, Ken.  There were multiple instances of the external script also running. 
>  What do you think could possibly be wrong with the script that triggers the 
> crm_mon process every time?

It's the other way around, crm_mon spawns the script. So if the script
doesn't exit properly, neither will crm_mon.

Easy test: put an "exit" at the top of your script. If the problem goes
away, then it's in the script somewhere. Mostly you want to make sure
the script completes within your monitoring interval.
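
A common pattern is to do the real work in the background so the handler returns
immediately; a minimal sketch (the logger line stands in for your real SNMP trap
command, and the CRM_notify_* variables are the ones crm_mon exports to external
agents):

    #!/bin/sh
    # called by crm_mon -E; must exit quickly
    (
        logger "cluster event: rsc=$CRM_notify_rsc task=$CRM_notify_task rc=$CRM_notify_rc"
        # ... the slow SNMP trap call would go here ...
    ) >/dev/null 2>&1 &
    exit 0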

> We are on RHEL 6.5.  I am not sure what's the plan for RHEL 6.7 and 7.1.  
> 
> Thanks,
> Karthik.
> 
> -----Original Message-
> From: Ken Gaillot [mailto:kgail...@redhat.com] 
> Sent: 02 November 2015 21:04
> To: Karthikeyan Ramasamy; users@clusterlabs.org
> Subject: Re: பதில்: Re: [ClusterLabs] crm_mon memory leak
> 
> On 10/31/2015 12:38 AM, Karthikeyan Ramasamy wrote:
>> Thanks, Mr.Gaillot.
>>
>> Yes, we trigger the snmp notification with an external script.  From your 
>> response, I understand that the issue wouldn't occur with 1.1.14, as it 
>> wouldn't require the crm_mon process.  Is this understanding correct?
> 
> Correct, crm_mon is not spawned with the new method. However if the problem 
> originates in the external script, then it still might not work properly, but 
> with the new method Pacemaker will kill it after a timeout.
> 
>> We have been given 1.1.10 as the supported version from RedHat.  If I raise 
>> a ticket to RedHat, would they be able to provide us a patch for 1.1.10?
> 
> If you're using RHEL 6.7, you should be able to simply "yum update" to get 
> 1.1.12. If you're using RHEL 7.1, you should be able to get 1.1.13.
> That would give you more bugfixes, which may or may not help your issue.
> If you're using an older version, there may not be updates any longer.
> 
> If you open a ticket, support can help you isolate where the problem is.
> 
> When you saw many crm_mon processes running, did you also see many copies of 
> the external script running?
> 
>> Many thanks for your response.
>>
>> Thanks,
>> Karthik.
>>
>>
>>  Ken Gaillot wrote: 
>>
>> On 10/30/2015 05:29 AM, Karthikeyan Ramasamy wrote:
>>> Dear Pacemaker support,
>>> We are using pacemaker1.1.10-14 to implement a service management 
>>> framework, with high availability on the road-map.  This pacemaker 
>>> version was available through redhat for our environments
>>>
>>>   We are running into an issue where pacemaker causes a node to crash.  The 
>>> last feature we integrated was SNMP notification.  While listing out the 
>>> processes we found that crm_mon processes occupying 58GB of available 64GB, 
>>> when the node crashed.  When we removed that feature, pacemaker was stable 
>>> again.
>>>
>>> Section 7.1 of the pacemaker document details that SNMP notification agent 
>>> triggers a crm_mon process at regular intervals.  On checking clusterlabs 
>>> for list of known issues, we found this crm_mon memory leak issue.  
>>> Although not related, we think that there is some problem with the crm_mon 
>>> process.
>>>
>>> http://clusterlabs.org/pipermail/users/2015-August/001084.html
>>>
>>> Can you please let us know if there are issues with SNMP notification in 
>>> Pacemaker or if there is anything that we could be wrong.  Also, any 
>>> workarounds for this issue if available, would be very helpful for us.  
>>> Please help.
>>>
>>> Thanks,
>>> Karthik.
>>
>> Are you using ClusterMon with Pacemaker's built-in SNMP capability, or 
>> an external script that generates the SNMP trap?
>>
>> If you're using the built-in capability, that has to be explicitly 
>> enabled when Pacemaker is compiled. Many distributions (including 
>> RHEL) do not enable it. Run "crm_mon --help"; if it shows a "-S" 
>> option, you have it enabled, otherwise not.
>>
>> If you're using an external script to generate the SNMP trap, please 
>> post it (with any sensitive info taken out of course).
>>
>> The ClusterMon resource will generate a crm_mon at regular intervals, 
>> but it should exit quickly. It sounds like it's not exiting at all, 
>> which is why you see this problem.
>>
>> If you have a RHEL subscription, you can open a support ticket with 
>> Red Hat. Note that stonith must be enabled before Red Hat (and many 
>> other
>> vendors) will support a cluster. Also, you should be able to "yum 
>> update" to a much newer version of Pacemaker to get bugfixes, if 
>> you're using RHEL 6 or 7.
>>
>> FYI, the latest upstream Pacemaker has a new feature that will be in 
>> 1.1.14, allowing it to call an external notification script without 
>> needing a ClusterMon resource.
> 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Reply: Re: crm_mon memory leak

2015-11-02 Thread Ken Gaillot
On 10/31/2015 12:38 AM, Karthikeyan Ramasamy wrote:
> Thanks, Mr.Gaillot.
> 
> Yes, we trigger the snmp notification with an external script.  From your 
> response, I understand that the issue wouldn't occur with 1.1.14, as it 
> wouldn't require the crm_mon process.  Is this understanding correct?

Correct, crm_mon is not spawned with the new method. However if the
problem originates in the external script, then it still might not work
properly, but with the new method Pacemaker will kill it after a timeout.

> We have been given 1.1.10 as the supported version from RedHat.  If I raise a 
> ticket to RedHat, would they be able to provide us a patch for 1.1.10?

If you're using RHEL 6.7, you should be able to simply "yum update" to
get 1.1.12. If you're using RHEL 7.1, you should be able to get 1.1.13.
That would give you more bugfixes, which may or may not help your issue.
If you're using an older version, there may not be updates any longer.

If you open a ticket, support can help you isolate where the problem is.

When you saw many crm_mon processes running, did you also see many
copies of the external script running?
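
For reference, a rough way to check (the script name is a placeholder for
whatever your notification script is called) is to list any lingering
crm_mon processes and copies of the script, along with their memory use:

    ps -eo pid,rss,etime,args | grep '[c]rm_mon'
    ps -eo pid,etime,args | grep '[y]our_snmp_script'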

> Many thanks for your response.
> 
> Thanks,
> Karthik.
> 
> 
>  Ken Gaillot wrote 
> 
> On 10/30/2015 05:29 AM, Karthikeyan Ramasamy wrote:
>> Dear Pacemaker support,
>> We are using pacemaker1.1.10-14 to implement a service management framework, 
>> with high availability on the road-map.  This pacemaker version was 
>> available through redhat for our environments
>>
>>   We are running into an issue where pacemaker causes a node to crash.  The 
>> last feature we integrated was SNMP notification.  While listing out the 
>> processes we found that crm_mon processes occupying 58GB of available 64GB, 
>> when the node crashed.  When we removed that feature, pacemaker was stable 
>> again.
>>
>> Section 7.1 of the pacemaker document details that SNMP notification agent 
>> triggers a crm_mon process at regular intervals.  On checking clusterlabs 
>> for list of known issues, we found this crm_mon memory leak issue.  Although 
>> not related, we think that there is some problem with the crm_mon process.
>>
>> http://clusterlabs.org/pipermail/users/2015-August/001084.html
>>
>> Can you please let us know if there are issues with SNMP notification in 
>> Pacemaker or if there is anything that we could be wrong.  Also, any 
>> workarounds for this issue if available, would be very helpful for us.  
>> Please help.
>>
>> Thanks,
>> Karthik.
> 
> Are you using ClusterMon with Pacemaker's built-in SNMP capability, or
> an external script that generates the SNMP trap?
> 
> If you're using the built-in capability, that has to be explicitly
> enabled when Pacemaker is compiled. Many distributions (including RHEL)
> do not enable it. Run "crm_mon --help"; if it shows a "-S" option, you
> have it enabled, otherwise not.
> 
> If you're using an external script to generate the SNMP trap, please
> post it (with any sensitive info taken out of course).
> 
> The ClusterMon resource will generate a crm_mon at regular intervals,
> but it should exit quickly. It sounds like it's not exiting at all,
> which is why you see this problem.
> 
> If you have a RHEL subscription, you can open a support ticket with Red
> Hat. Note that stonith must be enabled before Red Hat (and many other
> vendors) will support a cluster. Also, you should be able to "yum
> update" to a much newer version of Pacemaker to get bugfixes, if you're
> using RHEL 6 or 7.
> 
> FYI, the latest upstream Pacemaker has a new feature that will be in
> 1.1.14, allowing it to call an external notification script without
> needing a ClusterMon resource.


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker process 10-15% CPU

2015-11-02 Thread Ken Gaillot
On 11/01/2015 03:43 AM, Karthikeyan Ramasamy wrote:
> Thanks, Ken.
> 
> I understand about stonith.  We are introducing pacemaker for an existing 
> product not for a new product.  Currently, client-side is responsible for 
> load-balancing.  
> 
> High-availability for our product is the next step.  Now, we are introducing 
> it to manage the services and a single point of control for managing the 
> services.  Once the customers get used to this, we will introduce 
> high-availability.
> 
> About the logs, can you please let me know the symptoms that I need to look 
> for?

I'd look for anything "unusual", but that's hard to describe and nearly
impossible if you're not familiar with what's "usual". I'd look for
something repeating over and over in a short time (1 or 2 seconds).

Can you give a general idea of the cluster environment? How many
resources, what cluster options are set, whether configuration changes
are being made frequently, whether failures are common, whether the
network is reliable with low latency, etc.

You might try attaching to one of the busy processes with strace and see
if it's stuck in some sort of loop.
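
As a rough sketch (the PID here is just the cib PID from the ps output
quoted below, so substitute your own), this summarizes the system calls a
busy daemon makes over ten seconds:

    timeout -s INT 10 strace -c -p 15766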

> Thanks,
> Karthik.
> -Original Message-
> From: Ken Gaillot [mailto:kgail...@redhat.com] 
> Sent: 31 October 2015 03:33
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] Pacemaker process 10-15% CPU
> 
> On 10/30/2015 05:14 AM, Karthikeyan Ramasamy wrote:
>> Hello,
>>   We are using Pacemaker to manage the services that run on a node, as part 
>> of a service management framework, and manage the nodes running the services 
>> as a cluster.  One service will be running as 1+1 and other services with be 
>> N+1.
>>
>>   During our testing, we see that the pacemaker processes are taking about 
>> 10-15% of the CPU.  We would like to know if this is normal and could the 
>> CPU utilization be minimised.
> 
> It's definitely not normal to stay that high for very long. If you can attach 
> your configuration and a sample of your logs, we can look for anything that 
> stands out.
> 
>> Sample Output of most used CPU process in a Active Manager is
>>
>> USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
>> 189  15766 30.4  0.0  94616 12300 ?Ss   18:01  48:15 
>> /usr/libexec/pacemaker/cib
>> 189  15770 28.9  0.0 118320 20276 ?Ss   18:01  45:53 
>> /usr/libexec/pacemaker/pengine
>> root 15768  2.6  0.0  76196  3420 ?Ss   18:01   4:12 
>> /usr/libexec/pacemaker/lrmd
>> root 15767 15.5  0.0  95380  5764 ?Ss   18:01  24:33 
>> /usr/libexec/pacemaker/stonithd
>>
>> USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
>> 189  15766 30.5  0.0  94616 12300 ?Ss   18:01  49:58 
>> /usr/libexec/pacemaker/cib
>> 189  15770 29.0  0.0 122484 20724 ?Rs   18:01  47:29 
>> /usr/libexec/pacemaker/pengine
>> root 15768  2.6  0.0  76196  3420 ?Ss   18:01   4:21 
>> /usr/libexec/pacemaker/lrmd
>> root 15767 15.5  0.0  95380  5764 ?Ss   18:01  25:25 
>> /usr/libexec/pacemaker/stonithd
>>
>>
>> We also observed that the processes are not distributed equally to all the 
>> available cores and saw that Redhat acknowledging that rhel doesn't 
>> distribute to the available cores efficiently.  We are trying to use 
>> IRQbalance to spread the processes to the available cores equally.
> 
> Pacemaker is single-threaded, so each process runs on only one core.
> It's up to the OS to distribute them, and any modern Linux (including
> RHEL) will do a good job of that.
> 
> IRQBalance is useful for balancing IRQ requests across cores, but it doesn't 
> do anything about processes (and doesn't need to).
> 
>> Please let us know if there is any way we could minimise the CPU 
>> utilisation.  We dont require stonith feature, but there is no way stop that 
>> daemon from running to our knowledge.  If that is also possible, please let 
>> us know.
>>
>> Thanks,
>> Karthik.
> 
> The logs will help figure out what's going wrong.
> 
> A lot of people would disagree that you don't require stonith :) Stonith is 
> necessary to recover from many possible failure scenarios, and without it, 
> you may wind up with data corruption or other problems.
> 
> Setting stonith-enabled=false will keep pacemaker from using stonith, but 
> stonithd will still run. It shouldn't take up significant resources.
> The load you're seeing is an indication of a problem somewhere.


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker process 10-15% CPU

2015-10-30 Thread Ken Gaillot
On 10/30/2015 05:14 AM, Karthikeyan Ramasamy wrote:
> Hello,
>   We are using Pacemaker to manage the services that run on a node, as part 
> of a service management framework, and manage the nodes running the services 
> as a cluster.  One service will be running as 1+1 and other services will be 
> N+1.
> 
>   During our testing, we see that the pacemaker processes are taking about 
> 10-15% of the CPU.  We would like to know if this is normal and could the CPU 
> utilization be minimised.

It's definitely not normal to stay that high for very long. If you can
attach your configuration and a sample of your logs, we can look for
anything that stands out.

> Sample Output of most used CPU process in a Active Manager is
> 
> USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
> 189  15766 30.4  0.0  94616 12300 ?Ss   18:01  48:15 
> /usr/libexec/pacemaker/cib
> 189  15770 28.9  0.0 118320 20276 ?Ss   18:01  45:53 
> /usr/libexec/pacemaker/pengine
> root 15768  2.6  0.0  76196  3420 ?Ss   18:01   4:12 
> /usr/libexec/pacemaker/lrmd
> root 15767 15.5  0.0  95380  5764 ?Ss   18:01  24:33 
> /usr/libexec/pacemaker/stonithd
> 
> USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
> 189  15766 30.5  0.0  94616 12300 ?Ss   18:01  49:58 
> /usr/libexec/pacemaker/cib
> 189  15770 29.0  0.0 122484 20724 ?Rs   18:01  47:29 
> /usr/libexec/pacemaker/pengine
> root 15768  2.6  0.0  76196  3420 ?Ss   18:01   4:21 
> /usr/libexec/pacemaker/lrmd
> root 15767 15.5  0.0  95380  5764 ?Ss   18:01  25:25 
> /usr/libexec/pacemaker/stonithd
> 
> 
> We also observed that the processes are not distributed equally to all the 
> available cores and saw that Redhat acknowledging that rhel doesn't 
> distribute to the available cores efficiently.  We are trying to use 
> IRQbalance to spread the processes to the available cores equally.

Pacemaker is single-threaded, so each process runs on only one core.
It's up to the OS to distribute them, and any modern Linux (including
RHEL) will do a good job of that.

IRQBalance is useful for balancing IRQ requests across cores, but it
doesn't do anything about processes (and doesn't need to).
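
If you want to confirm how the scheduler is spreading them, a quick
illustrative check is to print the last CPU (PSR column) each daemon ran on:

    ps -o pid,psr,pcpu,comm -C cib,stonithd,lrmd,pengine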

> Please let us know if there is any way we could minimise the CPU utilisation. 
>  We dont require stonith feature, but there is no way stop that daemon from 
> running to our knowledge.  If that is also possible, please let us know.
> 
> Thanks,
> Karthik.

The logs will help figure out what's going wrong.

A lot of people would disagree that you don't require stonith :) Stonith
is necessary to recover from many possible failure scenarios, and
without it, you may wind up with data corruption or other problems.

Setting stonith-enabled=false will keep pacemaker from using stonith,
but stonithd will still run. It shouldn't take up significant resources.
The load you're seeing is an indication of a problem somewhere.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] crm_mon memory leak

2015-10-30 Thread Ken Gaillot
On 10/30/2015 05:29 AM, Karthikeyan Ramasamy wrote:
> Dear Pacemaker support,
> We are using pacemaker1.1.10-14 to implement a service management framework, 
> with high availability on the road-map.  This pacemaker version was available 
> through redhat for our environments
> 
>   We are running into an issue where pacemaker causes a node to crash.  The 
> last feature we integrated was SNMP notification.  While listing out the 
> processes we found that crm_mon processes occupying 58GB of available 64GB, 
> when the node crashed.  When we removed that feature, pacemaker was stable 
> again.
> 
> Section 7.1 of the pacemaker document details that SNMP notification agent 
> triggers a crm_mon process at regular intervals.  On checking clusterlabs for 
> list of known issues, we found this crm_mon memory leak issue.  Although not 
> related, we think that there is some problem with the crm_mon process.
> 
> http://clusterlabs.org/pipermail/users/2015-August/001084.html
> 
> Can you please let us know if there are issues with SNMP notification in 
> Pacemaker or if there is anything that we could be wrong.  Also, any 
> workarounds for this issue if available, would be very helpful for us.  
> Please help.
> 
> Thanks,
> Karthik.

Are you using ClusterMon with Pacemaker's built-in SNMP capability, or
an external script that generates the SNMP trap?

If you're using the built-in capability, that has to be explicitly
enabled when Pacemaker is compiled. Many distributions (including RHEL)
do not enable it. Run "crm_mon --help"; if it shows a "-S" option, you
have it enabled, otherwise not.

If you're using an external script to generate the SNMP trap, please
post it (with any sensitive info taken out of course).

The ClusterMon resource will generate a crm_mon at regular intervals,
but it should exit quickly. It sounds like it's not exiting at all,
which is why you see this problem.
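
For comparison, a typical ClusterMon setup driving an external agent looks
roughly like this (the resource name and script path are placeholders, not
your actual configuration):

    pcs resource create cluster-mon ocf:pacemaker:ClusterMon \
        extra_options="-E /usr/local/bin/your_snmp_script" --clone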

If you have a RHEL subscription, you can open a support ticket with Red
Hat. Note that stonith must be enabled before Red Hat (and many other
vendors) will support a cluster. Also, you should be able to "yum
update" to a much newer version of Pacemaker to get bugfixes, if you're
using RHEL 6 or 7.

FYI, the latest upstream Pacemaker has a new feature that will be in
1.1.14, allowing it to call an external notification script without
needing a ClusterMon resource.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Resources not starting some times after node reboot

2015-10-30 Thread Ken Gaillot
On 10/29/2015 12:42 PM, Pritam Kharat wrote:
> Hi All,
> 
> I have single node with 5 resources running on it. When I rebooted node
> sometimes I saw resources in stopped state though node comes online.
> 
> When looked in to the logs, one difference found in success and failure
> case is, when
> *Election Trigger (I_DC_TIMEOUT) just popped (2ms)  *occurred LRM did
> not start the resources instead jumped to monitor action and then onwards
> it did not start the resources at all.
> 
> But in success case this Election timeout did not come and first action
> taken by LRM was start the resource and then start monitoring it making all
> the resources started properly.
> 
> I have attached both the success and failure logs. Could some one please
> explain the reason for this issue  and how to solve this ?
> 
> 
> My CRM configuration is -
> 
> root@sc-node-2:~# crm configure show
> node $id="2" sc-node-2
> primitive oc-fw-agent upstart:oc-fw-agent \
> meta allow-migrate="true" migration-threshold="5" failure-timeout="120s" \
> op monitor interval="15s" timeout="60s"
> primitive oc-lb-agent upstart:oc-lb-agent \
> meta allow-migrate="true" migration-threshold="5" failure-timeout="120s" \
> op monitor interval="15s" timeout="60s"
> primitive oc-service-manager upstart:oc-service-manager \
> meta allow-migrate="true" migration-threshold="5" failure-timeout="120s" \
> op monitor interval="15s" timeout="60s"
> primitive oc-vpn-agent upstart:oc-vpn-agent \
> meta allow-migrate="true" migration-threshold="5" failure-timeout="120s" \
> op monitor interval="15s" timeout="60s"
> primitive sc_vip ocf:heartbeat:IPaddr2 \
> params ip="200.10.10.188" cidr_netmask="24" nic="eth1" \
> op monitor interval="15s"
> group sc-resources sc_vip oc-service-manager oc-fw-agent oc-lb-agent
> oc-vpn-agent
> property $id="cib-bootstrap-options" \
> dc-version="1.1.10-42f2063" \
> cluster-infrastructure="corosync" \
> stonith-enabled="false" \
> cluster-recheck-interval="3min" \
> default-action-timeout="180s"

The attached logs don't go far enough to be sure what happened; all they
show at that point is that in both cases, the cluster correctly probed
all the resources to be sure they weren't already running.

The behavior shouldn't be different depending on the election trigger,
but it's hard to say for sure from this info.

With a single-node cluster, you should also set no-quorum-policy=ignore.
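
For example, with crmsh that would be something like:

    crm configure property no-quorum-policy=ignore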

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] large cluster - failure recovery

2015-11-04 Thread Ken Gaillot
On 11/04/2015 12:55 PM, Digimer wrote:
> On 04/11/15 01:50 PM, Radoslaw Garbacz wrote:
>> Hi,
>>
>> I have a cluster of 32 nodes, and after some tuning was able to have it
>> started and running,
> 
> This is not supported by RH for a reasons; it's hard to get the timing
> right. SUSE supports up to 32 nodes, but they must be doing some serious
> magic behind the scenes.
> 
> I would *strongly* recommend dividing this up into a few smaller
> clusters... 8 nodes per cluster would be max I'd feel comfortable with.
> You need your cluster to solve more problems than it causes...

Hi Radoslaw,

RH supports up to 16. 32 should be possible with recent
pacemaker+corosync versions and careful tuning, but it's definitely
leading-edge.

An alternative with pacemaker 1.1.10+ (1.1.12+ recommended) is Pacemaker
Remote, which easily scales to dozens of nodes:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Remote/index.html

Pacemaker Remote is a really good approach once you start pushing the
limits of cluster nodes. Probably better than trying to get corosync to
handle more nodes. (There are long-term plans for improving corosync's
scalability, but that doesn't help you now.)
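
As a rough sketch (the hostname is a placeholder), a remote node is added as
an ordinary resource once pacemaker_remote is running on that host and
/etc/pacemaker/authkey is shared with the cluster nodes:

    pcs resource create remote-node1 ocf:pacemaker:remote server=remote1.example.com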

>> but it does not recover from a node disconnect-connect failure.
>> It regains quorum, but CIB does not recover to a synchronized state and
>> "cibadmin -Q" times out.
>>
>> Is there anything with corosync or pacemaker parameters I can do to make
>> it recover from such a situation
>> (everything works for smaller clusters).
>>
>> In my case it is OK for a node to disconnect (all the major resources
>> are shutdown)
>> and later reconnect the cluster (the running monitoring agent will
>> cleanup and restart major resources if needed),
>> so I do not have STONITH configured.
>>
>> Details:
>> OS: CentOS 6
>> Pacemaker: Pacemaker 1.1.9-1512.el6
> 
> Upgrade.

If you can upgrade to the latest CentOS 6.7, you can get a much newer
Pacemaker. But Pacemaker is probably not limiting your cluster nodes;
the newer version's main benefit would be Pacemaker Remote support. (Of
course there are plenty of bug fixes and new features as well.)

>> Corosync: Corosync Cluster Engine, version '2.3.2'
> 
> This is not supported on EL6 at all. Please stick with corosync 1.4 and
> use the cman pluging as the quorum provider.

CentOS is self-supported anyway, so if you're willing to handle your own
upgrades and such, nothing wrong with compiling. But corosync is up to
2.3.5 so you're already behind. :) I'd recommend compiling libqb 0.17.2
if you're compiling recent corosync and/or pacemaker.

Alternatively, CentOS 7 will have recent versions of everything.

>> Corosync configuration:
>> token: 1
>> #token_retransmits_before_loss_const: 10
>> consensus: 15000
>> join: 1000
>> send_join: 80
>> merge: 1000
>> downcheck: 2000
>> #rrp_problem_count_timeout: 5000
>> max_network_delay: 150 # for azure
>>
>>
>> Some logs:
>>
>> [...]
>> Nov 04 17:50:18 [7985] ip-10-142-181-98 stonith-ng:   notice:
>> cib_process_diff: Diff 1.9254.1 -> 1.9255.1 from local not
>> applied to 1.9275.1: current "epoch" is greater than required
>> Nov 04 17:50:18 [7985] ip-10-142-181-98 stonith-ng:   notice:
>> update_cib_cache_cb:  [cib_diff_notify] Patch aborted: Application
>> of an update diff failed (-1006)
>> Nov 04 17:50:18 [7985] ip-10-142-181-98 stonith-ng:   notice:
>> cib_process_diff: Diff 1.9255.1 -> 1.9256.1 from local not
>> applied to 1.9275.1: current "epoch" is greater than required
>> Nov 04 17:50:18 [7985] ip-10-142-181-98 stonith-ng:   notice:
>> update_cib_cache_cb:  [cib_diff_notify] Patch aborted: Application
>> of an update diff failed (-1006)
>> Nov 04 17:50:18 [7985] ip-10-142-181-98 stonith-ng:   notice:
>> cib_process_diff: Diff 1.9256.1 -> 1.9257.1 from local not
>> applied to 1.9275.1: current "epoch" is greater than required
>> Nov 04 17:50:18 [7985] ip-10-142-181-98 stonith-ng:   notice:
>> update_cib_cache_cb:  [cib_diff_notify] Patch aborted: Application
>> of an update diff failed (-1006)
>> Nov 04 17:50:18 [7985] ip-10-142-181-98 stonith-ng:   notice:
>> cib_process_diff: Diff 1.9257.1 -> 1.9258.1 from local not
>> applied to 1.9275.1: current "epoch" is greater than required
>> Nov 04 17:50:18 [7985] ip-10-142-181-98 stonith-ng:   notice:
>> update_cib_cache_cb:  [cib_diff_notify] Patch aborted: Application
>> of an update diff failed (-1006)
>> [...]
>>
>> [...]
>> Nov 04 17:43:24 [12176] ip-10-109-145-175crm_mon:error:
>> cib_native_perform_op_delegate: Couldn't perform cib_query
>> operation (timeout=120s): Operation already in progress (-114)
>> Nov 04 17:43:24 [12176] ip-10-109-145-175crm_mon:error:
>> get_cib_copy:   Couldnt retrieve the CIB
>> Nov 04 17:43:24 [12176] ip-10-109-145-175crm_mon:error:
>> cib_native_perform_op_delegate: Couldn't perform cib_query
>> 

Re: [ClusterLabs] Multiple OpenSIPS services on one cluster

2015-11-03 Thread Ken Gaillot
On 11/03/2015 05:38 AM, Nuno Pereira wrote:
>> -Original Message-
>> From: Ken Gaillot [mailto:kgail...@redhat.com]
>> Sent: Monday, 2 November 2015 19:53
>> To: users@clusterlabs.org
>> Subject: Re: [ClusterLabs] Multiple OpenSIPS services on one cluster
>>
>> On 11/02/2015 01:24 PM, Nuno Pereira wrote:
>>> Hi all.
>>>
>>>
>>>
>>> We have one cluster that has 9 nodes and 20 resources.
>>>
>>>
>>>
>>> Four of those hosts are PSIP-SRV01-active, PSIP-SRV01-passive,
>>> PSIP-SRV02-active and PSIP-SRV02-active.
>>>
>>> They should provide an lsb:opensips service, 2 by 2:
>>>
>>> . The SRV01-opensips and SRV01-IP resources should be active on
> one of
>>> PSIP-SRV01-active or PSIP-SRV01-passive;
>>>
>>> . The SRV02-opensips and SRV02-IP resources should be active on
> one of
>>> PSIP-SRV02-active or PSIP-SRV02-passive.
>>>
>>>
>>>
>>>
>>> Everything works fine, until the moment that one of those nodes is
>> rebooted.
>>> In the last case the problem occurred with a reboot of PSIP-SRV01-passive,
>>> that wasn't providing the service at that moment.
>>>
>>>
>>>
>>> To be noted that all opensips nodes had the opensips service to be started
> on
>>> boot by initd, which was removed in the meanwhile.
>>>
>>> The problem is that the service SRV01-opensips is detected to be started
> on
>>> both PSIP-SRV01-active and PSIP-SRV01-passive, and the SRV02-opensips is
>>> detected to be started on both PSIP-SRV01-active and PSIP-SRV02-active.
>>>
>>> After that and several operations done by the cluster, which include
> actions
>>> to stop both SRV01-opensips on both PSIP-SRV01-active and PSIP-SRV01-
>> passive,
>>> and to stop SRV02-opensips on PSIP-SRV01-active and PSIP-SRV02-active,
>> which
>>> fail on PSIP-SRV01-passive, the resource SRV01-opensips becomes
>> unmanaged.
>>>
>>>
>>>
>>> Any ideas on how to fix this?
>>>
>>> Nuno Pereira
>>>
>>> G9Telecom
>>
>> Your configuration looks appropriate, so it sounds like something is
>> still starting the opensips services outside cluster control. Pacemaker
>> recovers from multiple running instances by stopping them all, then
>> starting on the expected node.
> Yesterday I removed the pacemaker from starting on boot, and
> tested it: the problem persists.
> Also, I checked the logs and the opensips wasn't started on the
> PSIP-SRV01-passive machine, the one that was rebooted.
> Is it possible to change that behaviour, as it is undesirable for our
> environment?
> For example, only to stop it on one of the hosts.
> 
>> You can verify that Pacemaker did not start the extra instances by
>> looking for start messages in the logs (they will look like "Operation
>> SRV01-opensips_start_0" etc.).
> On the rebooted node I don't see 2 starts, but only 2 failed stops, the first
> failed for the service that wasn't supposed to run there, and a normal one for
> the service that was supposed to run there:
> 
> Nov 02 23:01:24 [1692] PSIP-SRV01-passive   crmd:error:
> process_lrm_event:  Operation SRV02-opensips_stop_0 (node=PSIP-
> SRV01-passive, call=52, status=4, cib-update=23, confirmed=true) Error
> Nov 02 23:01:24 [1692] PSIP-SRV01-passive   crmd:   notice:
> process_lrm_event:  Operation SRV01-opensips_stop_0: ok (node=PSIP-
> SRV01-passive, call=51, rc=0, cib-update=24, confirmed=true)
> 
> 
>> The other question is why did the stop command fail. The logs should
>> shed some light on that too; look for the equivalent "_stop_0" operation
>> and the messages around it. The resource agent might have reported an
>> error, or it might have timed out.
> I see this:
> 
> Nov 02 23:01:24 [1689] PSIP-SRV01-passive   lrmd:  warning:
> operation_finished: SRV02-opensips_stop_0:1983 - terminated with signal 15
> Nov 02 23:01:24 [1689] PSIP-BBT01-passive   lrmd: info: log_finished:
> finished - rsc: SRV02-opensips action:stop call_id:52 pid:1983 exit-code:1
> exec-time:79ms queue-time:0ms
> 
> As it can be seen above, the call_id for the failed stop is greater that the
> one with success, but ends before.
> Also, as both operations are stopping the exact same service, the last one
> fails. And on the case of the one that fails, it wasn't supposed to be stopped
> or started in that host, as was configured.

I think I see what's hap

Re: [ClusterLabs] Multiple OpenSIPS services on one cluster

2015-11-03 Thread Ken Gaillot
On 11/03/2015 01:40 PM, Nuno Pereira wrote:
>> -Original Message-
>> From: Ken Gaillot [mailto:kgail...@redhat.com]
>> Sent: Tuesday, 3 November 2015 18:02
>> To: Nuno Pereira; 'Cluster Labs - All topics related to open-source
> clustering
>> welcomed'
>> Subject: Re: [ClusterLabs] Multiple OpenSIPS services on one cluster
>>
>> On 11/03/2015 05:38 AM, Nuno Pereira wrote:
>>>> -Original Message-
>>>> From: Ken Gaillot [mailto:kgail...@redhat.com]
>>>> Sent: Monday, 2 November 2015 19:53
>>>> To: users@clusterlabs.org
>>>> Subject: Re: [ClusterLabs] Multiple OpenSIPS services on one cluster
>>>>
>>>> On 11/02/2015 01:24 PM, Nuno Pereira wrote:
>>>>> Hi all.
>>>>>
>>>>>
>>>>>
>>>>> We have one cluster that has 9 nodes and 20 resources.
>>>>>
>>>>>
>>>>>
>>>>> Four of those hosts are PSIP-SRV01-active, PSIP-SRV01-passive,
>>>>> PSIP-SRV02-active and PSIP-SRV02-active.
>>>>>
>>>>> They should provide an lsb:opensips service, 2 by 2:
>>>>>
>>>>> . The SRV01-opensips and SRV01-IP resources should be active on
>>> one of
>>>>> PSIP-SRV01-active or PSIP-SRV01-passive;
>>>>>
>>>>> . The SRV02-opensips and SRV02-IP resources should be active on
>>> one of
>>>>> PSIP-SRV02-active or PSIP-SRV02-passive.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Everything works fine, until the moment that one of those nodes is
>>>> rebooted.
>>>>> In the last case the problem occurred with a reboot of
> PSIP-SRV01-passive,
>>>>> that wasn't providing the service at that moment.
>>>>>
>>>>>
>>>>>
>>>>> To be noted that all opensips nodes had the opensips service to be
> started
>>> on
>>>>> boot by initd, which was removed in the meanwhile.
>>>>>
>>>>> The problem is that the service SRV01-opensips is detected to be started
>>> on
>>>>> both PSIP-SRV01-active and PSIP-SRV01-passive, and the SRV02-opensips
>> is
>>>>> detected to be started on both PSIP-SRV01-active and PSIP-SRV02-active.
>>>>>
>>>>> After that and several operations done by the cluster, which include
>>> actions
>>>>> to stop both SRV01-opensips on both PSIP-SRV01-active and PSIP-SRV01-
>>>> passive,
>>>>> and to stop SRV02-opensips on PSIP-SRV01-active and PSIP-SRV02-active,
>>>> which
>>>>> fail on PSIP-SRV01-passive, the resource SRV01-opensips becomes
>>>> unmanaged.
>>>>>
>>>>>
>>>>>
>>>>> Any ideas on how to fix this?
>>>>>
>>>>> Nuno Pereira
>>>>>
>>>>> G9Telecom
>>>>
>>>> Your configuration looks appropriate, so it sounds like something is
>>>> still starting the opensips services outside cluster control. Pacemaker
>>>> recovers from multiple running instances by stopping them all, then
>>>> starting on the expected node.
>>> Yesterday I removed the pacemaker from starting on boot, and
>>> tested it: the problem persists.
>>> Also, I checked the logs and the opensips wasn't started on the
>>> PSIP-SRV01-passive machine, the one that was rebooted.
>>> Is it possible to change that behaviour, as it is undesirable for our
>>> environment?
>>> For example, only to stop it on one of the hosts.
>>>
>>>> You can verify that Pacemaker did not start the extra instances by
>>>> looking for start messages in the logs (they will look like "Operation
>>>> SRV01-opensips_start_0" etc.).
>>> On the rebooted node I don't see 2 starts, but only 2 failed stops, the
> first
>>> failed for the service that wasn't supposed to run there, and a normal one
> for
>>> the service that was supposed to run there:
>>>
>>> Nov 02 23:01:24 [1692] PSIP-SRV01-passive   crmd:error:
>>> process_lrm_event:  Operation SRV02-opensips_stop_0 (node=PSIP-
>>> SRV01-passive, call=52, status=4, cib-update=23, confirmed=true) Error
>>> Nov 02 23:01:24 [1692] PSIP-SRV01-passive   crmd:   notice:
>>> process_lrm_event:  Operation SRV01-opensips_sto

Re: [ClusterLabs] Pacemaker build error

2015-11-04 Thread Ken Gaillot
On 11/03/2015 11:10 PM, Jim Van Oosten wrote:
> 
> 
> I am getting a compile error when building Pacemaker on Linux version
> 2.6.32-431.el6.x86_64.
> 
> The build commands:
> 
> git clone git://github.com/ClusterLabs/pacemaker.git
> cd pacemaker
> ./autogen.sh && ./configure --prefix=/usr --sysconfdir=/etc
> make
> make install
> 
> The compile error:
> 
> Making install in services
> gmake[2]: Entering directory
> `/tmp/software/HA_linux/pacemaker/lib/services'
>   CC   libcrmservice_la-services.lo
> services.c: In function 'resources_action_create':
> services.c:153: error: 'svc_action_private_t' has no member named 'pending'
> services.c: In function 'services_action_create_generic':
> services.c:340: error: 'svc_action_private_t' has no member named 'pending'
> gmake[2]: *** [libcrmservice_la-services.lo] Error 1
> gmake[2]: Leaving directory `/tmp/software/HA_linux/pacemaker/lib/services'
> gmake[1]: *** [install-recursive] Error 1
> gmake[1]: Leaving directory `/tmp/software/HA_linux/pacemaker/lib'
> make: *** [install-recursive] Error 1
> 
> 
> The pending field that services.c is attempting to set is conditioned on
> the SUPPORT_DBUS flag in services_private.h.
> 
> pacemaker/lib/services/services_private.h
> 
> 
>  #if SUPPORT_DBUS
>      DBusPendingCall* pending;
>      unsigned timerid;
>  #endif
> 
> Am I building Pacemaker incorrectly or should I open a defect for this
> problem?
> 
> Jim VanOosten
> jimvo at  us.ibm.com

This report is enough; I'll do up a quick fix today. Thanks for catching it.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker build error

2015-11-04 Thread Ken Gaillot
On 11/04/2015 09:31 AM, Ken Gaillot wrote:
> On 11/03/2015 11:10 PM, Jim Van Oosten wrote:
>>
>>
>> I am getting a compile error when building Pacemaker on Linux version
>> 2.6.32-431.el6.x86_64.
>>
>> The build commands:
>>
>> git clone git://github.com/ClusterLabs/pacemaker.git
>> cd pacemaker
>> ./autogen.sh && ./configure --prefix=/usr --sysconfdir=/etc
>> make
>> make install
>>
>> The compile error:
>>
>> Making install in services
>> gmake[2]: Entering directory
>> `/tmp/software/HA_linux/pacemaker/lib/services'
>>   CC   libcrmservice_la-services.lo
>> services.c: In function 'resources_action_create':
>> services.c:153: error: 'svc_action_private_t' has no member named 'pending'
>> services.c: In function 'services_action_create_generic':
>> services.c:340: error: 'svc_action_private_t' has no member named 'pending'
>> gmake[2]: *** [libcrmservice_la-services.lo] Error 1
>> gmake[2]: Leaving directory `/tmp/software/HA_linux/pacemaker/lib/services'
>> gmake[1]: *** [install-recursive] Error 1
>> gmake[1]: Leaving directory `/tmp/software/HA_linux/pacemaker/lib'
>> make: *** [install-recursive] Error 1
>>
>>
>> The pending field that services.c is attempting to set is conditioned on
>> the SUPPORT_DBUS flag in services_private.h.
>>
>> pacemaker/lib/services/services_private.h
>>
>>
>>  #if SUPPORT_DBUS
>>      DBusPendingCall* pending;
>>      unsigned timerid;
>>  #endif
>>
>> Am I building Pacemaker incorrectly or should I open a defect for this
>> problem?
>>
>> Jim VanOosten
>> jimvo at  us.ibm.com
> 
> This report is enough; I'll do up a quick fix today. Thanks for catching it.

The fix is upstream. Pull the latest commit and you should be good to go.
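
Roughly, mirroring your earlier build steps:

    cd pacemaker && git pull
    ./autogen.sh && ./configure --prefix=/usr --sysconfdir=/etc
    make && make install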


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] multiple action= lines sent to STDIN of fencing agents - why?

2015-10-15 Thread Ken Gaillot
On 10/15/2015 06:25 AM, Adam Spiers wrote:
> I inserted some debugging into fencing.py and found that stonithd
> sends stuff like this to STDIN of the fencing agents it forks:
> 
> action=list
> param1=value1
> param2=value2
> param3=value3
> action=list
> 
> where paramX and valueX come from the configuration of the primitive
> for the fencing agent.
> 
> As a corollary, if the primitive for the fencing agent has 'action'
> defined as one of its parameters, this means that there will be three
> 'action=' lines, and the middle one could have a different value to
> the two sandwiching it.
> 
> When I first saw this, I had an extended #wtf moment and thought it
> was a bug.  But on closer inspection, it seems very deliberate, e.g.
> 
>   
> https://github.com/ClusterLabs/pacemaker/commit/bfd620645f151b71fafafa279969e9d8bd0fd74f
> 
> The "regardless of what the admin configured" comment suggests to me
> that there is an underlying assumption that any fencing agent will
> ensure that if the same parameter is duplicated on STDIN, the final
> value will override any previous ones.  And indeed fencing.py ensures
> this, but presumably it is possible to write agents which don't use
> fencing.py.
> 
> Is my understanding correct?  If so:

Yes, good sleuthing.

> 1) Is the first 'action=' line intended in order to set some kind of
>default action, in the case that the admin didn't configure the
>primitive with an 'action=' parameter *and* _action wasn't one of
>list/status/monitor/metadata?  In what circumstances would this
>happen?

The first action line is usually the only one.

Ideally, admins don't configure "action" as a parameter of a fence
device. They either specify nothing (in which case the cluster does what
it thinks should be done -- reboot, off, etc.), or they specify
pcmk_*_action to override the cluster's choice. For example,
pcmk_reboot_action=off tells the cluster to actually send the fence
agent "action=off" when a reboot is desired. (Perhaps the admin prefers
that flaky nodes stay down until investigated, or the fence device
doesn't handle reboots well.)

So the first action line is the result of that. If the admin configured
a pcmk_*_action for the requested action, the agent will get that,
otherwise it gets the requested action.

Second, any parameters in the device configuration are copied to the
agent. So if the admin did specify "action" there, it will get copied
(as a second instance, possibly different from the first).

But that would override *all* requested actions, which is a bad idea. No
one wants a recurring monitor action to shoot a node! :) So that last
step is a failsafe: if the admin did supply an "action", the original
line is re-sent whenever the requested action is informational
(list/status/monitor/metadata) rather than a "real" fencing action (off/reboot).
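
For illustration (all device parameters are placeholders), overriding the
reboot action on a device looks like:

    pcs stonith create my-apc fence_apc_snmp \
        ipaddr=apc.example.com login=user passwd='secret' \
        pcmk_reboot_action=off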

> 2) Is this assumption of the agents always being order-sensitive
>(i.e. last value always wins) documented anywhere?  The best
>documentation on the API I could find was here:
> 
>   https://fedorahosted.org/cluster/wiki/FenceAgentAPI
> 
>but it doesn't mention this.

Good point. It would be a good idea to add that to the API since it's
established practice, but it probably would also be a good idea for
pacemaker to send only the final value of any parameter.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Corosync+Pacemaker error during failover

2015-10-08 Thread Ken Gaillot
On 10/08/2015 10:16 AM, priyanka wrote:
> Hi,
> 
> We are trying to build a HA setup for our servers using DRBD + Corosync
> + pacemaker stack.
> 
> Attached is the configuration file for corosync/pacemaker and drbd.

A few things I noticed:

* Don't set become-primary-on in the DRBD configuration in a Pacemaker
cluster; Pacemaker should handle all promotions to primary.

* I'm no NFS expert, but why is res_exportfs_root cloned? Can both
servers export it at the same time? I would expect it to be in the group
before res_exportfs_export1.

* Your constraints need some adjustment. Partly it depends on the answer
to the previous question, but currently res_fs (via the group) is
ordered after res_exportfs_root, and I don't see how that could work.
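
As a rough crm-style sketch of the usual pattern (the resource names are
guesses based on this thread, so adjust them to your configuration), the
group would normally be colocated with, and ordered after, the DRBD master:

    colocation col_rg_on_drbd_master inf: rg_export ms_drbd:Master
    order ord_drbd_promote_then_rg inf: ms_drbd:promote rg_export:start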

> We are getting errors while testing this setup.
> 1. When we stop corosync on Master machine say server1(lock), it is
> Stonith'ed. In this case slave-server2(sher) is promoted to master.
>But when server1(lock) reboots res_exportfs_export1 is started on
> both the servers and that resource goes into failed state followed by
> servers going into unclean state.
>Then server1(lock) reboots and server2(sher) is master but in unclean
> state. After server1(lock) comes up, server2(sher) is stonith'ed and
> server1(lock) is slave(the only online node).
>When server2(sher) comes up, both the servers are slaves and resource
> group(rg_export) is stopped. Then server2(sher) becomes Master and
> server1(lock) is slave and resource group is started.
>At this point configuration becomes stable.
> 
> 
> PFA logs(syslog) of server2(sher) after it is promoted to master till it
> is first rebooted when resource exportfs goes into failed state.
> 
> Please let us know if the configuration is appropriate. From the logs we
> could not figure out exact reason of resource failure.
> Your comment on this scenario will be very helpful.
> 
> Thanks,
> Priyanka
> 
> 
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Stopped node detection.

2015-10-16 Thread Ken Gaillot
On 10/15/2015 03:55 PM, Vallevand, Mark K wrote:
> Ubuntu 12.04 LTS
> pacemaker 1.1.10
> cman 3.1.7
> corosync 1.4.6
> 
> If my cluster has no resources, it seems like it takes 20s for a stopped node 
> to be detected.  Is the value really 20s and is it a parameter that can be 
> adjusted?

The corosync token timeout is the main factor, so check your corosync.conf.

Pacemaker will then try to fence the node (if it was stopped uncleanly),
so that will take some time depending on what fencing you're using.

Generally this takes much less than 20s, but maybe you have a longer
timeout configured, or fencing is not working, or something like that.
The logs should have some clues, post them if you can't find it.
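
For illustration only (the values are examples, not recommendations), the
relevant knob in corosync.conf looks like this; with a cman-based stack the
token is usually set via the totem element in cluster.conf instead:

    totem {
        version: 2
        token: 5000
        token_retransmits_before_loss_const: 10
    }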

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Coming in 1.1.14: remapping sequential reboots to all-off-then-all-on

2015-10-19 Thread Ken Gaillot
On 10/19/2015 11:42 AM, Digimer wrote:
> On 19/10/15 12:34 PM, Ken Gaillot wrote:
>> Pacemaker supports fencing "topologies", allowing multiple fencing
>> devices to be used (in conjunction or as fallbacks) when a node needs to
>> be fenced.
>>
>> However, there is a catch when using something like redundant power
>> supplies. If you put two power switches in the same topology level, and
>> Pacemaker needs to reboot the node, it will reboot the first power
>> switch and then the second -- which has no effect since the supplies are
>> redundant.
>>
>> Pacemaker's upstream master branch has new handling that will be part of
>> the eventual 1.1.14 release. In such a case, it will turn all the
>> devices off, then turn them all back on again.
> 
> How long will it leave stay in the 'off' state? Is it configurable? I
> ask because if it's too short, some PSUs may not actually lose power.
> One or two seconds should be way more than enough though.

It simply waits for the fence agent to return success from the "off"
command before proceeding. I wouldn't assume any particular time between
that and initiating "on", and there's no way to set a delay there --
it's up to the agent to not return success until the action is actually
complete.

The standard says that agents should actually confirm that the device is
in the desired state after sending a command, so hopefully this is
already baked in.

>> With previous versions, there was a complicated configuration workaround
>> involving creating separate devices for the off and on actions. With the
>> new version, it happens automatically, and no special configuration is
>> needed.
>>
>> Here's an example where node1 is the affected node, and apc1 and apc2
>> are the fence devices:
>>
>>pcs stonith level add 1 node1 apc1,apc2
> 
> Where would the outlet definition go? 'apc1:4,apc2:4'?

"apc1" here is name of a Pacemaker fence resource. Hostname, port, etc.
would be configured in the definition of the "apc1" resource (which I
omitted above to focus on the topology config).

>> Of course you can configure it using crm or XML as well.
>>
>> The fencing operation will be treated as successful as long as the "off"
>> commands succeed, because then it is safe for the cluster to recover any
>> resources that were on the node. Timeouts and errors in the "on" phase
>> will be logged but ignored.
>>
>> Any action-specific timeout for the remapped action will be used (for
>> example, pcmk_off_timeout will be used when executing the "off" command,
>> not pcmk_reboot_timeout).
> 
> I think this answers my question about how long it stays off for. What
> would be an example config to control the off time then?

This isn't a delay, but a timeout before declaring the action failed. If
an "off" command does not return in this amount of time, the command
(and the entire topology level) will be considered failed, and the next
level will be tried.

The timeouts are configured in the fence resource definition. So
combining the above questions, apc1 might be defined like this:

   pcs stonith create apc1 fence_apc_snmp \
  ipaddr=apc1.example.com \
  login=user passwd='supersecret' \
  pcmk_off_timeout=30s \
  pcmk_host_map="node1.example.com:1,node2.example.com:2"

>> The new code knows to skip the "on" step if the fence agent has
>> automatic unfencing (because it will happen when the node rejoins the
>> cluster). This allows fence_scsi to work with this feature.
> 
> http://i.imgur.com/i7BzivK.png

:-D


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Coming in 1.1.14: remapping sequential reboots to all-off-then-all-on

2015-10-19 Thread Ken Gaillot
Pacemaker supports fencing "topologies", allowing multiple fencing
devices to be used (in conjunction or as fallbacks) when a node needs to
be fenced.

However, there is a catch when using something like redundant power
supplies. If you put two power switches in the same topology level, and
Pacemaker needs to reboot the node, it will reboot the first power
switch and then the second -- which has no effect since the supplies are
redundant.

Pacemaker's upstream master branch has new handling that will be part of
the eventual 1.1.14 release. In such a case, it will turn all the
devices off, then turn them all back on again.

With previous versions, there was a complicated configuration workaround
involving creating separate devices for the off and on actions. With the
new version, it happens automatically, and no special configuration is
needed.

Here's an example where node1 is the affected node, and apc1 and apc2
are the fence devices:

   pcs stonith level add 1 node1 apc1,apc2

Of course you can configure it using crm or XML as well.

The fencing operation will be treated as successful as long as the "off"
commands succeed, because then it is safe for the cluster to recover any
resources that were on the node. Timeouts and errors in the "on" phase
will be logged but ignored.

Any action-specific timeout for the remapped action will be used (for
example, pcmk_off_timeout will be used when executing the "off" command,
not pcmk_reboot_timeout).

The new code knows to skip the "on" step if the fence agent has
automatic unfencing (because it will happen when the node rejoins the
cluster). This allows fence_scsi to work with this feature.
-- 
Ken Gaillot <kgail...@redhat.com>

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] group resources not grouped ?!?

2015-10-07 Thread Ken Gaillot
On 10/07/2015 09:12 AM, zulucloud wrote:
> Hi,
> i got a problem i don't understand, maybe someone can give me a hint.
> 
> My 2-node cluster (named ali and baba) is configured to run mysql, an IP
> for mysql and the filesystem resource (on drbd master) together as a
> GROUP. After doing some crash-tests i ended up having filesystem and
> mysql running happily on one host (ali), and the related IP on the other
> (baba)  although, the IP's not really up and running, crm_mon just
> SHOWS it as started there. In fact it's nowhere up, neither on ali nor
> on baba.
> 
> crm_mon shows that pacemaker tried to start it on baba, but gave up
> after fail-count=100.
> 
> Q1: why doesn't pacemaker put the IP on ali, where all the rest of it's
> group lives?
> Q2: why doesn't pacemaker try to start the IP on ali, after max
> failcount had been reached on baba?
> Q3: why is crm_mon showing the IP as "started", when it's down after
> 10 tries?
> 
> Thanks :)
> 
> 
> config (some parts removed):
> ---
> node ali
> node baba
> 
> primitive res_drbd ocf:linbit:drbd \
> params drbd_resource="r0" \
> op stop interval="0" timeout="100" \
> op start interval="0" timeout="240" \
> op promote interval="0" timeout="90" \
> op demote interval="0" timeout="90" \
> op notify interval="0" timeout="90" \
> op monitor interval="40" role="Slave" timeout="20" \
> op monitor interval="20" role="Master" timeout="20"
> primitive res_fs ocf:heartbeat:Filesystem \
> params device="/dev/drbd0" directory="/drbd_mnt" fstype="ext4" \
> op monitor interval="30s"
> primitive res_hamysql_ip ocf:heartbeat:IPaddr2 \
> params ip="XXX.XXX.XXX.224" nic="eth0" cidr_netmask="23" \
> op monitor interval="10s" timeout="20s" depth="0"
> primitive res_mysql lsb:mysql \
> op start interval="0" timeout="15" \
> op stop interval="0" timeout="15" \
> op monitor start-delay="30" interval="15" time-out="15"
> 
> group gr_mysqlgroup res_fs res_mysql res_hamysql_ip \
> meta target-role="Started"
> ms ms_drbd res_drbd \
> meta master-max="1" master-node-max="1" clone-max="2"
> clone-node-max="1" notify="true"
> 
> colocation col_fs_on_drbd_master inf: res_fs:Started ms_drbd:Master
> 
> order ord_drbd_master_then_fs inf: ms_drbd:promote res_fs:start
> 
> property $id="cib-bootstrap-options" \
> dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
> cluster-infrastructure="openais" \
> stonith-enabled="false" \

Not having stonith is part of the problem (see below).

Without stonith, if the two nodes go into split brain (both up but can't
communicate with each other), Pacemaker will try to promote DRBD to
master on both nodes, mount the filesystem on both nodes, and start
MySQL on both nodes.

> no-quorum-policy="ignore" \
> expected-quorum-votes="2" \
> last-lrm-refresh="1438857246"
> 
> 
> crm_mon -rnf (some parts removed):
> -
> Node ali: online
> res_fs  (ocf::heartbeat:Filesystem) Started
> res_mysql   (lsb:mysql) Started
> res_drbd:0  (ocf::linbit:drbd) Master
> Node baba: online
> res_hamysql_ip  (ocf::heartbeat:IPaddr2) Started
> res_drbd:1  (ocf::linbit:drbd) Slave
> 
> Inactive resources:
> 
> Migration summary:
> 
> * Node baba:
>res_hamysql_ip: migration-threshold=100 fail-count=100
> 
> Failed actions:
> res_hamysql_ip_stop_0 (node=a891vl107s, call=35, rc=1,
> status=complete): unknown error

The "_stop_" above means that a *stop* action on the IP failed.
Pacemaker tried to migrate the IP by first stopping it on baba, but it
couldn't. (Since the IP is the last member of the group, its failure
didn't prevent the other members from moving.)

Normally, when a stop fails, Pacemaker fences the node so it can safely
bring up the resource on the other node. But you disabled stonith, so it
got into this state.

So, to proceed:

1) Stonith would help :)

2) Figure out why it couldn't stop the IP. There might be a clue in the
logs on baba (though they are indeed hard to follow; search for
"res_hamysql_stop_0" around this time, and look around there). You could
also try adding and removing the IP manually, first with the usual OS
commands, and if that works, by calling the IP resource agent directly.
That often turns up the problem.
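
As a sketch, calling the agent by hand with the same parameters usually
exposes the error directly (the paths are the common defaults; adjust if
yours differ):

    export OCF_ROOT=/usr/lib/ocf
    export OCF_RESKEY_ip=XXX.XXX.XXX.224 OCF_RESKEY_nic=eth0 OCF_RESKEY_cidr_netmask=23
    /usr/lib/ocf/resource.d/heartbeat/IPaddr2 monitor; echo "monitor rc=$?"
    /usr/lib/ocf/resource.d/heartbeat/IPaddr2 stop; echo "stop rc=$?"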

> 
> corosync.log:
> --
> pengine: [1223]: WARN: should_dump_input: Ignoring requirement that
> res_hamysql_ip_stop_0 comeplete before gr_mysqlgroup_stopped_0:
> unmanaged failed resources cannot prevent shutdown
> 
> pengine: [1223]: WARN: should_dump_input: Ignoring requirement that
> res_hamysql_ip_stop_0 comeplete before gr_mysqlgroup_stopped_0:
> unmanaged failed resources cannot prevent shutdown
> 
> Software:
> --
> corosync 1.2.1-4
> pacemaker 1.0.9.1+hg15626-1
> drbd8-utils 2:8.3.7-2.1
> (for some reason it's not possible to update at this time)

It should be possible to get 

Re: [ClusterLabs] Antw: Monitoring Op for LVM - Excessive Logging

2015-10-09 Thread Ken Gaillot
On 10/09/2015 08:06 AM, Ulrich Windl wrote:
 Jorge Fábregas  wrote on 09.10.2015 at 14:20
> in
> message <5617b10f.1060...@gmail.com>:
>> Hi,
>>
>> Is there a way to stop the excessive logging produced by the LVM monitor
>> operation?  I got it set at the default (30 seconds) here on SLES 11
>> SP4.  However, everytime it runs the DC will write 174 lines on
>> /var/log/messages (all coming from LVM).   I'm referring to the LVM
>> primitive resource (the one that activates a VG).  I'm also using DLM/cLVM.
>>
>> I checked /etc/lvm/lvm.conf and the logging defaults are reasonable
>> (verbose value set at 0 which is the lowest).
> 
> Did you try daemon_options="-d0"? (in clvmd resource)

It's been a long while since I used clvmd, so they may have fixed this
since then, but there used to be a bug that clvmd would always start up
with debug logging, even if -d0 was set.

Luckily, dynamic disabling of debug mode worked and could be done
anytime after clvmd is started, using "clvmd -C -d0".

What I wound up doing was configuring swatch to monitor the logs, and
run that command if it saw debug messages!
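
Roughly like this (a reconstruction rather than the original config, and the
log pattern is an assumption about what clvmd's debug lines look like):

    watchfor /clvmd.*debug/
        exec "clvmd -C -d0"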


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] CRM managing ADSL connection; failure not handled

2015-08-27 Thread Ken Gaillot
On 08/27/2015 03:04 AM, Tom Yates wrote:
 On Mon, 24 Aug 2015, Andrei Borzenkov wrote:
 
 24.08.2015 13:32, Tom Yates wrote:
  if i understand you aright, my problem is that the stop script didn't
  return a 0 (OK) exit status, so CRM didn't know where to go.  is the
  exit status of the stop script how CRM determines the status of the
 stop
  operation?

 correct

  does CRM also use the output of /etc/init.d/script status to
 determine
  continuing successful operation?

 It definitely does not use *output* of script - only return code. If
 the question is whether it probes resource additionally to checking
 stop exit code - I do not think so (I know it does it in some cases
 for systemd resources).
 
 i just thought i'd come back and follow-up.  in testing this morning, i
 can confirm that the pppoe-stop command returns status 1 if pppd isn't
 running.  that makes a standard init.d script, which passes on the
 return code of the stop command, unhelpful to CRM.
 
 i changed the script so that on stop, having run pppoe-stop, it checks
 for the existence of a working ppp0 interface, and returns 0 IFO there
 is none.

Nice

 If resource was previously active and stop was attempted as cleanup
 after resource failure - yes, it should attempt to start it again.
 
 that is now what happens.  it seems to try three time to bring up pppd,
 then kicks the service over to the other node.
 
 in the case of extended outages (ie, the ISP goes away for more than
 about 10 minutes), where both nodes have time to fail, we end up back in
 the bad old state (service failed on both nodes):
 
 [root@positron ~]# crm status
 [...]
 Online: [ electron positron ]
 
  Resource Group: BothIPs
  InternalIP (ocf::heartbeat:IPaddr):Started electron
  ExternalIP (lsb:hb-adsl-helper):   Stopped
 
 Failed actions:
 ExternalIP_monitor_6 (node=positron, call=15, rc=7,
 status=complete): not running
 ExternalIP_start_0 (node=positron, call=17, rc=-2, status=Timed
 Out): unknown exec error
 ExternalIP_start_0 (node=electron, call=6, rc=-2, status=Timed Out):
 unknown exec error
 
 is there any way to configure CRM to keep kicking the service between
 the two nodes forever (ie, try three times on positron, kick service
 group to electron, try three times on electron, kick back to positron,
 lather rinse repeat...)?
 
 for a service like DSL, which can go away for extended periods through
 no local fault then suddenly and with no announcement come back, this
 would be most useful behaviour.

Yes, see migration-threshold and failure-timeout.

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-resource-options
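
A hedged crmsh-style sketch (the values are illustrative): letting failures
expire means the cluster keeps retrying the group indefinitely instead of
leaving it stopped once both nodes hit the threshold:

    crm configure rsc_defaults migration-threshold=3 failure-timeout=600s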

 thanks to all for help with this.  thanks also to those who have
 suggested i rewrite this as an OCF agent (especially to ken gaillot who
 was kind enough to point me to documentation); i will look at that if
 time permits.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] resource-stickiness

2015-08-27 Thread Ken Gaillot
On 08/27/2015 02:42 AM, Rakovec Jost wrote:
 Hi
 
 
 it doesn't work as I expected, I change name to:
 
 location loc-aapche-sles1 aapche role=Started 10: sles1
 
 
 but after I manual move resource via HAWK to other node it auto add this line:
 
 location cli-prefer-aapche aapche role=Started inf: sles1
 
 
 so now I have both lines:
 
 location cli-prefer-aapche aapche role=Started inf: sles1
 location loc-aapche-sles1 aapche role=Started 10: sles1

When you manually move a resource using a command-line tool, those tools
accomplish the moving by adding a constraint, like the one you see added
above.

Such tools generally provide another option to clear any constraints
they added, which you can manually run after you are satisfied with the
state of things. Until you do so, the added constraint will remain, and
will affect resource placement.

 
 and resource-stickiness doesn't work since after fence node1 the resource is 
 move back to node1 after node1 come back and this is what I don't like. I 
 know that I can remove line  that was added by cluster, but this is not the 
 proper solution. Please tell me what is wrong. Thanks.  My config: 

Resource placement depends on many factors. Scores affect the outcome;
stickiness has a score, and each constraint has a score, and the active
node with the highest score wins.

In your config, resource-stickiness has a score of 1000, but
cli-prefer-aapche has a score of inf (infinity), so sles1 wins when it
comes back online (infinity > 1000). By contrast, loc-aapche-sles1 has a
score of 10, so by itself, it would not cause the resource to move back
(10 < 1000).

To achieve what you want, clear the temporary constraint added by hawk,
before sles1 comes back.
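
With crm shell, for example, either of these should do it (just a
sketch; hawk likely offers an equivalent button):

   crm resource unmove aapche
   # or delete the constraint directly by its id:
   crm configure delete cli-prefer-aapche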

 node sles1
 node sles2
 primitive filesystem Filesystem \
 params fstype=ext3 directory=/srv/www/vhosts device=/dev/xvdd1 \
 op start interval=0 timeout=60 \
 op stop interval=0 timeout=60 \
 op monitor interval=20 timeout=40
 primitive myip IPaddr2 \
 params ip=10.9.131.86 \
 op start interval=0 timeout=20s \
 op stop interval=0 timeout=20s \
 op monitor interval=10s timeout=20s
 primitive stonith_sbd stonith:external/sbd \
 params pcmk_delay_max=30
 primitive web apache \
 params configfile=/etc/apache2/httpd.conf \
 op start interval=0 timeout=40s \
 op stop interval=0 timeout=60s \
 op monitor interval=10 timeout=20s
 group aapche filesystem myip web \
 meta target-role=Started is-managed=true resource-stickiness=1000
 location cli-prefer-aapche aapche role=Started inf: sles1
 location loc-aapche-sles1 aapche role=Started 10: sles1
 property cib-bootstrap-options: \
 stonith-enabled=true \
 no-quorum-policy=ignore \
 placement-strategy=balanced \
 expected-quorum-votes=2 \
 dc-version=1.1.12-f47ea56 \
 cluster-infrastructure=classic openais (with plugin) \
 last-lrm-refresh=1440502955 \
 stonith-timeout=40s
 rsc_defaults rsc-options: \
 resource-stickiness=1000 \
 migration-threshold=3
 op_defaults op-options: \
 timeout=600 \
 record-pending=true
 
 
 BR
 
 Jost
 
 
 
 
 From: Andrew Beekhof and...@beekhof.net
 Sent: Thursday, August 27, 2015 12:20 AM
 To: Cluster Labs - All topics related to open-source clustering welcomed
 Subject: Re: [ClusterLabs] resource-stickiness
 
 On 26 Aug 2015, at 10:09 pm, Rakovec Jost jost.rako...@snt.si wrote:

 Sorry  one typo: problem is the same


 location cli-prefer-aapche aapche role=Started 10: sles2
 
 Change the name of your constraint.
 The 'cli-prefer-' prefix is reserved for "temporary" constraints created by 
 the command line tools (which therefore feel entitled to delete them as 
 necessary).
 

 to:

 location cli-prefer-aapche aapche role=Started inf: sles2


 It keep change to infinity.



 my configuration is:

 node sles1
 node sles2
 primitive filesystem Filesystem \
params fstype=ext3 directory=/srv/www/vhosts device=/dev/xvdd1 \
op start interval=0 timeout=60 \
op stop interval=0 timeout=60 \
op monitor interval=20 timeout=40
 primitive myip IPaddr2 \
params ip=x.x.x.x \
op start interval=0 timeout=20s \
op stop interval=0 timeout=20s \
op monitor interval=10s timeout=20s
 primitive stonith_sbd stonith:external/sbd \
params pcmk_delay_max=30
 primitive web apache \
params configfile=/etc/apache2/httpd.conf \
op start interval=0 timeout=40s \
op stop interval=0 timeout=60s \
op monitor interval=10 timeout=20s
 group aapche filesystem myip web \
meta target-role=Started is-managed=true resource-stickiness=1000
 location cli-prefer-aapche aapche role=Started 10: sles2
 property cib-bootstrap-options: \
stonith-enabled=true \
no-quorum-policy=ignore \
placement-strategy=balanced 

Re: [ClusterLabs] HA Cluster and Fencing

2015-09-03 Thread Ken Gaillot
On 09/03/2015 11:44 AM, Streeter, Michelle N wrote:
> I was trying to get a HA Cluster working but it was not failing over.   In 
> past posts, someone kept asking me to get the fencing working and make it a 
> priority.  So I finally got the fencing to work with VBox.  And the fail over 
> finally started working for my HA cluster.   When I tried to explain this to 
> my lead, he didn't believe me that the fencing was the issue with the fail 
> over.   So, would someone help me understand why this happened so I can 
> explain it to my lead.   Also, when I was trying to get Pacemaker 1.1.11 
> working, it was failing over fine without the fencing but when I added more 
> than one drive to be serviced by the cluster via NFS.   The drives were being 
> serviced by  both nodes almost as if it was load balancing.  It was suggested 
> back then to get the fencing working.   So I take it if I went back to that 
> version, this would have fixed the issue.  Would you also help me explain why 
> this is true?
> 
> Michelle Streeter
> ASC2 MCS - SDE/ACL/SDL/EDL OKC Software Engineer
> The Boeing Company

Hi Michelle,

Congratulations on getting fencing working.

There's not enough information about your configuration to answer your
questions, but fencing is more a requirement for general cluster
stability rather than a solution to the specific problems you were facing.

Regarding load-balancing, I'm not sure whether you mean that a single
resource was started on multiple nodes, or different resources were
spread out on multiple nodes.

If one resource is active on multiple nodes, that means it was defined
as a clone or master-slave resource in your configuration. Clones are
used for active-active HA. If you want active-passive, where the
resource is only active on one node, don't clone it.

If instead you mean that multiple resources were spread out among nodes,
that's Pacemaker's default behavior. If you want two resources to always
be started together on the same node, you need to define a colocation
constraint for them (as well as an ordering constraint if one has to
start before the other), or put them in a resource group.
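
For example, with pcs (a sketch only -- "nfs_ip" and "nfs_server" are
placeholder names for your actual resources):

   pcs constraint colocation add nfs_server with nfs_ip INFINITY
   pcs constraint order nfs_ip then nfs_server

or, equivalently, put both into a group:

   pcs resource group add nfs_group nfs_ip nfs_server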

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] resource-stickiness

2015-09-02 Thread Ken Gaillot
On 09/02/2015 08:11 AM, Rakovec Jost wrote:
> Hi
> 
> Can I ask something else in this thread or should I open a new one?

Either is fine but a new one is probably more helpful to people
searching online later :)

> questions:
> 
> 1. whta is the purpos of "meta target-role=Started"  in
> 
> primitive apache apache \
> params configfile="/etc/apache2/httpd.conf" \
> op monitor timeout=20s interval=10 \
> op stop timeout=60s interval=0 \
> op start timeout=40s interval=0 \
> meta target-role=Started
> 
> I just find that if I try to "start Parent:" it doesn't start any resource 
> from the group. But if I remove "meta target-role=Started" then it starts all 
> resources.

target-role=started is the default, so I'm not sure why you're seeing
that behavior.

It just means that the cluster should try to keep the service running.
If you set it to stopped, the cluster will try to keep it stopped. (For
master/slave resources, there's also master, for running in the master
state.)

I'm also not sure what "start Parent:" means. I haven't used crm in a
while, so maybe it's crm-specific? In general, the cluster manages
starting and stopping of services automatically, and you can use
target-role to tell it what you want it to do.

> 2. How can I just change something by CLI crm for example:
> 
> I have this in my configuration:
> 
> primitive stonith_sbd stonith:external/sbd
> 
> but I would like to add this:
> 
> crm(live)configure# stonith_sbd stonith:external/sbd \
>> params pcmk_delay_max="30"
> ERROR: configure.stonith_sbd: No such command
> 
> I know that I can delete and then add new, but I don't like this solution.
> 
> 3. Do I need to add colocation and order:
> 
> colocation apache-with-fs-ip inf: fs myip apache
> 
> and 
> 
> order apache-after-fs-ip Mandatory: fs myip apache
> 
> 
> if I'm using group like this:
> 
> group web fs myip apache \
> meta target-role=Started is-managed=true resource-stickiness=1000

You don't need them. A group is essentially a shorthand for colocation
and order constraints for all its members in the order they're listed.
There are minor differences between the two approaches, but the effect
is the same.

In fact, when you're using groups, it's recommended not to use the
individual members in any constraints. You can use the group itself in a
constraint though, to order/colocate the entire group with some other
resource.
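
For example (a sketch; "backup_ip" is a made-up resource), constrain
against the group "web" rather than against fs/myip/apache individually:

   colocation backup_ip_with_web inf: backup_ip web
   order backup_ip_after_web Mandatory: web backup_ip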

> On 08/28/2015 03:39 AM, Rakovec Jost wrote:
>> Hi
>>
>> Ok thanks. I find this on your howto
>>
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/ch06s08.html
>>
>> so basically I just remove temporary constraint by using
>>
>> crm resource unmove aapche
>>
>> and cluster work as I want.
>>
>> 1.Can you please explain me why is this temporary constraint necessary since 
>> I don't see any benefit, just more work for sysadmin?
> 
> It is created when you do "crm resource move".
> 
> The cluster itself has no concept of "moving" resources; it figures out
> the best place to put each resource, adjusting continuously for
> configuration changes, failures, etc.
> 
> So how tools like crm implement "move" is to change the configuration,
> by adding the temporary constraint. That tells the cluster "this
> resource should be on that node". The cluster adjusts its idea of "best"
> and moves the resource to match it.
> 
>> 2.Is this possible to disable some how?
> 
> Sure, "crm resource unmove" :)
> 
> The constraint can't be removed automatically because neither the
> cluster nor the tool knows when you no longer prefer the resource to be
> at the new location. You have to tell it.
> 
> If you have resource-stickiness, you can "unmove" as soon as the move is
> done, and the resource will stay where it is (unless some other
> configuration is stronger than the stickiness). If you don't have
> resource-stickiness, then once you "unmove", the resource may move to
> some other node, as the cluster adjusts its idea of "best".
> 
>> Thanks
>>
>> Jost
>>
>>
>>
>>
>> 
>> From: Ken Gaillot <kgail...@redhat.com>
>> Sent: Thursday, August 27, 2015 4:00 PM
>> To: users@clusterlabs.org
>> Subject: Re: [ClusterLabs] resource-stickiness
>>
>> On 08/27/2015 02:42 AM, Rakovec Jost wrote:
>>> Hi
>>>
>>>
>>> it doesn't work as I expected, I change name to:
>>>
>>> location loc

Re: [ClusterLabs] Adding and removing a node dyamically

2015-10-02 Thread Ken Gaillot
On 10/02/2015 05:36 AM, Vijay Partha wrote:
> could someone help me out with this please? i am making use of cman and
> pacemaker. pcs cluster node add/remove  is not working as it throws
> pcsd service is not running on .

pcs relies on pcsd running on all nodes.

Make sure pcs is installed on all nodes, and pcsd is enabled to start at
boot (via service or systemctl depending on which you are using). Then
set a password (same on all nodes) for the hacluster user. Finally, run
"pcs cluster auth " on the machine you want to run pcs from,
and give it the hacluster user/pass.
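
Roughly, that is something like this (a sketch for CentOS 6; "node1" and
"node2" stand in for your real node names), on every node:

   yum install -y pcs
   service pcsd start
   chkconfig pcsd on
   passwd hacluster

and then once, from the node you will run pcs on:

   pcs cluster auth node1 node2 -u hacluster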

You may want to review the Clusters From Scratch documentation to see
other common configuration that needs to be done (firewall, SELinux,
hostnames, etc.):

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/index.html


> On Fri, Oct 2, 2015 at 1:17 PM, Vijay Partha 
> wrote:
> 
>> Hi,
>>
>> I would like to add and remove a node dynamically in pacemaker. What
>> commands are to be given for this to be done.
>>
>> Thanking you
>>
>> --
>> With Regards
>> P.Vijay
>>
> 
> 
> 
> 
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Antw: Need bash instead of /bin/sh

2015-09-23 Thread Ken Gaillot
On 09/23/2015 08:38 AM, Ulrich Windl wrote:
 Vladislav Bogdanov  wrote on 23.09.2015 at 15:24 in
> message <5602a808.1090...@hoster-ok.com>:
>> 23.09.2015 15:42, dan wrote:
>>> On Wed 2015-09-23 14:08 +0200, Ulrich Windl wrote:
>>> dan  wrote on 23.09.2015 at 13:39 in 
>>> message
 <1443008370.2386.8.ca...@intraphone.com>:
> Hi
>
> As I had problem with corosync 2.3.3 and pacemaker 1.1.10 which was
> default in my version of ubuntu, I have now compiled and installed
> corosync 2.3.4 and pacemaker 1.1.12.
>
> And now it works.
>
> Though the file /usr/lib/ocf/resource.d/pacemaker/controld
> does not work as /bin/sh is linked to dash on ubuntu (and I think
> several other Linux variants).
>
> It is line 182:
>  local addr_list=$(cat
> /sys/kernel/config/dlm/cluster/comms/*/addr_list 2>/dev/null)

 That looks like plain POSIX shell to me. What part is causing the problem?
>>>
>>> Did a small test:
>>> ---test.sh
>>> controld_start() {
>>>  local addr_list=$(echo AF_INET 10.1.1.1 AF_INET 10.1.1.2)
>> yep, that is a bashism.
>>
>> posix shell denies assignment of local variables in the declaration.
> 
> In times of BASH it's hard to get POSIX shell documentation. The last we had 
> was from HP-UX. But the problem seems to be more $() than assignment it seems.

Good catch, thanks. I'll submit a patch upstream.

>>
>> local addr_list; addr_list=$(echo AF_INET 10.1.1.1 AF_INET 10.1.1.2)
>>
>> should work
>>
>>>  echo $addr_list
>>> }
>>>
>>> controld_start
>>> --
>>>
>>> dash test.sh
>>> test.sh: 2: local: 10.1.1.1: bad variable name
>>>
>>> bash test.sh
>>> AF_INET 10.1.1.1 AF_INET 10.1.1.2
>>>
>>>
>>>  Dan
=

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Coming in 1.1.14: Fencing topology based on node attribute

2015-09-24 Thread Ken Gaillot
An update: as of upstream commit 8940fca, the syntax has been tweaked as
Beekhof mentioned. To create a fencing topology on a node attribute, you
would use the following for the same example:

   <fencing-topology>
     <fencing-level index="1" devices="apc01,apc02"
       target-attribute="rack" target-value="1"/>
   </fencing-topology>

This avoids any additional restrictions or difficulties related to what
characters can be where.

Combined with the existing methods, this means a topology can be
targeted in one of these ways:

  By node name: target="node1"
  By regular expression matching node names: target-pattern="pcmk.*"
  By node attribute: target-attribute="rack" target-value="1"

On 09/09/2015 07:20 AM, Andrew Beekhof wrote:
> 
>> On 9 Sep 2015, at 7:45 pm, Kristoffer Grönlund <kgronl...@suse.com> wrote:
>>
>> Hi,
>>
>> Ken Gaillot <kgail...@redhat.com> writes:
>>
>>> Pacemaker's upstream master branch has a new feature that will be part
>>> of the eventual 1.1.14 release.
>>>
>>> Fencing topology is used when a node requires multiple fencing devices
>>> (in combination or as fallbacks). Currently, topologies must be
>>> specified by node name (or a regular expression matching node names).
>>>
>>> The new feature allows topologies to be specified by node attribute.
>>
>> Sounds like a really useful feature. :) I have implemented initial
>> support for this syntax in crmsh,
> 
> word of warning, i’m in the process of changing it to avoid overloading the 
> ‘target’ attribute and exposing quoting issues stemming from people’s use of 
> ‘='
> 
>https://github.com/beekhof/pacemaker/commit/ea4fc1c
> 
> 
> 
>> so this will work fine in the next
>> version of crmsh.
>>
>> Examples of crmsh syntax below:
>>
>>> Previously, if node1 was in rack #1, you'd have to register a fencing
>>> topology by its name, which at the XML level would look like:
>>>
>>>   <fencing-topology>
>>>     <fencing-level index="1" target="node1"
>>>       devices="apc01,apc02"/>
>>>   </fencing-topology>
>>>
>>
>> crm cfg fencing-topology node1: apc01,apc02
>>
>>>
>>> With the new feature, you could instead register a topology for all
>>> hosts that have a node attribute "rack" whose value is "1":
>>>
>>>   <fencing-topology>
>>>     <fencing-level index="1" target="rack=1"
>>>       devices="apc01,apc02"/>
>>>   </fencing-topology>
>>>
>>
>> crm cfg fencing-topology rack=1: apc01,apc02
>>
>>>
>>> You would assign that attribute to all nodes in that rack, e.g.:
>>>
>>>   crm_attribute --type nodes --node node1 --name rack --update 1
>>>
>>
>> crm node attr node1 set rack 1
>>
>>>
>>> The syntax accepts either '=' or ':' as the separator for the name/value
>>> pair, so target="rack:1" would work in the XML as well.
>>
>> crm cfg fencing-topology rack:1: apc01,apc02
>>
>> (admittedly perhaps not as clean as using '=', but it works)
>>
>> Cheers,
>> Kristoffer


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Virtual Machines with USB Dongle

2015-09-25 Thread Ken Gaillot
On 09/25/2015 01:40 PM, J. Echter wrote:
> Hi,
> 
> what would you do if you have to run a machine which needs a usb dongle
> / usb gsm modem to operate properly.
> 
> If this machine switches to another node, the usb thing doesn't move around.
> 
> Any hint on such a case?
> 
> Thanks
> 
> Juergen

The approaches I'm aware of:

* Buy one for every node. Typically the most expensive approach, but
simplest and most reliable.

* Buy a USB switch with auto-switch capability. These let you connect a
USB device (or devices) to multiple computers. You'd have to test
whether the auto-switch works in your setup (they're typically marketed
for printers). The switch becomes a single point of failure.

* Use a USB-over-Ethernet device. These are generally proprietary and
become a single point of failure. You'd have to test whether the latency
is good enough for your purpose. (In this case, you could instead get a
GSM modem with built-in TCP/IP connectivity, but it would still be a
single point of failure.)

* If the device is for non-essential capabilities, manually move it
after a failover.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Help needed getting DRBD cluster working

2015-10-05 Thread Ken Gaillot
On 10/05/2015 08:09 AM, Gordon Ross wrote:
> I’m trying to setup a simple DRBD cluster using Ubuntu 14.04 LTS using 
> Pacemaker & Corosync. My problem is getting the resource to startup.
> 
> I’ve setup the DRBD aspect fine. Checking /proc/drbd I can see that my test 
> DRBD device is all synced and OK.
> 
> Following the examples from the “Clusters From Scratch” document, I built the 
> following cluster configuration:
> 
> property \
>   stonith-enabled="false" \
>   no-quorum-policy="stop" \
>   symmetric-cluster="false"
> node ct1
> node ct2
> node ct3 attributes standby="on"
> primitive drbd_disc0 ocf:linbit:drbd \
>   params drbd_resource="disc0"
> primitive drbd_disc0_fs ocf:heartbeat:Filesystem \
>   params fstype="ext4" device="/dev/drbd0" directory="/replicated/disc0"
> ms ms_drbd0 drbd_disc0 \
>   meta master-max="1" master-node-max="1" clone-max="2" 
> clone-node-max="1" \
> notify="true" target-role="Master"
> colocation filesystem_with_disc inf: drbd_disc0_fs ms_drbd0:Master
> 
> ct1 & ct2 are the main DRBD servers, with ct3 being a witness server to avoid 
> split-brain problems.
> 
> When I look at the cluster status, I get:
> 
> crm(live)# status
> Last updated: Mon Oct  5 14:04:12 2015
> Last change: Thu Oct  1 17:31:35 2015 via cibadmin on ct2
> Current DC: ct2 (739377523) - partition with quorum
> 3 Nodes configured
> 3 Resources configured
> 
> 
> Node ct3 (739377524): standby
> Online: [ ct1 ct2 ]
> 
> 
> Failed actions:
> drbd_disc0_monitor_0 (node=ct1, call=5, rc=6, status=complete, 
> last-rc-change=Thu Oct  1 16:42:11 2015
> , queued=60ms, exec=0ms
> ): not configured
> drbd_disc0_monitor_0 (node=ct2, call=5, rc=6, status=complete, 
> last-rc-change=Thu Oct  1 16:17:17 2015
> , queued=67ms, exec=0ms
> ): not configured
> drbd_disc0_monitor_0 (node=ct3, call=5, rc=6, status=complete, 
> last-rc-change=Thu Oct  1 16:42:10 2015
> , queued=54ms, exec=0ms
> ): not configured
> 
> What have I done wrong?

The "rc=6" in the failed actions means the resource's Pacemaker
configuration is invalid. (For OCF return codes, see
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-ocf-return-codes
)

The "_monitor_0" means that this was the initial probe that Pacemaker
does before trying to start the resource, to make sure it's not already
running. As an aside, you probably want to add recurring monitors as
well, otherwise Pacemaker won't notice if the resource fails. For
example: op monitor interval="29s" role="Master" op monitor
interval="31s" role="Slave"

As to why the probe is failing, it's hard to tell. Double-check your
configuration to make sure disc0 is the exact DRBD name, Pacemaker can
read the DRBD configuration file, etc. You can also try running the DRBD
resource agent's "status" command manually to see if it prints a more
detailed error message.
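
For example, something like this runs the agent's monitor action by hand
(a rough sketch; paths may differ on Ubuntu -- rc 0 means running as
slave, 8 running as master, 7 not running):

   export OCF_ROOT=/usr/lib/ocf
   export OCF_RESKEY_drbd_resource=disc0
   /usr/lib/ocf/resource.d/linbit/drbd monitor; echo $?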

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Help needed getting DRBD cluster working

2015-10-06 Thread Ken Gaillot
On 10/06/2015 09:38 AM, Gordon Ross wrote:
> On 5 Oct 2015, at 15:05, Ken Gaillot <kgail...@redhat.com> wrote:
>>
>> The "rc=6" in the failed actions means the resource's Pacemaker
>> configuration is invalid. (For OCF return codes, see
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-ocf-return-codes
>> )
>>
>> The "_monitor_0" means that this was the initial probe that Pacemaker
>> does before trying to start the resource, to make sure it's not already
>> running. As an aside, you probably want to add recurring monitors as
>> well, otherwise Pacemaker won't notice if the resource fails. For
>> example: op monitor interval="29s" role="Master" op monitor
>> interval="31s" role="Slave"
>>
>> As to why the probe is failing, it's hard to tell. Double-check your
>> configuration to make sure disc0 is the exact DRBD name, Pacemaker can
>> read the DRBD configuration file, etc. You can also try running the DRBD
>> resource agent's "status" command manually to see if it prints a more
>> detailed error message.
> 
> I cleated the CIB and re-created most of it with your suggested parameters. 
> It now looks like:
> 
> node $id="739377522" ct1
> node $id="739377523" ct2
> node $id="739377524" ct3 \
>   attributes standby="on"
> primitive drbd_disc0 ocf:linbit:drbd \
>   params drbd_resource="disc0" \
>   meta target-role="Started" \
>   op monitor interval="19s" on-fail="restart" role="Master" 
> start-delay="10s" timeout="20s" \
>   op monitor interval="20s" on-fail="restart" role="Slave" 
> start-delay="10s" timeout="20s"
> ms ms_drbd0 drbd_disc0 \
>   meta master-max="1" master-node-max="1" clone-max="2" 
> clone-node-max="1" notify="true" target-role="Started"

You want to omit target-role, or set it to "Master". Otherwise both
nodes will start as slaves.

> location cli-prefer-drbd_disc0 ms_drbd0 inf: ct2
> location cli-prefer-ms_drbd0 ms_drbd0 inf: ct2

You've given the above constraints different names, but they are
identical: they both say ms_drbd0 can run on ct2 only.

When you're using clone/ms resources, you generally only ever need to
refer to the clone's name, not the resource being cloned. So you don't
need any constraints for drbd_disc0.

You've set symmetric-cluster=false in the cluster options, which means
that Pacemaker will not start resources on any node unless a location
constaint enables it. Here, you're only enabling ct2. Duplicate the
constraint for ct1 (or set symmetric-cluster=true and use a -INF
location constraint for the third node instead).
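
In crm syntax, a sketch of either approach:

   location ms_drbd0-on-ct1 ms_drbd0 inf: ct1
   location ms_drbd0-on-ct2 ms_drbd0 inf: ct2

or:

   property symmetric-cluster=true
   location ms_drbd0-not-ct3 ms_drbd0 -inf: ct3

The same consideration applies to drbd_disc0_fs while the cluster is
asymmetric: it too needs a location constraint allowing it somewhere.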

> property $id="cib-bootstrap-options" \
>   dc-version="1.1.10-42f2063" \
>   cluster-infrastructure="corosync" \
>   stonith-enabled="false" \

I'm sure you've heard this before, but stonith is the only way to avoid
data corruption in a split-brain situation. It's usually best to make
fencing the first priority rather than save it for last, because some
problems can become more difficult to troubleshoot without fencing. DRBD
in particular needs special configuration to coordinate fencing with
Pacemaker: https://drbd.linbit.com/users-guide/s-pacemaker-fencing.html
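
The gist of what that guide describes (only a sketch here -- double-check
it against your DRBD version) is adding resource-level fencing and the
crm-fence-peer handlers to the DRBD resource definition:

   disk {
       fencing resource-only;   # resource-and-stonith once stonith is enabled
   }
   handlers {
       fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
       after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
   }

so DRBD adds and removes a constraint around replication outages instead
of risking a promotion on stale data.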

>   no-quorum-policy="stop" \
>   symmetric-cluster="false"
> 
> 
> I think I’m missing something basic between the DRBD/Pacemaker hook-up.
> 
> As soon as Pacemaker/Corosync start, DRBD on both nodes stop. a “cat 
> /proc/drbd” then just returns:
> 
> version: 8.4.3 (api:1/proto:86-101)
> srcversion: 6551AD2C98F533733BE558C 
> 
> and no details on the replicated disc and the drbd block device disappears.
> 
> GTG
> 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] How I can contribute the code and TR fix for default resource agent?

2015-12-08 Thread Ken Gaillot
On 12/07/2015 01:13 PM, Xiaohua Wang wrote:
> Hi Friends,
> Since our product is using the Pacemaker and related Resource Agent based on 
> RHEL 6.5.
> We found some bugs and already fixed them. So we want to contribute the code 
> fixing?
> How can we do it ?
> 
> Best Regards
> Xiaohua Wang

Hi,

Thank you for offering to contribute back! It is very much appreciated.

The code repositories are on github under Cluster Labs:

   https://github.com/ClusterLabs

If you have a github account, you can use the above link to select the
repository you want (for example, pacemaker or resource-agents), then
click the "Fork" button at the top right. That will create your own copy
of the repository on github.

You can then clone your new fork to your development machine, make your
changes, and push them back to your fork on github. The github page for
your fork will then show a button to submit a pull request.
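
In practice that looks something like this ("youruser" and the branch
name are placeholders):

   git clone git@github.com:youruser/resource-agents.git
   cd resource-agents
   git checkout -b my-fix
   # ...apply your changes, then:
   git commit -a -m "short description of the fix"
   git push origin my-fix

after which github will offer to open a pull request from the my-fix
branch of your fork.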

If you do not want to use github, you can also email your changes in
standard patch format to the develop...@clusterlabs.org mailing list,
which you can subscribe to here: http://clusterlabs.org/mailman/listinfo/

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Running 'pcs status' cmd on remote node

2015-12-02 Thread Ken Gaillot
On 12/02/2015 06:21 AM, Simon Lawrence wrote:
> 
> In my 2 node test cluster, one node is a physical server (running
> Pacemaker 1.1.13), the other is a VM on that server, configured as a
> Pacemaker remote node (v1.1.13).
> 
> I get the correct output if I run crm_mon & pcs config on the remote
> node, but if I run 'pcs status' I get
> 
> # pcs status
> Cluster name: test
> Error: unable to get list of pacemaker nodes
> 
> 
> Is this normal or should the command work on a remote node?

That's expected. Not all command-line tools are supported when run on
Pacemaker Remote nodes. In this case, "pcs status" is doing "crm_node
-l" which is not yet supported.

The primary design goal was to enable commands known to be used by
resource agents. Enabling all commands is a goal for future versions.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker crash and fencing failure

2015-11-30 Thread Ken Gaillot
On 11/20/2015 06:38 PM, Brian Campbell wrote:
> I've been trying to debug and do a root cause analysis for a cascading
> series of failures that a customer hit a couple of days ago, that
> caused their filesystem to be unavailable for a couple of hours.
> 
> The original failure was in our own distributed filesystem backend, a
> fork of LizardFS, which is in turn a fork of MooseFS. This history is
> mostly only important in reading the logs, where "efs", "lizardfs",
> and "mfs" all generally refer to the same services, just different
> generations of naming them as not all daemons, scripts, and packages
> have been renamed.
> 
> There are two master servers that handle metadata operations, running
> Pacemaker to elect which one is the current primary and which one is a
> replica, and a number of chunkservers that store file chunks and
> simply connect to the current running master via a virtual IP. A bug
> in doing a checksum scan on the chunkservers caused them to leak file
> descriptors and become unresponsive, so while the master server was up
> and healthy, no actual filesystem operations could occur. (This bug is
> now fixed by the way, and the fix deployed to the customer, but we
> want to debug why the later failures occurred that caused them to
> continue to have downtime).
> 
> The customer saw that things were unresponsive, and tried doing the
> simplest thing they could to try to resolve it, migrate the services
> to the other master. This succeeded, as the checksum scan had been
> initiated by the first master and so switching over to the replica
> caused all of the extra file descriptors to be closed and the
> chunkservers to become responsive again.
> 
> However, due to one backup service that is not yet managed via
> Pacemaker and thus is only running on the first master, they decided
> to migrate back to the first master. This was when they ran into a
> Pacemaker problem.
> 
> At the time of the problem, es-efs-master1 is the server that was
> originally the master when the first problem happened, and which they
> are trying to migrate the services back to. es-efs-master2 is the one
> actively running the services, and also happens to be the DC at the
> time to that's where to look for pengine messages.
> 
> On master2, you can see the point when the user tried to migrate back
> to master1 based on the pengine decisions:
> 
> (by the way, apologies for the long message with large log excerpts; I
> was trying to balance enough detail with not overwhelming, it can be
> hard to keep it short when explaining these kinds of complicated
> failures across a number of machines)

Indeed. In a case as complex as this, more is better -- a copy of the
configuration and the output of crm_report would be helpful (with any
sensitive info removed).

> Nov 18 08:28:28 es-efs-master2 pengine[1923]:  warning: unpack_rsc_op:
> Forcing editshare.stack.7c645b0e-
> 46bb-407e-b48a-92ec3121f2d7.lizardfs-master.primitive:1 to stop after
> a failed demote action

Any idea why the demote failed?

> Nov 18 08:28:28 es-efs-master2 pengine[1923]:   notice: LogActions:
> Moveeditshare.stack.7c645b0e-46bb-407e-b48a-92ec3121f2d7.ip#011(Started
> es-efs-master2 -> es-efs-master1)
> Nov 18 08:28:28 es-efs-master2 pengine[1923]:   notice: LogActions:
> Promote 
> editshare.stack.7c645b0e-46bb-407e-b48a-92ec3121f2d7.lizardfs-master.primitive:0#011(Slave
> -> Master es-efs-master1)
> Nov 18 08:28:28 es-efs-master2 pengine[1923]:   notice: LogActions:
> Demote  
> editshare.stack.7c645b0e-46bb-407e-b48a-92ec3121f2d7.lizardfs-master.primitive:1#011(Master
> -> Slave es-efs-master2)
> Nov 18 08:28:28 es-efs-master2 pengine[1923]:   notice:
> process_pe_message: Calculated Transition 1481601:
> /var/lib/pacemaker/pengine/pe-input-1355.bz2
> Nov 18 08:28:28 es-efs-master2 stonith-ng[1920]:  warning:
> cib_process_diff: Diff 0.2754083.1 -> 0.2754083.2 from local not
> applied to 0.2754083.1: Failed application of an update diff

Normally failed diffs shouldn't cause any problems, other than extra
network/disk traffic for a full CIB sync.

There have been many bugs fixed since 1.1.10, and I think some of them
relate to cib diffs. Since you're doing everything custom, you might as
well use latest upstream libqb/corosync/pacemaker releases (btw current
pacemaker master branch has been stable and will soon become the basis
of 1.1.14rc1).

Have you been able to reproduce the issue on a test cluster? That would
be important to investigating further. You could set PCMK_debug=yes (I'm
guessing that would be in /etc/default/pacemaker on ubuntu) to get more
logs.

> Nov 18 08:28:28 es-efs-master2 crmd[1924]:   notice:
> process_lrm_event: LRM operation
> editshare.stack.7c645b0e-46bb-407e-b48a-92ec3121f2d7.lizardfs-master.primitive_notify_0
> (call=400, rc=0, cib-update=0, confirmed=true) ok
> Nov 18 08:28:28 es-efs-master2 crmd[1924]:   notice: run_graph:
> Transition 1481601 (Complete=5, Pending=0, Fired=0, Skipped=15,
> Incomplete=10, 

Re: [ClusterLabs] Resources suddenly get target-role="stopped"

2015-12-04 Thread Ken Gaillot
On 12/04/2015 10:22 AM, Klechomir wrote:
> Hi list,
> My issue is the following:
> 
> I have very stable cluster, using Corosync 2.1.0.26 and Pacemaker 1.1.8
> (observed the same problem with Corosync 2.3.5  & Pacemaker 1.1.13-rc3)
> 
> Bumped on this issue when started playing with VirtualDomain resources,
> but this seems to be unrelated to the RA.
> 
> The problem is that without apparent reason a resource gets
> target-role="Stopped". This happens after (successful) migration, or
> after failover., or after VM restart .
> 
> My tests showed that changing the resource name fixes this problem, but
> this seems to be a temporary workaround.
> 
> The resource configuration is:
> primitive VMA_VM1 ocf:heartbeat:VirtualDomain \
> params config="/NFSvolumes/CDrive1/VM1/VM1.xml"
> hypervisor="qemu:///system" migration_transport="tcp" \
> meta allow-migrate="true" target-role="Started" \
> op start interval="0" timeout="120s" \
> op stop interval="0" timeout="120s" \
> op monitor interval="10" timeout="30" depth="0" \
> utilization cpu="1" hv_memory="925"
> order VM_VM1_after_Filesystem_CDrive1 inf: Filesystem_CDrive1 VMA_VM1
> 
> Here is the log from one such stop, after successful migration with "crm
> migrate resource VMA_VM1":
> 
> Dec 04 15:18:22 [3818929] CLUSTER-1   crmd:debug: cancel_op:   
> Cancelling op 5564 for VMA_VM1 (VMA_VM1:5564)
> Dec 04 15:18:22 [4434] CLUSTER-1   lrmd: info:
> cancel_recurring_action: Cancelling operation VMA_VM1_monitor_1
> Dec 04 15:18:23 [3818929] CLUSTER-1   crmd:debug: cancel_op:   
> Op 5564 for VMA_VM1 (VMA_VM1:5564): cancelled
> Dec 04 15:18:23 [3818929] CLUSTER-1   crmd:debug:
> do_lrm_rsc_op:Performing
> key=351:199:0:fb6e486a-023a-4b44-83cf-4c0c208a0f56 op=VMA_VM1_migrate_to_0
> VirtualDomain(VMA_VM1)[1797698]:2015/12/04_15:18:23 DEBUG:
> Virtual domain VM1 is currently running.
> VirtualDomain(VMA_VM1)[1797698]:2015/12/04_15:18:23 INFO: VM1:
> Starting live migration to CLUSTER-2 (using virsh
> --connect=qemu:///system --quiet migrate --live  VM1
> qemu+tcp://CLUSTER-2/system ).
> Dec 04 15:18:24 [3818929] CLUSTER-1   crmd: info:
> process_lrm_event:LRM operation VMA_VM1_monitor_1 (call=5564,
> status=1, cib-update=0, confirmed=false) Cancelled
> Dec 04 15:18:24 [3818929] CLUSTER-1   crmd:debug:
> update_history_cache: Updating history for 'VMA_VM1' with
> monitor op
> VirtualDomain(VMA_VM1)[1797698]:2015/12/04_15:18:26 INFO: VM1:
> live migration to CLUSTER-2 succeeded.
> Dec 04 15:18:26 [4434] CLUSTER-1   lrmd:debug:
> operation_finished:  VMA_VM1_migrate_to_0:1797698 - exited with rc=0
> Dec 04 15:18:26 [4434] CLUSTER-1   lrmd:   notice:
> operation_finished:  VMA_VM1_migrate_to_0:1797698 [
> 2015/12/04_15:18:23 INFO: VM1: Starting live migration to CLUSTER-2
> (using virsh --connect=qemu:///system --quiet migrate --live  VM1
> qemu+tcp://CLUSTER-2/system ). ]
> Dec 04 15:18:26 [4434] CLUSTER-1   lrmd:   notice:
> operation_finished:  VMA_VM1_migrate_to_0:1797698 [
> 2015/12/04_15:18:26 INFO: VM1: live migration to CLUSTER-2 succeeded. ]
> Dec 04 15:18:27 [3818929] CLUSTER-1   crmd:debug:
> create_operation_update:  do_update_resource: Updating resouce
> VMA_VM1 after complete migrate_to op (interval=0)
> Dec 04 15:18:27 [3818929] CLUSTER-1   crmd:   notice:
> process_lrm_event:LRM operation VMA_VM1_migrate_to_0 (call=5697,
> rc=0, cib-update=89, confirmed=true) ok
> Dec 04 15:18:27 [3818929] CLUSTER-1   crmd:debug:
> update_history_cache: Updating history for 'VMA_VM1' with
> migrate_to op
> Dec 04 15:18:31 [3818929] CLUSTER-1   crmd:debug: cancel_op:   
> Operation VMA_VM1:5564 already cancelled
> Dec 04 15:18:31 [3818929] CLUSTER-1   crmd:debug:
> do_lrm_rsc_op:Performing
> key=225:200:0:fb6e486a-023a-4b44-83cf-4c0c208a0f56 op=VMA_VM1_stop_0
> VirtualDomain(VMA_VM1)[1798719]:2015/12/04_15:18:31 DEBUG:
> Virtual domain VM1 is not running:  failed to get domain 'vm1' domain
> not found: no domain with matching name 'vm1'

This looks like the problem. Configuration error?

> VirtualDomain(VMA_VM1)[1798719]:2015/12/04_15:18:31 INFO: Domain
> VM1 already stopped.
> Dec 04 15:18:31 [4434] CLUSTER-1   lrmd:debug:
> operation_finished:  VMA_VM1_stop_0:1798719 - exited with rc=0
> Dec 04 15:18:31 [4434] CLUSTER-1   lrmd:   notice:
> operation_finished:  VMA_VM1_stop_0:1798719 [ 2015/12/04_15:18:31
> INFO: Domain VM1 already stopped. ]
> Dec 04 15:18:32 [3818929] CLUSTER-1   crmd:debug:
> create_operation_update:  do_update_resource: Updating resouce
> VMA_VM1 after complete stop op (interval=0)
> Dec 04 15:18:32 [3818929] CLUSTER-1   crmd:   notice:
> process_lrm_event:LRM operation VMA_VM1_stop_0 (call=5701, rc=0,
> cib-update=90, confirmed=true) ok
> Dec 04 15:18:32 

[ClusterLabs] Pacemaker 1.1.14 - Release Candidate (try it out!)

2015-12-08 Thread Ken Gaillot
The release cycle for Pacemaker 1.1.14 has begun! The source code for a
release candidate is available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.14-rc2

This release candidate introduces some valuable new features:

* Resources will now start as soon as their state has been confirmed on
all nodes and all dependencies have been satisfied, rather than waiting
for the state of all resources to be confirmed. This allows for faster
startup of some services, and more even startup load.

* Fencing topology levels can now be applied to all nodes whose name
matches a configurable pattern, or that have a configurable node attribute.

* When a fencing topology level has multiple devices, reboots are now
automatically mapped to all-off-then-all-on, allowing much simplified
configuration of redundant power supplies.

* Guest nodes can now be included in groups, which simplifies the common
Pacemaker Remote use case of a grouping a storage device, filesystem and VM.

* Clone resources have a new clone-min metadata option, specifying that
a certain number of instances must be running before any dependent
resources can run. This is particularly useful for services behind a
virtual IP and haproxy, as is often done with OpenStack.

As usual, the release includes many bugfixes and minor enhancements. For
a more detailed list of changes, see the change log:

https://github.com/ClusterLabs/pacemaker/blob/1.1/ChangeLog

Everyone is encouraged to download, compile and test the new release. We
do many regression tests and simulations, but we can't cover all
possible use cases, so your feedback is important and appreciated.

(You may notice we're starting with rc2; rc1 was released, but had a
compilation issue in some cases.)
-- 
Ken Gaillot <kgail...@redhat.com>

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Help required for N+1 redundancy setup

2015-12-03 Thread Ken Gaillot
On 12/03/2015 05:23 AM, Nikhil Utane wrote:
> Ken,
> 
> One more question, if i have to propagate configuration changes between the
> nodes then is cpg (closed process group) the right way?
> For e.g.
> Active Node1 has config A=1, B=2
> Active Node2 has config A=3, B=4
> Standby Node needs to have configuration for all the nodes such that
> whichever goes down, it comes up with those values.
> Here configuration is not static but can be updated at run-time.

Being unfamiliar with the specifics of your case, I can't say what the
best approach is, but it sounds like you will need to write a custom OCF
resource agent to manage your service.

A resource agent is similar to an init script:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf

The RA will start the service with the appropriate configuration. It can
use per-resource options configured in pacemaker or external information
to do that.
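
As a very rough sketch of the shape of one ("myservice", its path and
its "config" parameter are all placeholders, and a real agent must print
full metadata for the meta-data action):

   #!/bin/sh
   : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
   . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

   svc_monitor() {
       # running if a process is found; a real agent should check health too
       pgrep -f /usr/sbin/myservice >/dev/null && return $OCF_SUCCESS
       return $OCF_NOT_RUNNING
   }
   svc_start() {
       svc_monitor && return $OCF_SUCCESS
       # start with whatever per-resource configuration was passed in
       /usr/sbin/myservice --config "${OCF_RESKEY_config}" || return $OCF_ERR_GENERIC
       return $OCF_SUCCESS
   }
   svc_stop() {
       svc_monitor || return $OCF_SUCCESS
       pkill -f /usr/sbin/myservice
       # a real agent should wait and verify the service is really gone
       return $OCF_SUCCESS
   }

   case "$1" in
       start)     svc_start ;;
       stop)      svc_stop ;;
       monitor)   svc_monitor ;;
       meta-data) echo '<resource-agent name="myservice"/>'; exit $OCF_SUCCESS ;;
       *)         exit $OCF_ERR_UNIMPLEMENTED ;;
   esac
   exit $?

The "config" parameter (seen by the agent as OCF_RESKEY_config) is
whatever you declare in the agent's metadata and set per resource in the
CIB, which is one way to give each of your N active instances its own
configuration.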

How does your service get its configuration currently?

> BTW, I'm little confused between OpenAIS and Corosync. For my purpose I
> should be able to use either, right?

Corosync started out as a subset of OpenAIS, optimized for use with
Pacemaker. Corosync 2 is now the preferred membership layer for
Pacemaker for most uses, though other layers are still supported.

> Thanks.
> 
> On Tue, Dec 1, 2015 at 9:04 PM, Ken Gaillot <kgail...@redhat.com> wrote:
> 
>> On 12/01/2015 05:31 AM, Nikhil Utane wrote:
>>> Hi,
>>>
>>> I am evaluating whether it is feasible to use Pacemaker + Corosync to add
>>> support for clustering/redundancy into our product.
>>
>> Most definitely
>>
>>> Our objectives:
>>> 1) Support N+1 redundancy. i,e. N Active and (up to) 1 Standby.
>>
>> You can do this with location constraints and scores. See:
>>
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_deciding_which_nodes_a_resource_can_run_on
>>
>> Basically, you give the standby node a lower score than the other nodes.
>>
>>> 2) Each node has some different configuration parameters.
>>> 3) Whenever any active node goes down, the standby node comes up with the
>>> same configuration that the active had.
>>
>> How you solve this requirement depends on the specifics of your
>> situation. Ideally, you can use OCF resource agents that take the
>> configuration location as a parameter. You may have to write your own,
>> if none is available for your services.
>>
>>> 4) There is no one single process/service for which we need redundancy,
>>> rather it is the entire system (multiple processes running together).
>>
>> This is trivially implemented using either groups or ordering and
>> colocation constraints.
>>
>> Order constraint = start service A before starting service B (and stop
>> in reverse order)
>>
>> Colocation constraint = keep services A and B on the same node
>>
>> Group = shortcut to specify several services that need to start/stop in
>> order and be kept together
>>
>>
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231363875392
>>
>>
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#group-resources
>>
>>
>>> 5) I would also want to be notified when any active<->standby state
>>> transition happens as I would want to take some steps at the application
>>> level.
>>
>> There are multiple approaches.
>>
>> If you don't mind compiling your own packages, the latest master branch
>> (which will be part of the upcoming 1.1.14 release) has built-in
>> notification capability. See:
>> http://blog.clusterlabs.org/blog/2015/reliable-notifications/
>>
>> Otherwise, you can use SNMP or e-mail if your packages were compiled
>> with those options, or you can use the ocf:pacemaker:ClusterMon resource
>> agent:
>>
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231308442928
>>
>>> I went through the documents/blogs but all had example for 1 active and 1
>>> standby use-case and that too for some standard service like httpd.
>>
>> Pacemaker is incredibly versatile, and the use cases are far too varied
>> to cover more than a small subset. Those simple examples show the basic
>> building blocks, and can usually point you to the specific features you
>> need to investigate further.
>>
>>> One additional question, If I am having multiple actives, then Virtual IP
>>&

Re: [ClusterLabs] Stack: unknown and all nodes offline

2015-12-10 Thread Ken Gaillot
On 12/10/2015 01:14 PM, Louis Munro wrote:
> I can now answer parts of my own question.
> 
> 
> My config was missing the quorum configuration:
> 
> quorum {
> # Enable and configure quorum subsystem (default: off)
> # see also corosync.conf.5 and votequorum.5
> provider: corosync_votequorum
> two_node: 1
> expected_votes: 2
> }
> 
> 
> I read the manpage as saying that was optional, but it looks like I may be 
> misreading here.
> corosync.conf(5) says the following: 
> 
> Within the quorum directive it is possible to specify the quorum algorithm to 
> use with the
> provider directive. At the time of writing only corosync_votequorum is 
> supported.  
> See votequorum(5) for configuration options.
> 
> 
> 
> I still have messages in the logs saying 
> crmd:   notice: get_node_name:   Defaulting to uname -n for the local 
> corosync node name
> 
> I am not sure which part of the configuration I should be setting for that.
> 
> Any pointers regarding that would be nice.

Hi,

As long as the unames are what you want the nodes to be called, that
message is fine. You can explicitly set the node names by using a
nodelist {} section in corosync.conf, with each node {} having a
ring0_addr specifying the name.
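
For example, something like this (a sketch -- adjust names and addresses
to your nodes):

   nodelist {
       node {
           ring0_addr: hack1.example.com
           nodeid: 1
       }
       node {
           ring0_addr: hack2.example.com
           nodeid: 2
       }
   }

If ring0_addr is a name rather than an IP, it must resolve on all nodes.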

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Stack: unknown and all nodes offline

2015-12-10 Thread Ken Gaillot
On 12/10/2015 12:45 PM, Louis Munro wrote:
> Hello all,
> 
> I am trying to get a Corosync 2 cluster going on CentOS 6.7 but I am running 
> in a bit of a problem with either Corosync or Pacemaker.
> crm reports that all my nodes are offline and the stack is unknown (I am not 
> sure if that is relevant).
> 
> I believe both nodes are actually present and seen in corosync, but they may 
> not be considered as such by pacemaker.
> I have messages in the logs saying that the processes cannot get the node 
> name and default to uname -n: 
> 
> Dec 10 13:38:53 [2236] hack1.example.com   crmd: info: 
> corosync_node_name:Unable to get node name for nodeid 739513528
> Dec 10 13:38:53 [2236] hack1.example.com   crmd:   notice: get_node_name: 
> Defaulting to uname -n for the local corosync node name
> Dec 10 13:38:53 [2236] hack1.example.com   crmd: info: crm_get_peer:  
> Node 739513528 is now known as hack1.example.com
> 
> The uname -n is correct as far that is concerned.
> 
> 
> Does this mean anything to anyone here? 
> 
> 
> [Lots of details to follow]...
> 
> I compiled my own versions of Corosync, Pacemaker, crm and the 
> resource-agents seemingly without problems.
> 
> Here is what I currently have installed:
> 
> # corosync -v
> Corosync Cluster Engine, version '2.3.5'
> Copyright (c) 2006-2009 Red Hat, Inc.
> 
> # pacemakerd -F
> Pacemaker 1.1.13 (Build: 5b41ae1)
>  Supporting v3.0.10:  generated-manpages agent-manpages ascii-docs ncurses 
> libqb-logging libqb-ipc lha-fencing upstart nagios  corosync-native 
> atomic-attrd libesmtp acls
> 
> # crm --version
> crm 2.2.0-rc3
> 
> 
> 
> Here is the output of crm status:
> 
> # crm status
> Last updated: Thu Dec 10 12:47:50 2015Last change: Thu Dec 10 
> 12:02:33 2015 by root via cibadmin on hack1.example.com
> Stack: unknown
> Current DC: NONE
> 2 nodes and 0 resources configured
> 
> OFFLINE: [ hack1.example.com hack2.example.com ]
> 
> Full list of resources:
> 
> {nothing to see here}
> 
> 
> 
> # corosync-cmapctl | grep members
> runtime.totem.pg.mrp.srp.members.739513528.config_version (u64) = 0
> runtime.totem.pg.mrp.srp.members.739513528.ip (str) = r(0) ip(172.20.20.184)
> runtime.totem.pg.mrp.srp.members.739513528.join_count (u32) = 1
> runtime.totem.pg.mrp.srp.members.739513528.status (str) = joined
> runtime.totem.pg.mrp.srp.members.739513590.config_version (u64) = 0
> runtime.totem.pg.mrp.srp.members.739513590.ip (str) = r(0) ip(172.20.20.246)
> runtime.totem.pg.mrp.srp.members.739513590.join_count (u32) = 1
> runtime.totem.pg.mrp.srp.members.739513590.status (str) = joined
> 
> 
> # uname -n
> hack1.example.com
> 
> # corosync-cfgtool -s
> Printing ring status.
> Local node ID 739513528
> RING ID 0
>   id  = 172.20.20.184
>   status  = ring 0 active with no faults
> 
> 
> # uname -n
> hack2.example.com
> 
> 
> # corosync-cfgtool -s
> Printing ring status.
> Local node ID 739513590
> RING ID 0
>   id  = 172.20.20.246
>   status  = ring 0 active with no faults
> 
> 
> 
> 
> Shouldn’t I see both nodes in the same ring?

They are in the same ring, but the cfgtool will only print the local id.

> My corosync config is currently defined as:
> 
> # egrep -v '#' /etc/corosync/corosync.conf
> totem {
>   version: 2
> 
>   crypto_cipher: none
>   crypto_hash: none
>   clear_node_high_bit: yes
>   cluster_name: hack_cluster
>   interface {
>   ringnumber: 0
>   bindnetaddr: 172.20.0.0
>   mcastaddr: 239.255.1.1
>   mcastport: 5405
>   ttl: 1
>   }
> 
> }
> 
> logging {
>   fileline: on
>   to_stderr: no
>   to_logfile: yes
>   logfile: /var/log/cluster/corosync.log
>   to_syslog: yes
>   debug: off
>   timestamp: on
>   logger_subsys {
>   subsys: QUORUM
>   debug: off
>   }
> }
> 
> # cat /etc/corosync/service.d/pacemaker
> service {
> name: pacemaker
> ver: 1
> }

You don't want this section if you're using corosync 2. That's the old
"plugin" used with corosync 1.

> 
> 
> And here is my pacemaker configuration:
> 
> # crm config show xml
> 
>  crm_feature_set="3.0.10" validate-with="pacemaker-2.4" 
> update-client="cibadmin" epoch="13" admin_epoch="0" update-user="root" 
> cib-last-written="Thu Dec 10 13:35:06 2015">
>   
> 
>   
>  id="cib-bootstrap-options-stonith-enabled"/>
>  id="cib-bootstrap-options-no-quorum-policy"/>
>   
> 
> 
>   
> 
>id="hack1.example.com-instance_attributes-standby"/>
> 
>   
>   
> 
>id="hack2.example.com-instance_attributes-standby"/>
> 
>   
> 
> 
> 
>   
> 
> 
> 
> 
> 
> 
> 
> And finally some logs that might be relevant: 
> 
> Dec 10 13:38:50 [2227] hack1.example.com corosync notice  [MAIN  ] 
> main.c:1227 Corosync Cluster Engine ('2.3.5'): started and ready to provide 
> service.

Re: [ClusterLabs] Early VM resource migration

2015-12-16 Thread Ken Gaillot
On 12/16/2015 10:30 AM, Klechomir wrote:
> On 16.12.2015 17:52, Ken Gaillot wrote:
>> On 12/16/2015 02:09 AM, Klechomir wrote:
>>> Hi list,
>>> I have a cluster with VM resources on a cloned active-active storage.
>>>
>>> VirtualDomain resource migrates properly during failover (node standby),
>>> but tries to migrate back too early, during failback, ignoring the
>>> "order" constraint, telling it to start when the cloned storage is up.
>>> This causes unnecessary VM restart.
>>>
>>> Is there any way to make it wait, until its storage resource is up?
>> Hi Klecho,
>>
>> If you have an order constraint, the cluster will not try to start the
>> VM until the storage resource agent returns success for its start. If
>> the storage isn't fully up at that point, then the agent is faulty, and
>> should be modified to wait until the storage is truly available before
>> returning success.
>>
>> If you post all your constraints, I can look for anything that might
>> affect the behavior.
> Thanks for the reply, Ken
> 
> Seems to me that that the constraints for a cloned resources act a a bit
> different.
> 
> Here is my config:
> 
> primitive p_AA_Filesystem_CDrive1 ocf:heartbeat:Filesystem \
> params device="/dev/CSD_CDrive1/AA_CDrive1"
> directory="/volumes/AA_CDrive1" fstype="ocfs2" options="rw,noatime"
> primitive VM_VM1 ocf:heartbeat:VirtualDomain \
> params config="/volumes/AA_CDrive1/VM_VM1/VM1.xml"
> hypervisor="qemu:///system" migration_transport="tcp" \
> meta allow-migrate="true" target-role="Started"
> clone AA_Filesystem_CDrive1 p_AA_Filesystem_CDrive1 \
> meta interleave="true" resource-stickiness="0"
> target-role="Started"
> order VM_VM1_after_AA_Filesystem_CDrive1 inf: AA_Filesystem_CDrive1 VM_VM1
> 
> Every time when a node comes back from standby, the VM tries to live
> migrate to it long before the filesystem is up.

In most cases (including this one), when you have an order constraint,
you also need a colocation constraint.

colocation = two resources must be run on the same node

order = one resource must be started/stopped/whatever before another

Or you could use a group, which is essentially a shortcut for specifying
colocation and order constraints for any sequence of resources.
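
With your resource names, a sketch of the missing piece would be:

   colocation VM_VM1_with_CDrive1 inf: VM_VM1 AA_Filesystem_CDrive1

so that VM_VM1 is only allowed on nodes where an instance of the cloned
filesystem is actually running, in addition to being ordered after it.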

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker documentation license clarification

2015-12-14 Thread Ken Gaillot
On 12/13/2015 06:56 PM, Ferenc Wagner wrote:
> Ken Gaillot <kgail...@redhat.com> writes:
> 
>> On 12/11/2015 10:07 AM, Ferenc Wagner wrote:
>>
>>> [...] the "Legal Notice"
>>> section of the generated Publican documentation (for example
>>> Pacemaker_Explained/desktop/en-US/index.html) says that the material may
>>> only be distributed under GFDL-1.2+.
>>
>> This is an artifact of how you're building the documentation. Easy to
>> miss given the makefile complexity :)
> 
> Especially that I did not study the doc makefiles at all, just issued
> make.
> 
>> If you look at the generated versions on clusterlabs.org, they have the
>> correct license (CC-BY-SA)
> 
> That's great.
> 
>> However if you do not "make brand" before building the documentation,
>> you will get the publican defaults.
> 
> Wouldn't --with-brand=clusterlabs also be needed?  Anyway, I can't
> really do this because of the sudo step.  But specifying --brand_dir
> helps indeed.  Is there any reason not to use the clusterlabs brand
> automatically all the time, without installation?  It goes like this:

Currently, the brand is specified in each book's publican.cfg (which is
generated by configure, and can be edited by "make www-cli"). It works,
so realistically it's a low priority to improve it, given everything
else on the plate.

You're welcome to submit a pull request to change it to use the local
brand directory. Be sure to consider that each book comes in multiple
formats (and potentially translations, though they're out of date at
this point, which is a whole separate discussion worth raising at some
point), and add anything generated to .gitignore.

> --- a/doc/Makefile.am
> +++ b/doc/Makefile.am
> @@ -73,16 +73,20 @@ EXTRA_DIST= $(docbook:%=%.xml)
>  %.html: %.txt
>   $(AM_V_ASCII)$(ASCIIDOC) --unsafe --backend=xhtml11 $<
>  
> +# publican-clusterlabs/xsl/html-single.xsl imports that of Publican
> +# through this link during the build
> +../xsl:
> + ln -s /usr/share/publican/xsl "$@"
>  
>  CFS_TXT=$(wildcard Clusters_from_Scratch/en-US/*.txt)
>  CFS_XML=$(CFS_TXT:%.txt=%.xml)
>  
>  # We have to hardcode the book name
>  # With '%' the test for 'newness' fails
> -Clusters_from_Scratch.build: $(PNGS) $(wildcard 
> Clusters_from_Scratch/en-US/*.xml) $(CFS_XML)
> +Clusters_from_Scratch.build: $(PNGS) $(wildcard 
> Clusters_from_Scratch/en-US/*.xml) $(CFS_XML) ../xsl
>   $(PCMK_V) @echo Building $(@:%.build=%) because of $?
>   rm -rf $(@:%.build=%)/publish/*
> - $(AM_V_PUB)cd $(@:%.build=%) && RPM_BUILD_DIR="" $(PUBLICAN) build 
> --publish --langs=$(DOCBOOK_LANGS) --formats=$(DOCBOOK_FORMATS) $(PCMK_quiet)
> + $(AM_V_PUB)cd $(@:%.build=%) && RPM_BUILD_DIR="" $(PUBLICAN) build 
> --publish --langs=$(DOCBOOK_LANGS) --formats=$(DOCBOOK_FORMATS) $(PCMK_quiet) 
> --brand_dir=../publican-clusterlabs
>   rm -rf $(@:%.build=%)/tmp
>   touch $@
> 
> [...]
> 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] successful ipmi stonith still times out

2015-12-17 Thread Ken Gaillot
On 12/17/2015 10:32 AM, Ron Kerry wrote:
> I have a customer (running SLE 11 SP4 HAE) who is seeing the following
> stonith behavior running the ipmi stonith plugin.
> 
> Dec 15 14:21:43 test4 pengine[24002]:  warning: pe_fence_node: Node
> test3 will be fenced because termination was requested
> Dec 15 14:21:43 test4 pengine[24002]:  warning: determine_online_status:
> Node test3 is unclean
> Dec 15 14:21:43 test4 pengine[24002]:  warning: stage6: Scheduling Node
> test3 for STONITH
> 
> ... it issues the reset and it is noted ...
> Dec 15 14:21:45 test4 external/ipmi(STONITH-test3)[177184]: [177197]:
> debug: ipmitool output: Chassis Power Control: Reset
> Dec 15 14:21:46 test4 stonith-ng[23999]:   notice: log_operation:
> Operation 'reboot' [177179] (call 2 from crmd.24003) for host 'test3'
> with device 'STONITH-test3' returned: 0 (OK)
> 
> ... test3 does go down ...
> Dec 15 14:22:21 test4 kernel: [90153.906461] Cell 2 (test3) left the
> membership
> 
> ... but the stonith operation times out (it said OK earlier) ...
> Dec 15 14:22:56 test4 stonith-ng[23999]:   notice: remote_op_timeout:
> Action reboot (a399a8cb-541a-455e-8d7c-9072d48667d1) for test3
> (crmd.24003) timed out
> Dec 15 14:23:05 test4 external/ipmi(STONITH-test3)[177667]: [177678]:
> debug: ipmitool output: Chassis Power is on
> 
> Dec 15 14:23:56 test4 crmd[24003]:error:
> stonith_async_timeout_handler: Async call 2 timed out after 132000ms
> Dec 15 14:23:56 test4 crmd[24003]:   notice: tengine_stonith_callback:
> Stonith operation 2/51:100:0:f43dc87c-faf0-4034-8b51-be0c13c95656: Timer
> expired (-62)
> Dec 15 14:23:56 test4 crmd[24003]:   notice: tengine_stonith_callback:
> Stonith operation 2 for test3 failed (Timer expired): aborting transition.
> Dec 15 14:23:56 test4 crmd[24003]:   notice: abort_transition_graph:
> Transition aborted: Stonith failed (source=tengine_stonith_callback:697, 0)
> 
> This looks like a bug but a quick search did not turn up anything. Does
> anyone recognize this problem?

Fence timeouts can be tricky to troubleshoot because there are multiple
timeouts involved. The process goes like this:

1. crmd asks the local stonithd to do the fence.

2. The local stonithd queries all stonithd's to ensure it has the latest
status of all fence devices.

3. The local stonithd chooses a fence device (or possibly devices, if
topology is involved) and picks the best stonithd (or stonithd's) to
actually execute the fencing.

4. The chosen stonithd (or stonithd's) runs the fence agent to do the
actual fencing, then replies to the original stonithd, which replies to
the original requester.

So the crmd can timeout waiting for a reply from stonithd, the local
stonithd can timeout waiting for query replies from all stonithd's, the
local stonithd can timeout waiting for a reply from one or more
executing stonithd's, or an executing stonithd can timeout waiting for a
reply from the fence device.

Another factor is that some reboots can be remapped to off then on. This
will happen, for example, if the fence device doesn't have a reboot
command, or if it's in a fence topology level with other devices. So in
that case, there's the possibility of a timeout for the off command, and
the on command.

In this case, one thing that's odd is that the "Async call 2 timed out"
message is the timeout for the crmd waiting for a reply from stonithd.
The crmd timeout is always a minute longer than stonithd's timeout,
which should be more than enough time for stonithd to reply. I'm not
sure what's going on there.

I'd look closely at the entire fence configuration (is topology
involved? what are the configured timeouts? are the configuration
options correct?), and trace through the logs to see what step or steps
are actually timing out.

I do see here that the reboot times out before the "Chassis Power is on"
message, so it's possible the reboot timeout is too short to account for
a full cycle. But I'm not sure why it would report OK before that,
unless maybe that was for one step of the larger process.
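
If it does turn out that the full power cycle is simply slower than the
configured timeouts, raising them is a cheap experiment (a hedged sketch
in crm shell; verify the option names against your version):

    crm configure property stonith-timeout=180s
    crm resource param STONITH-test3 set pcmk_reboot_timeout 120s

The first raises the overall fencing timeout, the second gives the
device's reboot action itself more time to complete.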

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Early VM resource migration

2015-12-16 Thread Ken Gaillot
On 12/16/2015 02:09 AM, Klechomir wrote:
> Hi list,
> I have a cluster with VM resources on a cloned active-active storage.
> 
> VirtualDomain resource migrates properly during failover (node standby),
> but tries to migrate back too early, during failback, ignoring the
> "order" constraint, telling it to start when the cloned storage is up.
> This causes unnecessary VM restart.
> 
> Is there any way to make it wait, until its storage resource is up?

Hi Klecho,

If you have an order constraint, the cluster will not try to start the
VM until the storage resource agent returns success for its start. If
the storage isn't fully up at that point, then the agent is faulty, and
should be modified to wait until the storage is truly available before
returning success.

If you post all your constraints, I can look for anything that might
affect the behavior.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [Pacemaker] Beginner | Resources stuck unloading

2015-12-16 Thread Ken Gaillot
On 12/14/2015 12:18 AM, Tyler Hampton wrote:
> Hi!
> 
> I'm currently trying to semi-follow Sebastien Han's blog post on
> implementing HA with Ceph rbd volumes and I am hitting some walls. The
> difference between what I'm trying to do and the blog post is that I'm
> trying to implement an active/passive instead of an active/active.
> 
> I am able to get the two nodes to recognize each other and for a single
> node to assume resources. However, the setup is fairly finnicky (I'm
> assuming due to my ignorance) and I can't get it to work most of the time.
> 
> When I do get a pair and try to fail over (service pacemaker stop) the node
> that I'm stopping pacemaker on fails to unload its controlled resources and
> goes into a loop. A 'proper' failover has only happened twice.
> 
> pacemaker stop output (with log output):
> https://gist.github.com/howdoicomputer/d88e224f6fead4623efc
> 
> resource configuration:
> https://gist.github.com/howdoicomputer/a6f846eb54c3024a5be9
> 
> Any help is greatly appreciated.

Hopefully someone with more ceph or upstart experience can give you more
specifics.

But generally, stonith-enabled=false can lead to error recovery problems
and make trouble harder to diagnose. If you can take the time to get
stonith working, it should at least stop your first problem from causing
further problems.

If you're using corosync 2, you can set "two_node: 1" in corosync.conf,
and delete the no-quorum-policy=ignore setting in Pacemaker. It won't
make a huge difference, but corosync 2 can handle it better now.

If you are doing a planned failover, a better way would be to put the
node into standby mode first, then stop pacemaker. That ensures all
resources are successfully failed over first, and when the node comes
back, it lets you decide when it's ready to host resources again (by
taking it out of standby mode), which gives you time for
administration/troubleshooting/whatever reason you took it down.
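
For example (a hedged sketch; substitute your node name, and use the pcs
equivalents if you prefer that tool):

    crm node standby node1      # drain resources off node1 first
    service pacemaker stop      # now safe to stop the cluster stack
    # ... maintenance ...
    service pacemaker start
    crm node online node1       # let it host resources again when ready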

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Pacemaker 1.1.14 - Release Candidate 3

2015-12-14 Thread Ken Gaillot
The source code for the latest Pacemaker release candidate is available
at
https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.14-rc3

This is a bugfix release:

* When deleting an attribute from a fence device, the entire device
would sometimes be deleted.

* 0f9a4eb0 introduced a regression preventing crm_mon from being run as
a daemon.

Everyone is encouraged to download, compile and test the new release.
Your feedback is important and appreciated.
-- 
Ken Gaillot <kgail...@redhat.com>

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker documentation license clarification

2015-12-11 Thread Ken Gaillot
On 12/11/2015 10:07 AM, Ferenc Wagner wrote:
> Hi,
> 
> We're packaging Pacemaker for Debian and this requires a clear picture
> of all licenses relevant to the package.  The software part is clearly
> under GPL-2+ and LGPL-2+, which is fine.  However, the "Legal Notice"
> section of the generated Publican documentation (for example
> Pacemaker_Explained/desktop/en-US/index.html) says that the material may
> only be distributed under GFDL-1.2+.  Since there are no invariant
> sections, this would be workable (if inconvenient), but I suspect that
> this difference might be unintended, just the implicit Publican default
> surfacing in the lack of explicit configuration.
> 
> Could the authors (preferably the primary author, Andrew Beekhof) please
> give a definitive statement about the intended license of the
> Clusters_from_Scratch, Pacemaker_Explained and Pacemaker_Remote books?

This is an artifact of how you're building the documentation. Easy to
miss given the makefile complexity :)

If you look at the generated versions on clusterlabs.org, they have the
correct license (CC-BY-SA), e.g.:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/index.html

However if you do not "make brand" before building the documentation,
you will get the publican defaults.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [OCF] Pacemaker reports a multi-state clone resource instance as running while it is not in fact

2016-01-04 Thread Ken Gaillot
On 01/04/2016 09:25 AM, Bogdan Dobrelya wrote:
> On 04.01.2016 15:50, Bogdan Dobrelya wrote:
>> So far so bad.
>> I made a dummy OCF script [0] to simulate an example
>> promote/demote/notify failure mode for a multistate clone resource which
>> is very similar to the one I reported originally. And the test to
>> reproduce my case with the dummy is:
>> - install dummy resource ocf ra and create the dummy resource as README
>> [0] says
>> - just watch the a) OCF logs from the dummy and b) outputs for the
>> reoccurring commands:
>>
>> # while true; do date; ls /var/lib/heartbeat/trace_ra/dummy/ | tail -1;
>> sleep 20; done&
>> # crm_resource --resource p_dummy --list-operations
>>
>> At some point I noticed:
>> - there are no more "OK" messages logged from the monitor actions,
>> although according to the trace_ra dumps' timestamps, all monitors are
>> still being invoked!
>>
>> - at some point I noticed very strange results reported by the:
>> # crm_resource --resource p_dummy --list-operations
>> p_dummy (ocf::dummy:dummy): FAILED : p_dummy_monitor_103000
>> (node=node-1.test.domain.local, call=579, rc=1, last-rc-change=Mon Jan
>> 4 14:33:07 2016, exec=62107ms): Timed Out
>>   or
>> p_dummy (ocf::dummy:dummy): Started : p_dummy_monitor_103000
>> (node=node-3.test.domain.local, call=-1, rc=1, last-rc-change=Mon Jan  4
>> 14:43:58 2016, exec=0ms): Timed Out
>>
>> - according to the trace_ra dumps reoccurring monitors are being invoked
>> by the intervals *much longer* than configured. For example, a 7 minutes
>> of "monitoring silence":
>> Mon Jan  4 14:47:46 UTC 2016
>> p_dummy.monitor.2016-01-04.14:40:52
>> Mon Jan  4 14:48:06 UTC 2016
>> p_dummy.monitor.2016-01-04.14:47:58
>>
>> Given that said, it is very likely there is some bug exist for
>> monitoring multi-state clones in pacemaker!
>>
>> [0] https://github.com/bogdando/dummy-ocf-ra
>>
> 
> Also note, that lrmd spawns *many* monitors like:
> root  6495  0.0  0.0  70268  1456 ?Ss2015   4:56  \_
> /usr/lib/pacemaker/lrmd
> root 31815  0.0  0.0   4440   780 ?S15:08   0:00  |   \_
> /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
> root 31908  0.0  0.0   4440   388 ?S15:08   0:00  |
>   \_ /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
> root 31910  0.0  0.0   4440   384 ?S15:08   0:00  |
>   \_ /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
> root 31915  0.0  0.0   4440   392 ?S15:08   0:00  |
>   \_ /bin/sh /usr/lib/ocf/resource.d/dummy/dummy monitor
> ...

At first glance, that looks like your monitor action is calling itself
recursively, but I don't see how in your code.

> At some point, there was  already. Then I unmanaged the p_dummy but
> it grew up to the 2403 after that. The number of running monitors may
> grow or decrease as well.
> Also, the /var/lib/heartbeat/trace_ra/dummy/ still have been populated
> by new p_dummy.monitor* files with recent timestamps. Why?..
> 
> If I pkill -9 all dummy monitors, lrmd spawns another ~2000 almost
> instantly :) Unless the node became unresponsive at some point. And
> after restarted by power off:
> # crm_resource --resource p_dummy --list-operations
> p_dummy (ocf::dummy:dummy): Started (unmanaged) :
> p_dummy_monitor_3 (node=node-1.test.domain.local, call=679, rc=1,
> last-rc-change=Mon Jan  4 15:04:25 2016, exec=66747ms): Timed Out
> or
> p_dummy (ocf::dummy:dummy): Stopped (unmanaged) :
> p_dummy_monitor_103000 (node=node-3.test.domain.local, call=142, rc=1,
> last-rc-change=Mon Jan  4 15:14:59 2016, exec=65237ms): Timed Out
> 
> And then lrmd repeats all of the fun again.
> 
> 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] IPaddr2 cluster-ip restarts on all nodes after failover

2016-01-06 Thread Ken Gaillot
On 01/06/2016 02:40 PM, Joakim Hansson wrote:
> Hi list!
> I'm running a 3-node vm-cluster in which all the nodes run Tomcat (Solr)
> from the same disk using GFS2.
> On top of this I use IPaddr2-clone for cluster-ip and loadbalancing between
> all the nodes.
> 
> Everything works fine, except when i perform a failover on one node.
> When node01 shuts down, node02 takes over it's ipaddr-clone. So far so good.
> The thing is, when I fire up node01 again all the ipaddr-clones on all
> nodes restarts and thereby messes up Tomcat.

You want interleave=true on your clones.

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_clone_options
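
For example, with pcs (hedged; exact syntax varies a little between pcs
versions):

    pcs resource meta Tomcat-clone interleave=true
    pcs resource meta ClusterIP-clone interleave=true
    pcs resource meta GFS2-clone interleave=true
    pcs resource meta dlm-clone interleave=true

With interleave=true, the ordering between clones is evaluated per node
rather than cluster-wide, so node01 rejoining no longer restarts the
instances on node02 and node03.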

> Here is my configuration:
> 
> Cluster Name: GFS2-cluster
> Corosync Nodes:
>  node01 node02 node03
> Pacemaker Nodes:
>  node01 node02 node03
> 
> Resources:
>  Clone: dlm-clone
>   Meta Attrs: clone-max=3 clone-node-max=1
>   Resource: dlm (class=ocf provider=pacemaker type=controld)
>Operations: start interval=0s timeout=90 (dlm-start-timeout-90)
>stop interval=0s timeout=100 (dlm-stop-timeout-100)
>monitor interval=60s (dlm-monitor-interval-60s)
>  Clone: GFS2-clone
>   Meta Attrs: clone-max=3 clone-node-max=1 globally-unique=true
>   Resource: GFS2 (class=ocf provider=heartbeat type=Filesystem)
>Attributes: device=/dev/sdb directory=/home/solr fstype=gfs2
>Operations: start interval=0s timeout=60 (GFS2-start-timeout-60)
>stop interval=0s timeout=60 (GFS2-stop-timeout-60)
>monitor interval=20 timeout=40 (GFS2-monitor-interval-20)
>  Clone: ClusterIP-clone
>   Meta Attrs: clone-max=3 clone-node-max=3 globally-unique=true
>   Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>Attributes: ip=192.168.100.200 cidr_netmask=32 clusterip_hash=sourceip
>Meta Attrs: resource-stickiness=0
>Operations: start interval=0s timeout=20s (ClusterIP-start-timeout-20s)
>stop interval=0s timeout=20s (ClusterIP-stop-timeout-20s)
>monitor interval=30s (ClusterIP-monitor-interval-30s)
>  Clone: Tomcat-clone
>   Meta Attrs: clone-max=3 clone-node-max=1
>   Resource: Tomcat (class=systemd type=tomcat)
>Operations: monitor interval=60s (Tomcat-monitor-interval-60s)
> 
> Stonith Devices:
>  Resource: fence-vmware (class=stonith type=fence_vmware_soap)
>   Attributes:
> pcmk_host_map=node01:4212a559-8e66-2882-e7fe-96e2bd86bfdb;node02:4212150e-2d2d-dc3e-ee16-2eb280db2ec7;node03:42126708-bd46-adc5-75cb-678cdbcc06be
> pcmk_host_check=static-list login=USERNAME passwd=PASSWORD action=reboot
> ssl_insecure=true ipaddr=IP-ADDRESS
>   Operations: monitor interval=60s (fence-vmware-monitor-interval-60s)
> Fencing Levels:
> 
> Location Constraints:
> Ordering Constraints:
>   start dlm-clone then start GFS2-clone (kind:Mandatory)
> (id:order-dlm-clone-GFS2-clone-mandatory)
>   start GFS2-clone then start Tomcat-clone (kind:Mandatory)
> (id:order-GFS2-clone-Tomcat-clone-mandatory)
>   start Tomcat-clone then start ClusterIP-clone (kind:Mandatory)
> (id:order-Tomcat-clone-ClusterIP-clone-mandatory)
>   stop ClusterIP-clone then stop Tomcat-clone (kind:Mandatory)
> (id:order-ClusterIP-clone-Tomcat-clone-mandatory)
>   stop Tomcat-clone then stop GFS2-clone (kind:Mandatory)
> (id:order-Tomcat-clone-GFS2-clone-mandatory)
> Colocation Constraints:
>   GFS2-clone with dlm-clone (score:INFINITY)
> (id:colocation-GFS2-clone-dlm-clone-INFINITY)
>   GFS2-clone with Tomcat-clone (score:INFINITY)
> (id:colocation-GFS2-clone-Tomcat-clone-INFINITY)
> 
> Resources Defaults:
>  resource-stickiness: 100
> Operations Defaults:
>  No defaults set
> 
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: GFS2-cluster
>  dc-version: 1.1.13-10.el7-44eb2dd
>  enabled: false
>  have-watchdog: false
>  last-lrm-refresh: 1450177886
>  stonith-enabled: true
> 
> 
> Any help is greatly appreciated.
> 
> Thanks in advance
> /Jocke

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [Q] Pacemaker: Kamailio resource agent

2016-01-08 Thread Ken Gaillot
On 12/26/2015 05:27 AM, Sebish wrote:
> Hello to all ha users,
> 
> first of all thanks for you work @ mailinglist, pacemaker and ras!
> 
> I have an issue with the kamailio resource agent
> 
> (ra) and it would be great, if you could help me a little.

I'm not familiar with kamailio, but I can make some general comments ...

> -- 
> _Status:
> 
> _Debian 7.9
> Kamailio - running
> Heartbeat & Pacemaker - running (incl. running virtual IP and apache ra)
> and more
> 
> _What I did__:_
> 
>  * Create /usr/lib/ocf/resource.d/heartbeat/kamailio and chmod 755'd it
>  * Then I inserted the code of the ra and changed the following:
>  o RESKEY_kamuser_default="*myuser*"

It's not necessary to change the defaults in the code; when you create
the resource configuration in the cluster, you can specify options (such
as "kamuser=*myuser*") to override the defaults.

>  o Line 52 to:
>RESKEY_pidfile_default="/var/run/kamailio/kamailio.pid" (This is
>in my kamctlrc file too, exists and works)
>  o Line 53 to: RESKEY_monitoring_ip_default=*IPOFMYKAMAILIOSERVER*
>  o Changed : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat} ->
>/usr/lib/ocf/lib/heartbeat , because he did not find it

This shouldn't be necessary; pacemaker should set the OCF_ROOT
environment variable before calling the agent. If you were having
trouble testing it from the command line, simply set
OCF_ROOT=/usr/lib/ocf before calling it.

>  o Changed html snippet &amp;&amp; to &&

I'm not sure what you mean here. The example given in the agent's XML
metadata should stay as &amp;&amp; since it's XML and may not parse
correctly otherwise. If you're talking about your kamailio.cfg, then
yes, you should use && there.

>  o listen_address:*virtualipofkamailioserver*
>  o (For more see attachment)
> 
>  * Installed sipsak
> 
> _What I get:_
> 
> crm status gives me: STOPPED - Kamailio_start_0 (node=node1, call=22,
> rc=-2, status=Timed Out): unknown exec error (on all nodes)

This means that pacemaker tried to call the "start" action of the
resource agent, but it timed out on every node. It's possible the start
action isn't working, or that the timeout is too short. You can set the
timeout by defining a start operation for the resource in the cluster
configuration, with a timeout= option.

> _
> What I need:_
> 
>  * In the ra at line 155 it says to insert a code snippet to the
>kamailio.cfg, but not where exactly.
>  o Please tell me, at which spot exactly I have to insert it. (I
>pasted it at line ~582, # Handle requests within SIP dialogs)
> 
>  * Is there a way to debug the kamailio ra, if inserting the code
>snipped using your help will not be enough?

Any output from resource agents should be in the system log and/or
pacemaker.log. That's a good place to start.

There are also tools such as ocf-tester and ocft to test resource agents
from the command line (though they're not always made available in
packages).
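
For example (hedged; check the ocf-tester man page for the exact options
your version supports, and adjust the parameter values):

    OCF_ROOT=/usr/lib/ocf ocf-tester -n kamailio-test \
        -o kamuser=myuser -o monitoring_ip=192.168.0.10 \
        /usr/lib/ocf/resource.d/heartbeat/kamailio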

> 
> Thank you very mich for your time and interest!
> 
> Sebastian


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Help required for N+1 redundancy setup

2016-01-08 Thread Ken Gaillot
On 01/08/2016 11:13 AM, Nikhil Utane wrote:
>> I think stickiness will do what you want here. Set a stickiness higher
>> than the original node's preference, and the resource will want to stay
>> where it is.
> 
> Not sure I understand this. Stickiness will ensure that resources don't
> move back when original node comes back up, isn't it?
> But in my case, I want the newly standby node to become the backup node for
> all other nodes. i.e. it should now be able to run all my resource groups
> albeit with a lower score. How do I achieve that?

Oh right. I forgot to ask whether you had an opt-out
(symmetric-cluster=true, the default) or opt-in
(symmetric-cluster=false) cluster. If you're opt-out, every node can run
every resource unless you give it a negative preference.

Partly it depends on whether there is a good reason to give each
instance a "home" node. Often, there's not. If you just want to balance
resources across nodes, the cluster will do that by default.

If you prefer to put certain resources on certain nodes because the
resources require more physical resources (RAM/CPU/whatever), you can
set node attributes for that and use rules to set node preferences.

Either way, you can decide whether you want stickiness with it.
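
A hedged sketch of that approach in crm shell syntax (the attribute name
and value are made up for illustration):

    crm node attribute node2 set mem_class big
    crm configure location MyGroup2-prefers-big MyGroup2 \
        rule 500: mem_class eq big

The group then follows the attribute rather than a hard-coded node name,
and stickiness still decides whether it moves back when a preferred node
returns.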

> Also can you answer, how to get the values of node that goes active and the
> node that goes down inside the OCF agent?  Do I need to use notification or
> some simpler alternative is available?
> Thanks.
> 
> 
> On Fri, Jan 8, 2016 at 9:30 PM, Ken Gaillot <kgail...@redhat.com> wrote:
> 
>> On 01/08/2016 06:55 AM, Nikhil Utane wrote:
>>> Would like to validate my final config.
>>>
>>> As I mentioned earlier, I will be having (upto) 5 active servers and 1
>>> standby server.
>>> The standby server should take up the role of active that went down. Each
>>> active has some unique configuration that needs to be preserved.
>>>
>>> 1) So I will create total 5 groups. Each group has a "heartbeat::IPaddr2
>>> resource (for virtual IP) and my custom resource.
>>> 2) The virtual IP needs to be read inside my custom OCF agent, so I will
>>> make use of attribute reference and point to the value of IPaddr2 inside
>> my
>>> custom resource to avoid duplication.
>>> 3) I will then configure location constraint to run the group resource
>> on 5
>>> active nodes with higher score and lesser score on standby.
>>> For e.g.
>>> Group  NodeScore
>>> -
>>> MyGroup1node1   500
>>> MyGroup1node6   0
>>>
>>> MyGroup2node2   500
>>> MyGroup2node6   0
>>> ..
>>> MyGroup5node5   500
>>> MyGroup5node6   0
>>>
>>> 4) Now if say node1 were to go down, then stop action on node1 will first
>>> get called. Haven't decided if I need to do anything specific here.
>>
>> To clarify, if node1 goes down intentionally (e.g. standby or stop),
>> then all resources on it will be stopped first. But if node1 becomes
>> unavailable (e.g. crash or communication outage), it will get fenced.
>>
>>> 5) But when the start action of node 6 gets called, then using crm
>> command
>>> line interface, I will modify the above config to swap node 1 and node 6.
>>> i.e.
>>> MyGroup1node6   500
>>> MyGroup1node1   0
>>>
>>> MyGroup2node2   500
>>> MyGroup2node1   0
>>>
>>> 6) To do the above, I need the newly active and newly standby node names
>> to
>>> be passed to my start action. What's the best way to get this information
>>> inside my OCF agent?
>>
>> Modifying the configuration from within an agent is dangerous -- too
>> much potential for feedback loops between pacemaker and the agent.
>>
>> I think stickiness will do what you want here. Set a stickiness higher
>> than the original node's preference, and the resource will want to stay
>> where it is.
>>
>>> 7) Apart from node name, there will be other information which I plan to
>>> pass by making use of node attributes. What's the best way to get this
>>> information inside my OCF agent? Use crm command to query?
>>
>> Any of the command-line interfaces for doing so should be fine, but I'd
>> recommend using one of the lower-level tools (crm_attribute or
>> attrd_updater) so you don't have a dependency on a higher-level tool
>> that may not always be

Re: [ClusterLabs] [Q] Check on application layer (kamailio, openhab)

2015-12-21 Thread Ken Gaillot
On 12/19/2015 10:21 AM, Sebish wrote:
> Dear all ha-list members,
> 
> I am trying to setup two availability checks on application layer using
> heartbeat and pacemaker.
> To be more concrete I need 1 resource agent (ra) for openHAB and 1 for
> Kamailio SIP Proxy.
> 
> *My setup:
> *
> 
>+ Debian 7.9 + Heartbeat + Pacemaker + more

This should work for your purposes, but FYI, corosync 2 is the preferred
communications layer these days. Debian 7 provides corosync 1, which
might be worth using here, to make an eventual switch to corosync 2 easier.

Also FYI, Pacemaker was dropped from Debian 8, but there is a group
working on backporting the latest pacemaker/corosync/etc. to it.

>+ 2 Node Cluster with Hot-Standby Failover
>+ Active Cluster with clusterip, ip-monitoring, working failover and
>services
>+ Copied kamailio ra into /usr/lib/ocf/resource.d/heartbeat, chmod
>755 and 'crm ra list ocf heartbeat' finds it
> 
> *The plan:*
> 
> _openHAB_
> 
>My idea was to let heartbeat check for the availabilty of openHAB's
>website (jettybased) or check if the the process is up and running.
> 
>I did not find a fitting resource agent. Is there a general ra in
>which you would just have to insert the process name 'openhab'?
> 
> _Kamailio_
> 
>My idea was to let an ra send a SIP-request to kamailio and check,
>if it gets an answer AND if it is the correct one.
> 
>It seems like the ra
>   
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/kamailio
> 
>does exactly what I want,
>but I do not really understand it. Is it plug and play? Do I have to
>change values inside the code like users, the complete meta-data or
>else?
> 
>When I try to insert this agent (no changes) into pacemaker using
>'crm configure primitive kamailio ocf:heartbeat:kamailio' it says:
> 
>lrmadmin[4629]: 2015/12/19_16:11:40 ERROR:
>lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a
>reply message of rmetadata with function get_ret_from_msg.
>ERROR: ocf:heartbeat:kamailio: could not parse meta-data:
>ERROR: ocf:heartbeat:kamailio: could not parse meta-data:
>ERROR: ocf:heartbeat:kamailio: no such resource agent

lrmadmin is no longer used, and I'm not familiar with it, but first I'd
check that the RA is executable. If it supports running directly from
the command line, maybe make sure you can run it that way first.

Most RAs support configuration options, which you can set in the cluster
configuration (you don't have to edit the RA). Each RA specifies the
options it accepts in the <parameters> section of its metadata.
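
A quick hedged sanity check from the command line:

    ls -l /usr/lib/ocf/resource.d/heartbeat/kamailio    # must be executable
    OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/kamailio meta-data

If the second command doesn't print well-formed XML, the "could not
parse meta-data" error is coming from the agent itself rather than from
the cluster stack.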

> *The question:*_
> 
> _Maybe you could give me some hints on what to do next. Perhaps one of
> you is even already using the kamailio ra successfully or checking a
> non-apache website?
> If I simply have to insert all my cluster data into the kamailio ra, it
> should not throw this error, should it? Could have used a readme for
> this ra though...
> If you need any data, I will provide it asap!
> 
> *
> **Thanks a lot to all who read this mail!*
> 
> Sebish
> ha-newbie, but not noobie ;)


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Anyone successfully install Pacemaker/Corosync on Freebsd?

2015-12-21 Thread Ken Gaillot
On 12/19/2015 04:56 PM, mike wrote:
> Hi All,
> 
> just curious if anyone has had any luck at one point installing
> Pacemaker and Corosync on FreeBSD. I have to install from source of
> course and I've run into an issue when running ./configure while trying
> to install Corosync. The process craps out at nss with this error:

FYI, Ruben Kerkhof has done some recent work to get the FreeBSD build
working. It will go into the next 1.1.14 release candidate. In the
meantime, make sure you have the very latest code from upstream's 1.1
branch.
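
As an aside on the pkg-config error quoted below: on FreeBSD the
pkg-config implementation is provided by the pkgconf package (hedged,
from memory), so something like this usually clears it:

    pkg install pkgconf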

> checking for nss... configure: error: in `/root/heartbeat/corosync-2.3.3':
> configure: error: The pkg-config script could not be found or is too
> old. Make sure it
> is in your PATH or set the PKG_CONFIG environment variable to the full
> path to pkg-config.
> Alternatively, you may set the environment variables nss_CFLAGS
> and nss_LIBS to avoid the need to call pkg-config.
> See the pkg-config man page for more details.
> 
> I've looked unsuccessfully for a package called pkg-config and nss
> appears to be installed as you can see from this output:
> 
> root@wellesley:~/heartbeat/corosync-2.3.3 # pkg install nss
> Updating FreeBSD repository catalogue...
> FreeBSD repository is up-to-date.
> All repositories are up-to-date.
> Checking integrity... done (0 conflicting)
> The most recent version of packages are already installed
> 
> Anyway - just looking for any suggestions. Hoping that perhaps someone
> has successfully done this.
> 
> thanks in advance
> -mgb


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] master/slave resource agent without demote

2015-11-30 Thread Ken Gaillot
On 11/25/2015 10:57 PM, Waldemar Brodkorb wrote:
> Hi,
> Andrei Borzenkov wrote,
> 
>> On Tue, Nov 24, 2015 at 5:19 PM, Waldemar Brodkorb
>>  wrote:
>>> Hi,
>>>
>>> we are using a derivate of the Tomcat OCF script.
>>> Our web application needs to be promoted (via a wget call).
>>> But our application is not able to demote in a clean way, so
>>> we need to stop and then start the tomcat applicationserver
>>> to get into slave mode.
>>>
>>> What is the best way to handle this?
>>>
>>
>> Not sure I understand the question. If your application has to be
>> restarted on demote, you restart it on demote in your RA. Or do I
>> misunderstand your question?
> 
> Yes, at the moment we stop it first with the stop function and
> then execute the RA with the start parameter in the background
> returning OCF_SUCCESS. Then there are some
> stamp files containing the current time in unix seconds to prevent
> another start while asynchronously demoting.
>  
> I am experimenting right now with just using the stop function.
> It works at least for three failover scenarios:
> - poweroff the master
> - reboot the master
> - crm node standby / crm node online the master

I'd recommend that the agent do a synchronous stop then start for
demote. Once demote returns 0 (success), the service should be in a
state such that any subsequent monitor will also return 0.

If you implemented stop only, then the next monitor should report 7 (not
running), which would be considered a failure. Similarly, if start is
async, the next monitor might run before the start is done.
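
A minimal sketch of that shape (shell, assuming helper functions the
agent already has; the names here are made up):

    tomcat_demote() {
        tomcat_stop || return $OCF_ERR_GENERIC
        tomcat_start || return $OCF_ERR_GENERIC
        # don't return until a subsequent monitor would report a healthy slave
        while ! tomcat_monitor; do
            sleep 2
        done
        return $OCF_SUCCESS
    }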

> Next I need to test migrate, I think the reason for the complex
> demote fake was a problem with migrate.
> 
> best regards


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Help required for N+1 redundancy setup

2015-12-01 Thread Ken Gaillot
On 12/01/2015 05:31 AM, Nikhil Utane wrote:
> Hi,
> 
> I am evaluating whether it is feasible to use Pacemaker + Corosync to add
> support for clustering/redundancy into our product.

Most definitely

> Our objectives:
> 1) Support N+1 redundancy. i,e. N Active and (up to) 1 Standby.

You can do this with location constraints and scores. See:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_deciding_which_nodes_a_resource_can_run_on

Basically, you give the standby node a lower score than the other nodes.

> 2) Each node has some different configuration parameters.
> 3) Whenever any active node goes down, the standby node comes up with the
> same configuration that the active had.

How you solve this requirement depends on the specifics of your
situation. Ideally, you can use OCF resource agents that take the
configuration location as a parameter. You may have to write your own,
if none is available for your services.

> 4) There is no one single process/service for which we need redundancy,
> rather it is the entire system (multiple processes running together).

This is trivially implemented using either groups or ordering and
colocation constraints.

Order constraint = start service A before starting service B (and stop
in reverse order)

Colocation constraint = keep services A and B on the same node

Group = shortcut to specify several services that need to start/stop in
order and be kept together

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231363875392

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#group-resources


> 5) I would also want to be notified when any active<->standby state
> transition happens as I would want to take some steps at the application
> level.

There are multiple approaches.

If you don't mind compiling your own packages, the latest master branch
(which will be part of the upcoming 1.1.14 release) has built-in
notification capability. See:
http://blog.clusterlabs.org/blog/2015/reliable-notifications/

Otherwise, you can use SNMP or e-mail if your packages were compiled
with those options, or you can use the ocf:pacemaker:ClusterMon resource
agent:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm231308442928

> I went through the documents/blogs but all had example for 1 active and 1
> standby use-case and that too for some standard service like httpd.

Pacemaker is incredibly versatile, and the use cases are far too varied
to cover more than a small subset. Those simple examples show the basic
building blocks, and can usually point you to the specific features you
need to investigate further.

> One additional question, If I am having multiple actives, then Virtual IP
> configuration cannot be used? Is it possible such that N actives have
> different IP addresses but whenever standby becomes active it uses the IP
> address of the failed node?

Yes, there are a few approaches here, too.

The simplest is to assign a virtual IP to each active, and include it in
your group of resources. The whole group will fail over to the standby
node if the original goes down.

If you want a single virtual IP that is used by all your actives, one
alternative is to clone the ocf:heartbeat:IPaddr2 resource. When cloned,
that resource agent will use iptables' CLUSTERIP functionality, which
relies on multicast Ethernet addresses (not to be confused with
multicast IP). Since multicast Ethernet has limitations, this is not
often used in production.

A more complicated method is to use a virtual IP in combination with a
load-balancer such as haproxy. Pacemaker can manage haproxy and the real
services, and haproxy manages distributing requests to the real services.

> Thanking in advance.
> Nikhil

A last word of advice: Fencing (aka STONITH) is important for proper
recovery from difficult failure conditions. Without it, it is possible
to have data loss or corruption in a split-brain situation.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] start service after filesystemressource

2015-11-20 Thread Ken Gaillot
On 11/20/2015 07:38 AM, haseni...@gmx.de wrote:
> Hi,
> I want to start several services after the DRBD resource and the filesystem
> is available. This is my current configuration:
> node $id="184548773" host-1 \
>  attributes standby="on"
> node $id="184548774" host-2 \
>  attributes standby="on"
> primitive collectd lsb:collectd \
>  op monitor interval="10" timeout="30" \
>  op start interval="0" timeout="120" \
>  op stop interval="0" timeout="120"
> primitive failover-ip1 ocf:heartbeat:IPaddr \
>  params ip="192.168.6.6" nic="eth0:0" cidr_netmask="32" \
>  op monitor interval="10s"
> primitive failover-ip2 ocf:heartbeat:IPaddr \
>  params ip="192.168.6.7" nic="eth0:1" cidr_netmask="32" \
>  op monitor interval="10s"
> primitive failover-ip3 ocf:heartbeat:IPaddr \
>  params ip="192.168.6.8" nic="eth0:2" cidr_netmask="32" \
>  op monitor interval="10s"
> primitive res_drbd_export ocf:linbit:drbd \
>  params drbd_resource="hermes"
> primitive res_fs ocf:heartbeat:Filesystem \
>  params device="/dev/drbd0" directory="/mnt" fstype="ext4"
> group mygroup failover-ip1 failover-ip2 failover-ip3 collectd
> ms ms_drbd_export res_drbd_export \
>  meta notify="true" master-max="1" master-node-max="1" clone-max="2" 
> clone-node-max="1"
> location cli-prefer-collectd collectd inf: host-1
> location cli-prefer-failover-ip1 failover-ip1 inf: host-1
> location cli-prefer-failover-ip2 failover-ip2 inf: host-1
> location cli-prefer-failover-ip3 failover-ip3 inf: host-1
> location cli-prefer-res_drbd_export res_drbd_export inf: hermes-1
> location cli-prefer-res_fs res_fs inf: host-1

A word of warning, these "cli-" constraints were added automatically
when you ran CLI commands to move resources to specific hosts. You have
to clear these when you're done with whatever the move was for,
otherwise the resources will only run on those nodes from now on.

If you're using pcs, "pcs resource clear <resource>" will do it.
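
Since this configuration uses the crm shell, the equivalent there is
roughly (hedged; the subcommand is called unmigrate, unmove or clear
depending on the crmsh version):

    crm resource unmigrate collectd
    # or, with the low-level tool:
    crm_resource -U -r collectd

repeated for each resource that has a cli-prefer-* constraint.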

> colocation c_export_on_drbd inf: mygroup res_fs ms_drbd_export:Master
> order o_drbd_before_services inf: ms_drbd_export:promote res_fs:start
> property $id="cib-bootstrap-options" \
>  dc-version="1.1.10-42f2063" \
>  cluster-infrastructure="corosync" \
>  stonith-enabled="false" \
>  no-quorum-policy="ignore" \
>  last-lrm-refresh="1447686090"
> #vim:set syntax=pcmk
> I haven't found the right way to order the startup of new services (e.g.
> collectd) after /mnt is mounted. Can you help me?

As other posters mentioned, order constraints and/or groups will do
that. Exact syntax depends on what CLI tools you use, check their man
pages for details.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Wait until resource is really ready before moving clusterip

2016-01-12 Thread Ken Gaillot
On 01/12/2016 07:57 AM, Kristoffer Grönlund wrote:
> Joakim Hansson  writes:
> 
>> Hi!
>> I have a cluster running tomcat which in turn run solr.
>> I use three nodes with loadbalancing via ipaddr2.
>> The thing is, when tomcat is started on a node it takes about 2 minutes
>> before solr is functioning correctly.
>>
>> Is there a way to make the ipaddr2-clone wait 2 minutes after tomcat is
>> started before it moves the ip to the node?
>>
>> Much appreciated!
> 
> Hi,
> 
> There is the ocf:heartbeat:Delay resource agent, which on one hand is
> documented as a test resource, but on the other hand should do what you
> need:
> 
> primitive solr ...
> primitive two-minute-delay ocf:heartbeat:Delay \
>   params startdelay=120 meta target-role=Started \
>   op start timeout=180
> group solr-then-wait solr two-minute-delay
> 
> Now the group acts basically like the solr resource, except for the
> two-minute delay after starting solr before the group itself is
> considered started.
> 
> Cheers,
> Kristoffer
> 
>>
>> / Jocke

Another way would be to customize the tomcat resource agent so that
start doesn't return success until it's fully ready to accept requests
(which would probably be specific to whatever app you're running via
tomcat). Of course you'd need a long start timeout.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] dovecot RA

2016-06-08 Thread Ken Gaillot
On 06/08/2016 10:11 AM, Dmitri Maziuk wrote:
> On 2016-06-08 09:11, Ken Gaillot wrote:
>> On 06/08/2016 03:26 AM, Jan Pokorný wrote:
> 
>>> Pacemaker can drive systemd-managed services for quite some time.
>>
>> This is as easy as changing lsb:dovecot to systemd:dovecot.
> 
> Great! Any chance that could be mentioned on
> http://www.linux-ha.org/wiki/Resource_agents -- hint, hint ;)
> 
> Thanks guys,
> Dima

There's a big box at the top of every page on that wiki :)

"Looking for current and maintained information and documentation on
(Linux ) Open Source High Availability HA Clustering? You probably
should be reading the Pacemaker site clusterlabs.org. This site
conserves Heartbeat specific stuff."

The current documentation is:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-resource-supported

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Different pacemaker versions split cluster

2016-06-08 Thread Ken Gaillot
On 06/07/2016 02:26 PM, DacioMF wrote:
> Ken,
> 
> I clear all logs in /var/log/corosync and reboot the cluster (this is the 
> test environment, but i want to upgrade the production).
> 
> I attach the output of the command crm_report --from "2016-06-07 0:0:0" after 
> the reboot.
> 
> The corosync and pacemaker versions on Ubuntu 16.04 is 2.3.5 and 1.1.14
> 
> The corosync and pacemaker versions on Ubuntu 14.04 is 2.3.3 and 1.1.10
> 
> 
>  DacioMF Analista de Redes e Infraestrutura

This isn't causing your issue, but when running a mixed-version cluster,
it's essential that a node running the oldest version is elected DC. You
can ensure that by always booting and starting the cluster on it first.
See http://blog.clusterlabs.org/blog/2013/mixing-pacemaker-versions

In this case, we're not getting that far, because the nodes aren't
talking to each other.

The corosync.quorum output shows that everything's fine at the cluster
membership level. This can also be seen in the live CIB where
in_ccm="true" for all nodes (indicating membership), but crmd="offline"
for the different-version nodes (indicating broken pacemaker communication).

In the logs, we can see "state is now member" for all four nodes, but
pcmk_cpg_membership only sees the nodes with the same version.

I suspect the problem is in corosync's cpg handling, since
pcmk_cpg_membership logs everything it gets from corosync. I'm not
familiar with any relevant changes between 2.3.3 and 2.3.5, so I'm not
sure what's going wrong.

> 
> 
> Em Segunda-feira, 6 de Junho de 2016 17:30, Ken Gaillot <kgail...@redhat.com> 
> escreveu:
> On 05/30/2016 01:14 PM, DacioMF wrote:
>> Hi,
>>
>> I had 4 nodes with Ubuntu 14.04LTS in my cluster and all of then worked 
>> well. I need upgrade all my cluster nodes to Ubuntu 16.04LTS without stop my 
>> resources. Two nodes have been updated to 16.04 and the two others remains 
>> with 14.04. The problem is that my cluster was splited and the nodes with 
>> Ubuntu 14.04 only work with the other in the same version. The same is true 
>> for the nodes with Ubuntu 16.04. The feature set of pacemaker in Ubuntu 
>> 14.04 is v3.0.7 and in 16.04 is v3.0.10.
>>
>> The following commands shows what's happening:
>>
>> root@xenserver50:/var/log/corosync# crm status
>> Last updated: Thu May 19 17:19:06 2016
>> Last change: Thu May 19 09:00:48 2016 via cibadmin on xenserver50
>> Stack: corosync
>> Current DC: xenserver51 (51) - partition with quorum
>> Version: 1.1.10-42f2063
>> 4 Nodes configured
>> 4 Resources configured
>>
>> Online: [ xenserver50 xenserver51 ]
>> OFFLINE: [ xenserver52 xenserver54 ]
>>
>> -
>>
>> root@xenserver52:/var/log/corosync# crm status
>> Last updated: Thu May 19 17:20:04 2016    Last change: Thu May 19
>> 08:54:57 2016 by hacluster via crmd on xenserver54
>> Stack: corosync
>> Current DC: xenserver52 (version 1.1.14-70404b0) - partition with quorum
>> 4 nodes and 4 resources configured
>>
>> Online: [ xenserver52 xenserver54 ]
>> OFFLINE: [ xenserver50 xenserver51 ]
>>
>> xenserver52 and xenserver54 are Ubuntu 16.04 the others are Ubuntu 14.04.
>>
>> Someone knows what's the problem?
>>
>> Sorry by my poor english.
>>
>> Best regards,
>>  DacioMF Analista de Redes e Infraestrutura
> 
> 
> Hi,
> 
> We aim for backward compatibility, so this likely is a bug. Can you
> attach the output of crm_report from around this time?
> 
>   crm_report --from "YYYY-M-D H:M:S" --to "YYYY-M-D H:M:S"
> 
> FYI, you cannot do a rolling upgrade from corosync 1 to corosync 2, but
> I believe both 14.04 and 16.04 use corosync 2.
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Ken Gaillot
On 06/05/2016 07:27 PM, Andrew Beekhof wrote:
> On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot <kgail...@redhat.com> wrote:
>> On 06/02/2016 08:01 PM, Andrew Beekhof wrote:
>>> On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot <kgail...@redhat.com> wrote:
>>>> A recent thread discussed a proposed new feature, a new environment
>>>> variable that would be passed to resource agents, indicating whether a
>>>> stop action was part of a recovery.
>>>>
>>>> Since that thread was long and covered a lot of topics, I'm starting a
>>>> new one to focus on the core issue remaining:
>>>>
>>>> The original idea was to pass the number of restarts remaining before
>>>> the resource will no longer tried to be started on the same node. This
>>>> involves calculating (fail-count - migration-threshold), and that
>>>> implies certain limitations: (1) it will only be set when the cluster
>>>> checks migration-threshold; (2) it will only be set for the failed
>>>> resource itself, not for other resources that may be recovered due to
>>>> dependencies on it.
>>>>
>>>> Ulrich Windl proposed an alternative: setting a boolean value instead. I
>>>> forgot to cc the list on my reply, so I'll summarize now: We would set a
>>>> new variable like OCF_RESKEY_CRM_recovery=true
>>>
>>> This concept worries me, especially when what we've implemented is
>>> called OCF_RESKEY_CRM_restarting.
>>
>> Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected.
>>
>>> The name alone encourages people to "optimise" the agent to not
>>> actually stop the service "because its just going to start again
>>> shortly".  I know thats not what Adam would do, but not everyone
>>> understands how clusters work.
>>>
>>> There are any number of reasons why a cluster that intends to restart
>>> a service may not do so.  In such a scenario, a badly written agent
>>> would cause the cluster to mistakenly believe that the service is
>>> stopped - allowing it to start elsewhere.
>>>
>>> Its true there are any number of ways to write bad agents, but I would
>>> argue that we shouldn't be nudging people in that direction :)
>>
>> I do have mixed feelings about that. I think if we name it
>> start_expected, and document it carefully, we can avoid any casual mistakes.
>>
>> My main question is how useful would it actually be in the proposed use
>> cases. Considering the possibility that the expected start might never
>> happen (or fail), can an RA really do anything different if
>> start_expected=true?
> 
> I would have thought not.  Correctness should trump optimal.
> But I'm prepared to be mistaken.
> 
>> If the use case is there, I have no problem with
>> adding it, but I want to make sure it's worthwhile.

Anyone have comments on this?

A simple example: pacemaker calls an RA stop with start_expected=true,
then before the start happens, someone disables the resource, so the
start is never called. Or the node is fenced before the start happens, etc.

Is there anything significant an RA can do differently based on
start_expected=true/false without causing problems if an expected start
never happens?

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Ken Gaillot
On 06/06/2016 12:25 PM, Vladislav Bogdanov wrote:
> 06.06.2016 19:39, Ken Gaillot wrote:
>> On 06/05/2016 07:27 PM, Andrew Beekhof wrote:
>>> On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot <kgail...@redhat.com>
>>> wrote:
>>>> On 06/02/2016 08:01 PM, Andrew Beekhof wrote:
>>>>> On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot <kgail...@redhat.com>
>>>>> wrote:
>>>>>> A recent thread discussed a proposed new feature, a new environment
>>>>>> variable that would be passed to resource agents, indicating
>>>>>> whether a
>>>>>> stop action was part of a recovery.
>>>>>>
>>>>>> Since that thread was long and covered a lot of topics, I'm
>>>>>> starting a
>>>>>> new one to focus on the core issue remaining:
>>>>>>
>>>>>> The original idea was to pass the number of restarts remaining before
>>>>>> the resource will no longer tried to be started on the same node.
>>>>>> This
>>>>>> involves calculating (fail-count - migration-threshold), and that
>>>>>> implies certain limitations: (1) it will only be set when the cluster
>>>>>> checks migration-threshold; (2) it will only be set for the failed
>>>>>> resource itself, not for other resources that may be recovered due to
>>>>>> dependencies on it.
>>>>>>
>>>>>> Ulrich Windl proposed an alternative: setting a boolean value
>>>>>> instead. I
>>>>>> forgot to cc the list on my reply, so I'll summarize now: We would
>>>>>> set a
>>>>>> new variable like OCF_RESKEY_CRM_recovery=true
>>>>>
>>>>> This concept worries me, especially when what we've implemented is
>>>>> called OCF_RESKEY_CRM_restarting.
>>>>
>>>> Agreed; I plan to rename it yet again, to
>>>> OCF_RESKEY_CRM_start_expected.
>>>>
>>>>> The name alone encourages people to "optimise" the agent to not
>>>>> actually stop the service "because its just going to start again
>>>>> shortly".  I know thats not what Adam would do, but not everyone
>>>>> understands how clusters work.
>>>>>
>>>>> There are any number of reasons why a cluster that intends to restart
>>>>> a service may not do so.  In such a scenario, a badly written agent
>>>>> would cause the cluster to mistakenly believe that the service is
>>>>> stopped - allowing it to start elsewhere.
>>>>>
>>>>> Its true there are any number of ways to write bad agents, but I would
>>>>> argue that we shouldn't be nudging people in that direction :)
>>>>
>>>> I do have mixed feelings about that. I think if we name it
>>>> start_expected, and document it carefully, we can avoid any casual
>>>> mistakes.
>>>>
>>>> My main question is how useful would it actually be in the proposed use
>>>> cases. Considering the possibility that the expected start might never
>>>> happen (or fail), can an RA really do anything different if
>>>> start_expected=true?
>>>
>>> I would have thought not.  Correctness should trump optimal.
>>> But I'm prepared to be mistaken.
>>>
>>>> If the use case is there, I have no problem with
>>>> adding it, but I want to make sure it's worthwhile.
>>
>> Anyone have comments on this?
>>
>> A simple example: pacemaker calls an RA stop with start_expected=true,
>> then before the start happens, someone disables the resource, so the
>> start is never called. Or the node is fenced before the start happens,
>> etc.
>>
>> Is there anything significant an RA can do differently based on
>> start_expected=true/false without causing problems if an expected start
>> never happens?
> 
> Yep.
> 
> It may request stop of other resources
> * on that node by removing some node attributes which participate in
> location constraints
> * or cluster-wide by revoking/putting to standby cluster ticket other
> resources depend on
> 
> Latter case is that's why I asked about the possibility of passing the
> node name resource is intended to be started on instead of a boolean
> value (in comments to PR #1026) - I would use it to request stop of
> lustre MDTs and OSTs by revoking ticket they depend on if MGS (primary
> lustre component which does all "request routing") fails to start
> anywhere in cluster. That way, if RA does not receive any node name,

Why would ordering constraints be insufficient?

What happens if the MDTs/OSTs continue running because a start of MGS
was expected, but something prevents the start from actually happening?

> then it can be "almost sure" pacemaker does not intend to restart
> resource (yet) and can request it to stop everything else (because
> filesystem is not usable anyways). Later, if another start attempt
> (caused by failure-timeout expiration) succeeds, RA may grant the ticket
> back, and all other resources start again.
> 
> Best,
> Vladislav

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker reload Master/Slave resource

2016-06-06 Thread Ken Gaillot
On 05/20/2016 06:20 AM, Felix Zachlod (Lists) wrote:
> version 1.1.13-10.el7_2.2-44eb2dd
> 
> Hello!
> 
> I am currently developing a master/slave resource agent. So far it is working 
> just fine, but this resource agent implements reload() and this does not work 
> as expected when running as Master:
> The reload action is invoked and it succeeds returning 0. The resource is 
> still Master and monitor will return $OCF_RUNNING_MASTER.
> 
> But Pacemaker considers the instance being slave afterwards. Actually only 
> reload is invoked, no monitor, no demote etc.
> 
> I first thought that reload should possibly return $OCF_RUNNING_MASTER too 
> but this leads to the resource failing on reload. It seems 0 is the only 
> valid return code.
> 
> I can recover the cluster state running resource $resourcename promote, which 
> will call
> 
> notify
> promote
> notify
> 
> Afterwards my resource is considered Master again. After  PEngine Recheck 
> Timer (I_PE_CALC) just popped (90ms), the cluster manager will promote 
> the resource itself.
> But this can lead to unexpected results, it could promote the resource on the 
> wrong node so that both sides are actually running master, the cluster will 
> not even notice it does not call monitor either.
> 
> Is this a bug?
> 
> regards, Felix

I think it depends on your point of view :)

Reload is implemented as an alternative to stop-then-start. For m/s
clones, start leaves the resource in slave state.

So on the one hand, it makes sense that Pacemaker would expect a m/s
reload to end up in slave state, regardless of the initial state, since
it should be equivalent to stop-then-start.

On the other hand, you could argue that a reload for a master should
logically be an alternative to demote-stop-start-promote.

On the third hand ;) you could argue that reload is ambiguous for master
resources and thus shouldn't be supported at all.

Feel free to open a feature request at http://bugs.clusterlabs.org/ to
say how you think it should work.

As an aside, I think the current implementation of reload in pacemaker
is unsatisfactory for two reasons:

* Using the "unique" attribute to determine whether a parameter is
reloadable was a bad idea. For example, the location of a daemon binary
is generally set to unique=0, which is sensible in that multiple RA
instances can use the same binary, but a reload could not handle that
change. It is not a problem only because no one ever changes that.

* There is a fundamental misunderstanding between pacemaker and most RA
developers as to what reload means. Pacemaker uses the reload action to
make parameter changes in the resource's *pacemaker* configuration take
effect, but RA developers tend to use it to reload the service's own
configuration files (a more natural interpretation, but completely
different from how pacemaker uses it).

> trace   May 20 12:58:31 cib_create_op(609):0: Sending call options: 0010, 
> 1048576
> trace   May 20 12:58:31 cib_native_perform_op_delegate(384):0: Sending 
> cib_modify message to CIB service (timeout=120s)
> trace   May 20 12:58:31 crm_ipc_send(1175):0: Sending from client: cib_shm 
> request id: 745 bytes: 1070 timeout:12 msg...
> trace   May 20 12:58:31 crm_ipc_send(1188):0: Message sent, not waiting for 
> reply to 745 from cib_shm to 1070 bytes...
> trace   May 20 12:58:31 cib_native_perform_op_delegate(395):0: Reply: No data 
> to dump as XML
> trace   May 20 12:58:31 cib_native_perform_op_delegate(398):0: Async call, 
> returning 268
> trace   May 20 12:58:31 do_update_resource(2274):0: Sent resource state 
> update message: 268 for reload=0 on scst_dg_ssd
> trace   May 20 12:58:31 cib_client_register_callback_full(606):0: Adding 
> callback cib_rsc_callback for call 268
> trace   May 20 12:58:31 process_lrm_event(2374):0: Op scst_dg_ssd_reload_0 
> (call=449, stop-id=scst_dg_ssd:449, remaining=3): Confirmed
> notice  May 20 12:58:31 process_lrm_event(2392):0: Operation 
> scst_dg_ssd_reload_0: ok (node=alpha, call=449, rc=0, cib-update=268, 
> confirmed=true)
> debug   May 20 12:58:31 update_history_cache(196):0: Updating history for 
> 'scst_dg_ssd' with reload op
> trace   May 20 12:58:31 crm_ipc_read(992):0: No message from lrmd received: 
> Resource temporarily unavailable
> trace   May 20 12:58:31 mainloop_gio_callback(654):0: Message acquisition 
> from lrmd[0x22b0ec0] failed: No message of desired type (-42)
> trace   May 20 12:58:31 crm_fsa_trigger(293):0: Invoked (queue len: 0)
> trace   May 20 12:58:31 s_crmd_fsa(159):0: FSA invoked with Cause: 
> C_FSA_INTERNAL   State: S_NOT_DC
> trace   May 20 12:58:31 s_crmd_fsa(246):0: Exiting the FSA
> trace   May 20 12:58:31 crm_fsa_trigger(295):0: Exited  (queue len: 0)
> trace   May 20 12:58:31 crm_ipc_read(989):0: Received cib_shm event 2108, 
> size=183, rc=183, text:  cib_callid="268" cib_clientid="60010689-7350-4916-a7bd-bd85ff
> trace   May 20 12:58:31 mainloop_gio_callback(659):0: New message from 
> cib_shm[0x23b7ab0] 

Re: [ClusterLabs] Creating a rule based on whether a quorum exists

2016-06-06 Thread Ken Gaillot
On 05/30/2016 08:13 AM, Les Green wrote:
> Hi All,
> 
> I have a two-node cluster with no-quorum-policy=ignore and an external
> ping responder to try to determine if a node has its network down (it's
> the dead one), or if the other node is really dead..
> 
> The ping helps to determine who the master is.
> 
> I have realised in the situation where the ping responder goes down,
> both stop being the master.
> 
> Code can be seen here: https://github.com/greemo/vagrant-fabric
> 
> I currently have the following rule which prevents a node becoming a
> master unless it can access the ping resource. (I may add more ping
> resources later):
> 
> <rsc_colocation [...] rsc="g_mysql" with-rsc="ms_drbd_mysql" with-rsc-role="Master"/>
> <rsc_location [...]>
>   <rule [...] id="l_drbd_master_on_ping-rule">
>     <expression [...] id="l_drbd_master_on_ping-rule-expression"/>
>     <expression [...] type="number" id="l_drbd_master_on_ping-rule-expression-0"/>
>   </rule>
> </rsc_location>
> <rsc_order [...] first="ms_drbd_mysql" first-action="promote" then="g_mysql"
> then-action="start"/>
> 
> 
> 
> I want to create a rule that says "if I am not in a quorum AND I cannot
> access all the ping resources, do not become the master". I can sort out
> the ping part, but how can I determine within a Pacemaker rule if I am
> part of a quorum?
> 
> I have thought to set up a cron job using shell tools to query the CIB
> and populate an attribute, but surely there has to be an easier way...
> 
> Hopefully, Les

Not that I'm aware of. Some alternatives: set up the ping responder as a
quorum-only node instead; configure fencing and get rid of the ping
resource; list the cluster nodes in the ping resource's host_list and
change the rule to lte 1.
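
For the last option, a rough sketch with pcs (the addresses are placeholders,
and the default multiplier of 1 is assumed so the attribute simply counts
reachable hosts):

  pcs resource create p_ping ocf:pacemaker:ping \
      host_list="<ping-responder-ip> <node1-ip> <node2-ip>" \
      op monitor interval=15s --clone
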

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Different pacemaker versions split cluster

2016-06-06 Thread Ken Gaillot
On 05/30/2016 01:14 PM, DacioMF wrote:
> Hi,
> 
> I had 4 nodes with Ubuntu 14.04LTS in my cluster and all of then worked well. 
> I need upgrade all my cluster nodes to Ubuntu 16.04LTS without stop my 
> resources. Two nodes have been updated to 16.04 and the two others remains 
> with 14.04. The problem is that my cluster was splited and the nodes with 
> Ubuntu 14.04 only work with the other in the same version. The same is true 
> for the nodes with Ubuntu 16.04. The feature set of pacemaker in Ubuntu 14.04 
> is v3.0.7 and in 16.04 is v3.0.10.
> 
> The following commands shows what's happening:
> 
> root@xenserver50:/var/log/corosync# crm status
> Last updated: Thu May 19 17:19:06 2016
> Last change: Thu May 19 09:00:48 2016 via cibadmin on xenserver50
> Stack: corosync
> Current DC: xenserver51 (51) - partition with quorum
> Version: 1.1.10-42f2063
> 4 Nodes configured
> 4 Resources configured
> 
> Online: [ xenserver50 xenserver51 ]
> OFFLINE: [ xenserver52 xenserver54 ]
> 
> -
> 
> root@xenserver52:/var/log/corosync# crm status
> Last updated: Thu May 19 17:20:04 2016Last change: Thu May 19 
> 08:54:57 2016 by hacluster via crmd on xenserver54
> Stack: corosync
> Current DC: xenserver52 (version 1.1.14-70404b0) - partition with quorum
> 4 nodes and 4 resources configured
> 
> Online: [ xenserver52 xenserver54 ]
> OFFLINE: [ xenserver50 xenserver51 ]
> 
> xenserver52 and xenserver54 are Ubuntu 16.04 the others are Ubuntu 14.04.
> 
> Someone knows what's the problem?
> 
> Sorry by my poor english.
> 
> Best regards,
>  DacioMF, Network and Infrastructure Analyst

Hi,

We aim for backward compatibility, so this likely is a bug. Can you
attach the output of crm_report from around this time?

  crm_report --from "YYYY-M-D H:M:S" --to "YYYY-M-D H:M:S"

FYI, you cannot do a rolling upgrade from corosync 1 to corosync 2, but
I believe both 14.04 and 16.04 use corosync 2.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker 1.1.15 - Release Candidate 4

2016-06-12 Thread Ken Gaillot
On 06/12/2016 07:28 AM, Ferenc Wágner wrote:
> Ken Gaillot <kgail...@redhat.com> writes:
> 
>> With this release candidate, we now provide three sample alert scripts
>> to use with the new alerts feature, installed in the
>> /usr/share/pacemaker/alerts directory.
> 
> Hi,
> 
> Is there a real reason to name these scripts *.sample?  Sure, they are
> samples, but they are also usable as-is, aren't they?

Almost as-is -- copy them somewhere, rename them without ".sample", and
mark them executable.
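
For example, assuming the SNMP sample is named alert_snmp.sh.sample (the
destination directory here is arbitrary):

  cp /usr/share/pacemaker/alerts/alert_snmp.sh.sample /var/lib/pacemaker/alert_snmp.sh
  chmod 755 /var/lib/pacemaker/alert_snmp.sh
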

After some discussion, we decided that this feature is not mature enough
yet to provide the scripts for direct use. After we get some experience
with how users actually use the feature and the sample scripts, we can
gain more confidence in recommending them generally. Until then, we
recommend that people examine the script source and edit it to suit
their needs before using it.

That said, I think the SNMP script in particular is quite useful.

The log-to-file script is more a proof-of-concept that people can use as
a template. The SMTP script may be useful, but probably paired with some
custom software handling the recipient address, to avoid flooding a real
person's mailbox when a cluster is active.

>> The ./configure script has a new "--with-configdir" option.
> 
> This greatly simplifies packaging, thanks much!
> 
> Speaking about packaging: are the alert scripts run by remote Pacemaker
> nodes?  I couldn't find described which nodes run the alert scripts.
> From the mailing list discussions I recall they are run by each node,
> but this would be useful to spell out in the documentation, I think.

Good point. Alert scripts are run only on cluster nodes, but they
include remote node events. I'll make sure the documentation mentions that.

> Similarly for the alert guarantees: I recall there's no such thing, but
> one could also think they are parts of transactions, thus having recovery
> behavior similar to the resource operations.  Hmm... wouldn't such
> design actually make sense?

We didn't want to make any cluster operation depend on alert script
success. The only thing we can guarantee is that the cluster will try to
call the alert script for each event. But if the system is going
haywire, for example, we may be unable to spawn a new process due to
some resource exhaustion, and of course the script itself may have problems.

Also, we wanted to minimize the script interface, and keep it
backward-compatible with crm_mon external scripts. We didn't want to add
an OCF-style layer of meta-data, actions and return codes, instead
keeping it as simple as possible for anyone writing one.

Since it's a brand new feature, we definitely want feedback on all
aspects once it's in actual use. If alert script failures turns out to
be a big issue, I could see maybe reporting them in cluster status (and
allowing that to be cleaned up).

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Pacemaker 1.1.15 - Release Candidate 4

2016-06-10 Thread Ken Gaillot
The latest release candidate for Pacemaker version 1.1.15 is now
available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.15-rc4

With this release candidate, we now provide three sample alert scripts
to use with the new alerts feature, installed in the
/usr/share/pacemaker/alerts directory.

The ./configure script has a new "--with-configdir" option. Different
systems put start-up environment variables in various locations --
/etc/sysconfig, /etc/default, /etc/conf.d, and so on. We looked into
auto-detecting this, but it became clear that the user (or packager) is
in the best position to configure it. The default is /etc/sysconfig.

Bugfixes since 1.1.15-rc3 include multiple memory issues, important
fixes for ocf:pacemaker:controld, and improved compatibility with nodes
running version 1.1.11 or earlier during rolling upgrades.

Everyone is encouraged to download, compile and test the new release.
Your feedback is important and appreciated.

This is most likely very close to the final 1.1.15 release.
-- 
Ken Gaillot <kgail...@redhat.com>

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-03 Thread Ken Gaillot
On 06/02/2016 08:01 PM, Andrew Beekhof wrote:
> On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot <kgail...@redhat.com> wrote:
>> A recent thread discussed a proposed new feature, a new environment
>> variable that would be passed to resource agents, indicating whether a
>> stop action was part of a recovery.
>>
>> Since that thread was long and covered a lot of topics, I'm starting a
>> new one to focus on the core issue remaining:
>>
>> The original idea was to pass the number of restarts remaining before
>> the resource will no longer tried to be started on the same node. This
>> involves calculating (fail-count - migration-threshold), and that
>> implies certain limitations: (1) it will only be set when the cluster
>> checks migration-threshold; (2) it will only be set for the failed
>> resource itself, not for other resources that may be recovered due to
>> dependencies on it.
>>
>> Ulrich Windl proposed an alternative: setting a boolean value instead. I
>> forgot to cc the list on my reply, so I'll summarize now: We would set a
>> new variable like OCF_RESKEY_CRM_recovery=true
> 
> This concept worries me, especially when what we've implemented is
> called OCF_RESKEY_CRM_restarting.

Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected.

> The name alone encourages people to "optimise" the agent to not
> actually stop the service "because its just going to start again
> shortly".  I know thats not what Adam would do, but not everyone
> understands how clusters work.
> 
> There are any number of reasons why a cluster that intends to restart
> a service may not do so.  In such a scenario, a badly written agent
> would cause the cluster to mistakenly believe that the service is
> stopped - allowing it to start elsewhere.
> 
> Its true there are any number of ways to write bad agents, but I would
> argue that we shouldn't be nudging people in that direction :)

I do have mixed feelings about that. I think if we name it
start_expected, and document it carefully, we can avoid any casual mistakes.

My main question is how useful would it actually be in the proposed use
cases. Considering the possibility that the expected start might never
happen (or fail), can an RA really do anything different if
start_expected=true? If the use case is there, I have no problem with
adding it, but I want to make sure it's worthwhile.

>> whenever a start is
>> scheduled after a stop on the same node in the same transition. This
>> would avoid the corner cases of the previous approach; instead of being
>> tied to migration-threshold, it would be set whenever a recovery was
>> being attempted, for any reason. And with this approach, it should be
>> easier to set the variable for all actions on the resource
>> (demote/stop/start/promote), rather than just the stop.
>>
>> I think the boolean approach fits all the envisioned use cases that have
>> been discussed. Any objections to going that route instead of the count?
>> --
>> Ken Gaillot <kgail...@redhat.com>

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Processing failed op monitor for WebSite on node1: not running (7)

2016-06-14 Thread Ken Gaillot
On 06/14/2016 03:10 AM, Jeremy Voisin wrote:
> Hi all,
> 
>  
> 
> We actually have a 2 nodes cluster with corosync and pacemaker for
> httpd. We have 2 VIP configured.
> 
>  
> 
> Since we’ve added ModSecurity 2.9, httpd restart is very slow. So I
> increased the start / stop timeout. But sometimes, after logrotate the
> following error occurs :
> 
>  
> 
> Failed Actions:
> 
> * WebSite_monitor_30 on node1 'not running' (7): call=26,
> status=complete, exitreason='none',
> 
> last-rc-change='Tue Jun 14 03:43:05 2016', queued=0ms, exec=0ms
> 
>  
> 
> Here is the full output of crm_mon :
> 
> Last updated: Tue Jun 14 07:22:28 2016  Last change: Fri Jun 10
> 09:28:03 2016 by root via cibadmin on node1
> 
> Stack: corosync
> 
> Current DC: node1 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with
> quorum
> 
> 2 nodes and 4 resources configured
> 
>  
> 
> Online: [ node1 node2 ]
> 
>  
> 
> WebSite (systemd:httpd):Started node1
> 
> Resource Group: WAFCluster
> 
>  VirtualIP  (ocf::heartbeat:IPaddr2):   Started node1
> 
>  MailMon(ocf::heartbeat:MailTo):Started node1
> 
>  VirtualIP2 (ocf::heartbeat:IPaddr2):   Started node1
> 
>  
> 
> Failed Actions:
> 
> * WebSite_monitor_30 on node1 'not running' (7): call=26,
> status=complete, exitreason='none',
> 
> last-rc-change='Tue Jun 14 03:43:05 2016', queued=0ms, exec=0ms
> 
>  
> 
> # pcs resource --full
> 
> Resource: WebSite (class=systemd type=httpd)
> 
>   Attributes: configfile=/etc/httpd/conf/httpd.conf
> statusurl=http://127.0.0.1/server-status monitor=1min
> 
>   Operations: monitor interval=300s (WebSite-monitor-interval-300s)
> 
>   start interval=0s timeout=300s (WebSite-start-interval-0s)
> 
>   stop interval=0s timeout=300s (WebSite-stop-interval-0s)
> 
> Group: WAFCluster
> 
>   Resource: VirtualIP (class=ocf provider=heartbeat type=IPaddr2)
> 
>Attributes: ip=195.70.7.74 cidr_netmask=27
> 
>Operations: start interval=0s timeout=20s (VirtualIP-start-interval-0s)
> 
>stop interval=0s timeout=20s (VirtualIP-stop-interval-0s)
> 
>monitor interval=30s (VirtualIP-monitor-interval-30s)
> 
>   Resource: MailMon (class=ocf provider=heartbeat type=MailTo)
> 
>Attributes: email=sys...@dfi.ch
> 
>Operations: start interval=0s timeout=10 (MailMon-start-interval-0s)
> 
>stop interval=0s timeout=10 (MailMon-stop-interval-0s)
> 
>monitor interval=10 timeout=10 (MailMon-monitor-interval-10)
> 
>   Resource: VirtualIP2 (class=ocf provider=heartbeat type=IPaddr2)
> 
>Attributes: ip=195.70.7.75 cidr_netmask=27
> 
>Operations: start interval=0s timeout=20s (VirtualIP2-start-interval-0s)
> 
>stop interval=0s timeout=20s (VirtualIP2-stop-interval-0s)
> 
>monitor interval=30s (VirtualIP2-monitor-interval-30s)
> 
>  
> 
>  
> 
> If I run /crm_resource –P/ the Failed Actions disappear.
> 
>  
> 
> How can I fix the monitor “not running” error ?
> 
>  
> 
> Thanks,
> 
> Jérémy

Why does logrotate cause the site to stop responding? Normally it's a
graceful restart, which shouldn't cause any interruptions.

Any solution will have to be in logrotate, to keep it from interrupting
service.

Personally, my preferred configuration is to make apache log to syslog
instead of its usual log file. You can even configure syslog to log it
to the usual file, so there's no major difference. Then, you don't need
a separate logrotate script for apache, it gets rotated with the system
log. That avoids having to restart apache, which for a busy site can be
a big deal. It also gives you the option of tying into syslog tools such
as remote logging.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Apache Active Active Balancer without FileSystem Cluster

2016-06-13 Thread Ken Gaillot
On 06/13/2016 08:06 AM, Klaus Wenninger wrote:
> On 06/13/2016 02:33 PM, alan john wrote:
>> Dear All,
>>
>> I am trying to setup an  Apache active-active cluster. I do not wish
>> to have common file system for both nodes. it i  However I do not like
>> to have  pcs/corosync to start or stop apache, but monitor it and move
>> only VIP  to secondary node and on recovery pull it back. Would this
>> be practically possible or do you think it is not-achievable.
>>
>>
>> I have following constraints.
>>
>> 1. Virtual IP is not where apache is not running. --- Could not achieve.
>> 2. Node 1 is priority - Works fine
>> 3. pcs should not start/ stop apache  -- Works fine using un-managed

Unmanaged won't let you achieve #1.

It's a lot easier to let the cluster manage apache, but if you really
want to go the other way, you'll need to write a custom OCF agent for
apache.

Start/stop/monitor should use the ha_pseudo_resource function in
ocf-shellfuncs so the agent can distinguish "running" from "not running"
(itself, not apache).

The monitor command should additionally check apache (you can copy the
code from the standard agent), and set a node attribute with apache's
status.

Clone the agent so it's always running everywhere apache might run.

Finally, set a location constraint for your VIP using a rule matching
that node attribute.

So, if apache fails, the new agent detects that and updates the node
attribute, and pacemaker moves the VIP away from that node.
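
As a rough sketch (the attribute name "apache-status" and the VIP resource
name "vip" are just examples):

  # in the custom agent's monitor action, after checking apache:
  attrd_updater -n apache-status -U 1   # or -U 0 if the apache check failed

  # location rule keeping the VIP only where apache-status is 1:
  pcs constraint location vip rule score=-INFINITY \
      not_defined apache-status or apache-status ne 1
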

>> 4. send mail when vip gets switched or cluster status changes -- guess
>> achievable.
> Check out the new alerts-feature in pacemaker 1.1.15 for that.
>> 5.  pcs monitor apache process id  and responses to satisfy Pt1.
>>
>> Regards,
>> Alan

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Ken Gaillot
On 06/06/2016 05:45 PM, Adam Spiers wrote:
> Adam Spiers <aspi...@suse.com> wrote:
>> Andrew Beekhof <abeek...@redhat.com> wrote:
>>> On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers <aspi...@suse.com> wrote:
>>>> Ken Gaillot <kgail...@redhat.com> wrote:
>>>>> My main question is how useful would it actually be in the proposed use
>>>>> cases. Considering the possibility that the expected start might never
>>>>> happen (or fail), can an RA really do anything different if
>>>>> start_expected=true?
>>>>
>>>> That's the wrong question :-)
>>>>
>>>>> If the use case is there, I have no problem with
>>>>> adding it, but I want to make sure it's worthwhile.
>>>>
>>>> The use case which started this whole thread is for
>>>> start_expected=false, not start_expected=true.
>>>
>>> Isn't this just two sides of the same coin?
>>> If you're not doing the same thing for both cases, then you're just
>>> reversing the order of the clauses.
>>
>> No, because the stated concern about unreliable expectations
>> ("Considering the possibility that the expected start might never
>> happen (or fail)") was regarding start_expected=true, and that's the
>> side of the coin we don't care about, so it doesn't matter if it's
>> unreliable.
> 
> BTW, if the expected start happens but fails, then Pacemaker will just
> keep repeating until migration-threshold is hit, at which point it
> will call the RA 'stop' action finally with start_expected=false.
> So that's of no concern.

To clarify, that's configurable, via start-failure-is-fatal and on-fail

> Maybe your point was that if the expected start never happens (so
> never even gets a chance to fail), we still want to do a nova
> service-disable?

That is a good question, which might mean it should be done on every
stop -- or could that cause problems (besides delays)?

Another aspect of this is that the proposed feature could only look at a
single transition. What if stop is called with start_expected=false, but
then Pacemaker is able to start the service on the same node in the next
transition immediately afterward? Would having called service-disable
cause problems for that start?

> Yes that would be nice, but this proposal was never intended to
> address that.  I guess we'd need an entirely different mechanism in
> Pacemaker for that.  But let's not allow perfection to become the
> enemy of the good ;-)

The ultimate concern is that this will encourage people to write RAs
that leave services in a dangerous state after stop is called.

I think with naming and documenting it properly, I'm fine to provide the
option, but I'm on the fence. Beekhof needs a little more convincing :-)

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Ken Gaillot
On 06/06/2016 03:30 PM, Vladislav Bogdanov wrote:
> 06.06.2016 22:43, Ken Gaillot wrote:
>> On 06/06/2016 12:25 PM, Vladislav Bogdanov wrote:
>>> 06.06.2016 19:39, Ken Gaillot wrote:
>>>> On 06/05/2016 07:27 PM, Andrew Beekhof wrote:
>>>>> On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot <kgail...@redhat.com>
>>>>> wrote:
>>>>>> On 06/02/2016 08:01 PM, Andrew Beekhof wrote:
>>>>>>> On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot <kgail...@redhat.com>
>>>>>>> wrote:
>>>>>>>> A recent thread discussed a proposed new feature, a new environment
>>>>>>>> variable that would be passed to resource agents, indicating
>>>>>>>> whether a
>>>>>>>> stop action was part of a recovery.
>>>>>>>>
>>>>>>>> Since that thread was long and covered a lot of topics, I'm
>>>>>>>> starting a
>>>>>>>> new one to focus on the core issue remaining:
>>>>>>>>
>>>>>>>> The original idea was to pass the number of restarts remaining
>>>>>>>> before
>>>>>>>> the resource will no longer tried to be started on the same node.
>>>>>>>> This
>>>>>>>> involves calculating (fail-count - migration-threshold), and that
>>>>>>>> implies certain limitations: (1) it will only be set when the
>>>>>>>> cluster
>>>>>>>> checks migration-threshold; (2) it will only be set for the failed
>>>>>>>> resource itself, not for other resources that may be recovered
>>>>>>>> due to
>>>>>>>> dependencies on it.
>>>>>>>>
>>>>>>>> Ulrich Windl proposed an alternative: setting a boolean value
>>>>>>>> instead. I
>>>>>>>> forgot to cc the list on my reply, so I'll summarize now: We would
>>>>>>>> set a
>>>>>>>> new variable like OCF_RESKEY_CRM_recovery=true
>>>>>>>
>>>>>>> This concept worries me, especially when what we've implemented is
>>>>>>> called OCF_RESKEY_CRM_restarting.
>>>>>>
>>>>>> Agreed; I plan to rename it yet again, to
>>>>>> OCF_RESKEY_CRM_start_expected.
>>>>>>
>>>>>>> The name alone encourages people to "optimise" the agent to not
>>>>>>> actually stop the service "because its just going to start again
>>>>>>> shortly".  I know thats not what Adam would do, but not everyone
>>>>>>> understands how clusters work.
>>>>>>>
>>>>>>> There are any number of reasons why a cluster that intends to
>>>>>>> restart
>>>>>>> a service may not do so.  In such a scenario, a badly written agent
>>>>>>> would cause the cluster to mistakenly believe that the service is
>>>>>>> stopped - allowing it to start elsewhere.
>>>>>>>
>>>>>>> Its true there are any number of ways to write bad agents, but I
>>>>>>> would
>>>>>>> argue that we shouldn't be nudging people in that direction :)
>>>>>>
>>>>>> I do have mixed feelings about that. I think if we name it
>>>>>> start_expected, and document it carefully, we can avoid any casual
>>>>>> mistakes.
>>>>>>
>>>>>> My main question is how useful would it actually be in the
>>>>>> proposed use
>>>>>> cases. Considering the possibility that the expected start might
>>>>>> never
>>>>>> happen (or fail), can an RA really do anything different if
>>>>>> start_expected=true?
>>>>>
>>>>> I would have thought not.  Correctness should trump optimal.
>>>>> But I'm prepared to be mistaken.
>>>>>
>>>>>> If the use case is there, I have no problem with
>>>>>> adding it, but I want to make sure it's worthwhile.
>>>>
>>>> Anyone have comments on this?
>>>>
>>>> A simple example: pacemaker calls an RA stop with start_expected=true,
>>>> then before the start happens, someone disables the resource, so the
>>>

Re: [ClusterLabs] Minimum configuration for dynamically adding a node to a cluster

2016-06-08 Thread Ken Gaillot
On 06/08/2016 06:54 AM, Jehan-Guillaume de Rorthais wrote:
> 
> 
> On 8 June 2016 at 13:36:03 GMT+02:00, Nikhil Utane
> wrote:
>> Hi,
>>
>> Would like to know the best and easiest way to add a new node to an
>> already
>> running cluster.
>>
>> Our limitation:
>> 1) pcsd cannot be used since (as per my understanding) it communicates
>> over
>> ssh which is prevented.
> 
> As far as i remember,  pcsd deamons use their own tcp port (not the ssh one) 
> and communicate with each others using http queries (over ssl i suppose).

Correct, pcsd uses port 2224. It encrypts all traffic. If you can get
that allowed through your firewall between cluster nodes, that will be
the easiest way.

corosync.conf does need to be kept the same on all nodes, and corosync
needs to be reloaded after any changes. pcs will handle this
automatically when adding/removing nodes. Alternatively, it is possible
to use corosync.conf with multicast, without explicitly listing
individual nodes.
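
For example, assuming pcs 0.9.x and a new node named "newnode", run from an
existing cluster node:

  pcs cluster auth newnode -u hacluster
  pcs cluster node add newnode
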

> As far as i understand, crmsh uses ssh, not pcsd.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] dovecot RA

2016-06-08 Thread Ken Gaillot
On 06/08/2016 03:26 AM, Jan Pokorný wrote:
> On 07/06/16 14:48 -0500, Dimitri Maziuk wrote:
>> next question: I'm on centos 7 and there's no more /etc/init.d/<anything>. With lennartware spreading, is there a coherent plan to deal
>> with former LSB agents?
> 
> Pacemaker can drive systemd-managed services for quite some time.

This is as easy as changing lsb:dovecot to systemd:dovecot.

Or, if you specify it as service:dovecot, Pacemaker will check whether
LSB, systemd or upstart is used on the local system, and call the
appropriate one.

As with LSB, don't enable systemd-managed services to start at boot, if
you want the cluster to manage them.
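
A minimal sketch with pcs (the monitor interval is just an example):

  systemctl disable dovecot
  pcs resource create dovecot systemd:dovecot op monitor interval=60s
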

One issue that sometimes comes up: some scripts (some logrotate conf
files or cron jobs, for example) will call "systemctl reload
<servicename>". If the service is managed by the cluster, systemd
doesn't think it's running, so the reload will fail. You have to replace
such lines with a native reload mechanism for the service.

> Provided that the project/daemon you care about carries the unit
> file, you can use that unless there are distinguished roles for the
> provided service within the cluster (like primary+replicas), there's
> a need to run multiple varying instances of the same service,
> or other cluster-specific features are desired.
> 
> For dovecot, I can see:
> # rpm -ql dovecot | grep \.service
> /usr/lib/systemd/system/dovecot.service 
> 
>> Specifically, should I roll my own RA for dovecot or is there one in the
>> works somewhere?
> 
> If you miss something with the generic approach per above, and there's
> no fitting open-sourced RA around then it's probably your last resort.
> 
> For instance, there was once an agent written in C (highly unusual),
> but seems abandoned a long time ago:
> https://github.com/perrit/dovecot-ocf-resource-agent

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] dovecot RA

2016-06-08 Thread Ken Gaillot
On 06/08/2016 09:11 AM, Ken Gaillot wrote:
> On 06/08/2016 03:26 AM, Jan Pokorný wrote:
>> On 07/06/16 14:48 -0500, Dimitri Maziuk wrote:
>>> next question: I'm on centos 7 and there's no more /etc/init.d/<anything>. With lennartware spreading, is there a coherent plan to deal
>>> with former LSB agents?
>>
>> Pacemaker can drive systemd-managed services for quite some time.
> 
> This is as easy as changing lsb:dovecot to systemd:dovecot.
> 
> Or, if you specify it as service:dovecot, Pacemaker will check whether
> LSB, systemd or upstart is used on the local system, and call the
> appropriate one.
> 
> As with LSB, don't enable systemd-managed services to start at boot, if
> you want the cluster to manage them.
> 
> One issue that sometimes comes up: some scripts (some logrotate conf
> files or cron jobs, for example) will call "systemctl reload
> ". If the service is managed by the cluster, systemd
> doesn't think it's running, so the reload will fail. You have to replace
> such lines with a native reload mechanism for the service.

Whoops -- I was thinking of when an OCF agent is used. If you use
systemd: or service:, systemd does know the service is running, so
systemctl reload/status will work just fine.

>> Provided that the project/daemon you care about carries the unit
>> file, you can use that unless there are distinguished roles for the
>> provided service within the cluster (like primary+replicas), there's
>> a need to run multiple varying instances of the same service,
>> or other cluster-specific features are desired.
>>
>> For dovecot, I can see:
>> # rpm -ql dovecot | grep \.service
>> /usr/lib/systemd/system/dovecot.service 
>>
>>> Specifically, should I roll my own RA for dovecot or is there one in the
>>> works somewhere?
>>
>> If you miss something with the generic approach per above, and there's
>> no fitting open-sourced RA around then it's probably your last resort.
>>
>> For instance, there was once an agent written in C (highly unusual),
>> but seems abandoned a long time ago:
>> https://github.com/perrit/dovecot-ocf-resource-agent

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] pacemaker_remoted XML parse error

2016-06-08 Thread Ken Gaillot
On 06/08/2016 06:14 AM, Narayanamoorthy Srinivasan wrote:
> I have a pacemaker cluster with two pacemaker remote nodes. Recently the
> remote nodes started throwing below errors and SDB started self-fencing.
> Appreciate if someone throws light on what could be the issue and the fix.
> 
> OS - SLES 12 SP1
> Pacemaker Remote version - pacemaker-remote-1.1.13-14.7.x86_64
> 
> 2016-06-08T14:11:46.009073+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error: Entity: line 1: parser
> error : AttValue: ' expected
> 2016-06-08T14:11:46.009314+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error:
> key="neutron-ha-tool_monitor_0" operation="monitor"
> crm-debug-origin="do_update_
> 2016-06-08T14:11:46.009443+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error:  
>  ^
> 2016-06-08T14:11:46.009567+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error: Entity: line 1: parser
> error : attributes construct error
> 2016-06-08T14:11:46.009697+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error:
> key="neutron-ha-tool_monitor_0" operation="monitor"
> crm-debug-origin="do_update_
> 2016-06-08T14:11:46.009824+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error:  
>  ^
> 2016-06-08T14:11:46.009948+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error: Entity: line 1: parser
> error : Couldn't find end of Start Tag lrm_rsc_op line 1
> 2016-06-08T14:11:46.010070+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error:
> key="neutron-ha-tool_monitor_0" operation="monitor"
> crm-debug-origin="do_update_
> 2016-06-08T14:11:46.010191+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error:  
>  ^
> 2016-06-08T14:11:46.010460+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error: Entity: line 1: parser
> error : Premature end of data in tag lrm_resource line 1
> 2016-06-08T14:11:46.010718+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error:
> key="neutron-ha-tool_monitor_0" operation="monitor"
> crm-debug-origin="do_update_
> 2016-06-08T14:11:46.010977+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error:  
>  ^
> 2016-06-08T14:11:46.011234+05:30 d18-fb-7b-18-f1-8e
> pacemaker_remoted[6190]:error: XML Error: Entity: line 1: parser
> error : Premature end of data in tag lrm_resources line 1
> 
> 
> -- 
> Thanks & Regards
> Moorthy

This sounds like the network traffic between the cluster nodes and the
remote nodes is being corrupted. Have there been any network changes
lately? Switch/firewall/etc. equipment/settings? MTU?

You could try using a packet sniffer such as wireshark to see if the
traffic looks abnormal in some way. The payload is XML so it should be
more or less readable.
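
For example, something like this on the remote node should capture the
traffic for later inspection (assuming the default pacemaker_remote port 3121):

  tcpdump -i any -w /tmp/pcmk-remote.pcap tcp port 3121
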


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] newbie questions

2016-05-31 Thread Ken Gaillot
On 05/31/2016 03:59 PM, Jay Scott wrote:
> Greetings,
> 
> Cluster newbie
> Centos 7
> trying to follow the "Clusters from Scratch" intro.
> 2 nodes (yeah, I know, but I'm just learning)
> 
> [root@smoking ~]# pcs status
> Cluster name:
> Last updated: Tue May 31 15:32:18 2016Last change: Tue May 31
> 15:02:21
>  2016 by root via cibadmin on smoking
> Stack: unknown

"Stack: unknown" is a big problem. The cluster isn't aware of the
corosync configuration. Did you do the "pcs cluster setup" step?
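
Something along these lines, using your node names (the cluster name is just
an example):

  pcs cluster auth smoking mars -u hacluster
  pcs cluster setup --name mycluster smoking mars
  pcs cluster start --all
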

> Current DC: NONE
> 2 nodes and 1 resource configured
> 
> OFFLINE: [ mars smoking ]
> 
> Full list of resources:
> 
>  ClusterIP(ocf::heartbeat:IPaddr2):Stopped
> 
> PCSD Status:
>   smoking: Online
>   mars: Online
> 
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> 
> 
> What concerns me at the moment:
> I did
> pcs resource enable ClusterIP
> while simultaneously doing
> tail -f /var/log/cluster/corosync.log
> (the only log in there)

The system log (/var/log/messages or whatever your system has
configured) is usually the best place to start. The cluster software
sends messages of interest to end users there, and it includes messages
from all components (corosync, pacemaker, resource agents, etc.).

/var/log/cluster/corosync.log (and in some configurations,
/var/log/pacemaker.log) have more detailed log information for debugging.

> and nothing happens in the log, but the ClusterIP
> stays "Stopped".  Should I be able to ping that addr?
> I can't.
> It also says OFFLINE:  and both of my machines are offline,
> though the PCSD says they're online.  Which do I trust?

The first online/offline output is most important, and refers to the
node's status in the actual cluster; the "PSCD" online/offline output
simply tells whether the pcs daemon is running. Typically, the pcs
daemon is enabled at boot and is always running. The pcs daemon is not
part of the clustering itself; it's a front end to configuring and
administering the cluster.

> [root@smoking ~]# pcs property show stonith-enabled
> Cluster Properties:
>  stonith-enabled: false
> 
> yet I see entries in the corosync.log referring to stonith.
> I'm guessing that's normal.

Yes, you can enable stonith at any time, so the stonith daemon will
still run, to stay aware of the cluster status.

> My corosync.conf file says the quorum is off.
> 
> I also don't know what to include in this for any of you to
> help me debug.
> 
> Ahh, also, is this considered "long", and if so, where would I post
> to the web?
> 
> thx.
> 
> j.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: RES: Performance of a mirrored LV (cLVM) with OCFS: Attempt to monitor it

2016-05-27 Thread Ken Gaillot
On 05/27/2016 12:58 AM, Ulrich Windl wrote:
> Hi!
> 
> Thanks for this info. We actually run the "noop" scheduler for  the SAN
> storage (as per manufacturer's recommendation), because one "disk" is actually
> spread over up to 40 disks.
> Other settings we changes was:
> queue/rotational:0
> queue/add_random:0
> queue/max_sectors_kb:128 (manufacturer's recommendation, before up to 1MB
> transfers were seen)
> queue/read_ahead_kb:0
> 
> And we apply those settings (where available) to the whole stack (disk
> devices, multipath device, LV).
> 
> Regards,
> Ulrich

I don't have anything to add about clvm specifically, but some general
RAID tips that are often overlooked:

If you're using striped RAID (i.e. data striped across >1 disk), it's important to choose a
stripe size wisely and make sure everything is aligned with it. Somewhat
counterintuitively, smaller stripe sizes are better for large reads and
writes, while larger stripe sizes are better for small reads and writes.
There's a big performance penalty by setting a stripe size too small,
but not much penalty from setting it too large.

Things that should be aligned:

* Partition sizes. A disk's first usable partition will generally start
at (your stripe size in kilobytes * 2) sectors.

* LVM physical volume metadata (via the --metadatasize option to
pvcreate). It will set the metadata size to the next 64K boundary above
the value, so set it to be just under the size you want, ex.
--metadatasize 1.99M will get a metadata size of 2MB.

* The filesystem creation options (varies by fs type). For example, with
ext3/ext4, where N1 is stripe size in kilobytes / 4, and N2 is $N1 times
the number of nonparity disks in the array, use -E
stride=$N1,stripe-width=$N2. For xfs, where STRIPE is the stripe size in
kilobytes and NONPARITY is the number of nonparity disks in the array,
use -d su=${STRIPE}k,sw=${NONPARITY} -l su=${STRIPE}k.

If your RAID controller has power backup (BBU or supercapacitor), mount
filesystems with the nobarrier option.
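
A worked example, assuming a 5-disk RAID5 (4 data + 1 parity) with a 128K
stripe; the device name is a placeholder:

  # ext4: stride = 128/4 = 32, stripe-width = 32 * 4 = 128
  mkfs.ext4 -E stride=32,stripe-width=128 /dev/vg0/lv_data
  # xfs: su = stripe size, sw = number of data (nonparity) disks
  mkfs.xfs -d su=128k,sw=4 -l su=128k /dev/vg0/lv_data
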

 "Carlos Xavier"  schrieb am 25.05.2016 um 22:25
> in
> Nachricht <01da01d1b6c3$8f5c3dc0$ae14b940$@com.br>:
>> Hi.
>>
>> I have been running OCFS2 on clusters for quite long time.
>> We started running it over DRBD and now we have it running on a Dell 
>> storage.
>> Over DRBD it showed a very poor performance, most because the way DRBD 
>> works.
>> To improve the performance we had to change the I/O Scheduler of the disk to
> 
>> "Deadline"
>>
>> When we migrate the system to the storage, the issue show up again. 
>> Sometimes the system was hanging due to disk access, to solve the issue I 
>> changed the I/O Schedule To Deadline and the trouble vanished.
>>
>> Regards,
>> Carlos.
>>
>>
>>> -Original Message-
>>> From: Kristoffer Grönlund [mailto:kgronl...@suse.com]
>>> Sent: Wednesday, May 25, 2016 06:55
>>> To: Ulrich Windl; users@clusterlabs.org 
>>> Subject: Re: [ClusterLabs] Performance of a mirrored LV (cLVM) with OCFS: 
>> Attempt to monitor it
>>>
>>> Ulrich Windl  writes:
>>>
 cLVM has never made a good impression regarding performance, so I wonder
> if 
>> there's anything we
>>> could do to improve the performance. I suspect that one VM paging heavily
> 
>> on OCFS2 kills the
>>> performance of the whole cluster (that hosts Xen PV guests only). Anyone 
>> with deeper insights?
>>>
>>> My understanding is that this is a problem inherent in the design of CLVM 
>> and there is work ongoing to
>>> mitigate this by handling clustering in md instead. See this LWN article
> for 
>> more details:
>>>
>>> http://lwn.net/Articles/674085/ 
>>>
>>> Cheers,
>>> Kristoffer
>>>
>>> --
>>> // Kristoffer Grönlund
>>> // kgronl...@suse.com 

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Pacemaker 1.1.15 - Release Candidate 3

2016-05-27 Thread Ken Gaillot
The third release candidate for Pacemaker version 1.1.15 is now
available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.15-rc3

Perhaps the most visible change since 1.1.15-rc2 is that many log
messages have been made more user-friendly. Partly this is due to taking
advantage of the "extended information logging" feature of libqb 0.17.2
and greater (if installed on the build machine); the same message may be
logged to the system log without developer-oriented details, and to the
pacemaker detail log with the extra detail.

This release candidate includes multiple bugfixes since 1.1.15-rc2, most
importantly:

* In 1.1.14, the controld resource agent was modified to return a
monitor error when DLM is in the "wait fencing" state. This turned out
to be too aggressive, resulting in fencing the monitored node
unnecessarily if a slow fencing operation against another node was in
progress. The agent now does additional checking to determine whether to
return an error or not.

* A bug introduced in 1.1.14, resulting in the have-watchdog property
always being set to true, has been fixed. The cluster now properly
checks for a running sbd process.

* A regression introduced in 1.1.15-rc1 has been fixed. When a node ID
is reused, attrd would have problems setting attributes for the new node.

Everyone is encouraged to download, compile and test the new release.
Your feedback is important and appreciated. I am aiming for one more
release candidate, with the final release in mid- to late June.
-- 
Ken Gaillot <kgail...@redhat.com>

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Q: status section of CIB: "last_0" IDs and "queue-time"

2016-06-02 Thread Ken Gaillot
On 06/02/2016 01:07 AM, Ulrich Windl wrote:
>>>> Ken Gaillot <kgail...@redhat.com> schrieb am 01.06.2016 um 16:14 in 
>>>> Nachricht
> <574eede2.1090...@redhat.com>:
>> On 06/01/2016 06:14 AM, Ulrich Windl wrote:
>>> Hello!
>>>
>>> I have a question:
>>> Inspecting the XML of our cluster, I noticed that there are several IDs 
>> ending with "last_0". So I wondered:
>>> It seems those IDs are generated for start and stop operations, and I 
>> discovered one case where an ID is duplicate (the status is for different 
>> nodes, and one is a start operation, while the other is a stop 
>> operation, however).
>>
>> The "*_last_*" IDs simply refer to the last (= most recently executed)
>> operation :)
>>
>> Those IDs are not directly used by the cluster; they're just used to
>> store the most recent operation in the CIB.
>>
>>> Background: I wrote some program that extarcts the runtimes of operations 
>> from the CIB, like this:
>>> prm_r00_fs_last_0 13464 stop
>>> prm_r00_fs_last_0 61 start
>>> prm_r00_fs_monitor_30 34 monitor
>>> prm_r00_fs_monitor_30 43 monitor
>>>
>>> The first word is the "id" attribute, the second is the "exec-time" 
>> attribute, and the last one (added to help myself out of confusion) is the 
>> "operation" attribute. Values are converted to milliseconds.
>>>
>>> Is the name of the id intentional, or is it some mistake?
>>>
>>> And another question: For an operation with "start-delay" it seems the 
>>> start 
>> delay is simple added to the queue time (as if the operation was waiting 
>> that 
>> long). Is that intentional?
>>
>> Yes. The operation is queued when it is received, and if it has a start
>> delay, a timer is set to execute it at a later time. So the delay
>> happens while the operation is queued.
> 
> Ken,
> 
> thanks for the answers. Is there a way to distinguish "intentional" from "non 
> intentional" queueing? One would look deeper into non-intentional queueing.

No, from the cluster's point of view, it's always intentional, just
different lengths of time. You'd just have to subtract any start delay
if you're not interested in that.

> Regards,
> Ulrich
> 
>>
>>> Another program tried to extract queue and execution times for operations, 
>> and the sorted result looks like this then:
>>>
>>> 1 27 prm_nfs_home_exp_last_0 monitor
>>> 1 39 prm_q10_ip_2_monitor_6 monitor
>>> 1 42 prm_e10_ip_2_monitor_6 monitor
>>> 1 58 prm_s01_ip_last_0 stop
>>> 1 74 prm_nfs_cbw_trans_exp_last_0 start
>>> 30001 1180 prm_stonith_sbd_monitor_18 monitor
>>> 30001 178 prm_c11_ascs_ers_monitor_6 monitor
>>> 30002 165 prm_c11_ascs_ers_monitor_45000 monitor
>>>
>>> Regards,
>>> Ulrich

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] FYI: Alert script permissions

2016-06-01 Thread Ken Gaillot
For anyone playing with the new alerts feature, there is one difference
from the old ClusterMon external scripts to be aware of.

Resource agents such as ClusterMon run as root, so ClusterMon's external
scripts also run as root.

The new alert scripts are run as the hacluster user. So if you are using
a ClusterMon script with the new alerts feature, be aware of permissions
issues. If an alert script needs elevated privileges, it is recommended
to use sudo. If you use SELinux, you may need to grant the hacluster
user access to files/devices/whatever needed by your script, as well as
the ability to execute the script itself.
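
For example, a sudoers fragment along these lines (the helper path is just a
placeholder):

  # /etc/sudoers.d/hacluster-alerts
  hacluster ALL=(root) NOPASSWD: /usr/local/bin/my-alert-helper

and have the alert script call "sudo /usr/local/bin/my-alert-helper ..." for
the privileged step.
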

The new approach has obvious security benefits but may be less
convenient in some cases. If there is a need, we may add the ability to
configure an alert script's run-as user in a future version.
-- 
Ken Gaillot <kgail...@redhat.com>

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] FYI: ocf:pacemaker:controld issue in rc3

2016-06-01 Thread Ken Gaillot
FYI, the ocf:pacemaker:controld (DLM) resource agent released with
Pacemaker 1.1.15-rc3 has an issue. It will work with an upstream patch
applied to DLM, but not with existing DLM versions.

This has been fixed as of commit 2c148ac, which will be in rc4. It
requires a stonith_admin enhancement, so to use it, you must compile the
entire pacemaker package, not just grab the agent.

Anyone not using the controld agent is still encouraged to download and
test rc3, which has many improvements and is fairly close to what the
final release will be.
-- 
Ken Gaillot <kgail...@redhat.com>

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] IPaddr2 failed to start

2016-06-22 Thread Ken Gaillot
On 06/22/2016 06:46 AM, wd wrote:
> if [ X`uname -s` != "XLinux" ]; then
> ocf_log err "IPaddr2 only supported Linux."
> exit $OCF_ERR_INSTALLED
> fi
> 
> Do you run on a linux? what is 'uname -s' command returned?

It could also return "not installed" if
/usr/lib/ocf/resource.d/heartbeat/IPaddr2 does not exist or is not
executable on debian-drbd1, or if IPaddr2 can't find a command to send ARPs.

The ARP command depends on the value of the resource's "arp_sender"
option, which defaults to "send_arp" (which will look for
/usr/libexec/heartbeat/send_arp) but can be set to "ipoibarping" (when
using infiniband).
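
A quick way to check both on debian-drbd1 (just a sketch):

  ls -l /usr/lib/ocf/resource.d/heartbeat/IPaddr2
  ls -l /usr/libexec/heartbeat/send_arp
  uname -s   # the agent returns 'not installed' on anything but Linux
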


> On Wed, Jun 22, 2016 at 6:07 PM, Юрченко Станислав
> > wrote:
> 
> Hello!
> I have configured cluster pacemaker + corosync and it's works. Both
> nodes are online. But when I starting to addind failover_ip resource
> I've faced with this error:
> * failover_ip_monitor_0 on debian-drbd1 'not installed' (5): call=5,
> status=Not installed, exitreason='none',
>  last-rc-change='Wed Jun 22 12:26:08 2016', queued=0ms, exec=1ms
>  However,  crm ra list ocf heartbeat command shows all modules, as
> expected.
> It seems, that some monitor module is not intalled, but I don't know
> how to find which are.
> May someone help me with that?
> -- 
> Best regards, *Stanislav Yurchenko*
> System Administrator, *"TEKAR" LLC*
> e-mail: *yurchenk...@etecar.ru *
> tel.: (861) *991-01-01 ext. 1055*

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Node is silently unfenced if transition is very long

2016-06-21 Thread Ken Gaillot
On 06/17/2016 07:05 AM, Vladislav Bogdanov wrote:
> 03.05.2016 01:14, Ken Gaillot wrote:
>> On 04/19/2016 10:47 AM, Vladislav Bogdanov wrote:
>>> Hi,
>>>
>>> Just found an issue with node is silently unfenced.
>>>
>>> That is quite large setup (2 cluster nodes and 8 remote ones) with
>>> a plenty of slowly starting resources (lustre filesystem).
>>>
>>> Fencing was initiated due to resource stop failure.
>>> lustre often starts very slowly due to internal recovery, and some such
>>> resources were starting in that transition where another resource
>>> failed to stop.
>>> And, as transition did not finish in time specified by the
>>> "failure-timeout" (set to 9 min), and was not aborted, that stop
>>> failure was successfully cleaned.
>>> There were transition aborts due to attribute changes, after that
>>> stop failure happened, but fencing
>>> was not initiated for some reason.
>>
>> Unfortunately, that makes sense with the current code. Failure timeout
>> changes the node attribute, which aborts the transition, which causes a
>> recalculation based on the new state, and the fencing is no longer
> 
> Ken, could this one be considered to be fixed before 1.1.15 is released?

I'm planning to release 1.1.15 later today, and this won't make it in.

We do have several important open issues, including this one, but I
don't want them to delay the release of the many fixes that are ready to
go. I would only hold for a significant issue introduced this cycle, and
none of the known issues appear to qualify.

> I was just hit by the same issue in a completely different setup.
> Two-node cluster: one node fails to stop a resource and is fenced.
> Right after that, the second node fails to activate a clvm volume (a
> different story, I still need to investigate) and then fails to stop it.
> That node is scheduled to be fenced, but it cannot be, because the first
> node hasn't come up yet.
> Any cleanup (automatic or manual) of a resource that failed to stop
> clears the node state, removing the "unclean" state from the node. That
> is probably not what I would expect (a resource cleanup amounts to a
> node unfence)...
> Honestly, this can potentially lead to data corruption...
> 
> Also (probably not related), there was one more resource stop failure (in
> that case a timeout) prior to the failed stop mentioned above. And that
> stop timeout did not lead to fencing by itself.
> 
> I have logs (but not pe-inputs/traces/blackboxes) from both nodes, so
> any additional information from them can be easily provided.
> 
> Best regards,
> Vladislav
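
(For readers following along: the failure-timeout discussed in this thread
is an ordinary resource meta-attribute, and a cleanup is what clears the
recorded failure. A minimal sketch, assuming a hypothetical resource
"fs_ost1" and node "node1":

  # set a 9-minute failure-timeout (value in seconds) on the resource
  crm_resource --resource fs_ost1 --meta --set-parameter failure-timeout --parameter-value 540
  # manually clear its failure history on one node, i.e. the "cleanup" above
  crm_resource --cleanup --resource fs_ost1 --node node1

The point of the thread is that exactly such a cleanup, automatic or
manual, currently also clears the node's unclean state.)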

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Recovering after split-brain

2016-06-21 Thread Ken Gaillot
On 06/20/2016 11:33 PM, Nikhil Utane wrote:
> Let me give the full picture about our solution. It will then make it
> easy to have the discussion.
> 
> We are looking at providing N+1 redundancy to our application servers,
> i.e. 1 standby for up to N active servers (currently N<=5). Each server
> will have some unique configuration. The standby stores the configuration
> of all the active servers, so that whichever server goes down, the
> standby loads that particular configuration and becomes active. The
> server that went down then becomes the standby.
> We have bundled all the configuration that every server has into a
> resource such that during failover the resource is moved to the newly
> active server, and that way it takes up the personality of the server
> that went down. To put it differently, every active server has a
> 'unique' resource that is started by Pacemaker, whereas the standby has none.
> 
> Our servers do not write anything to an external database; all the
> writing is done to the CIB file, under the resource that each server is
> currently managing. We also have some clients that connect to the active servers
> (1 client can connect to only 1 server, 1 server can have multiple
> clients) and provide service to end-users. Now, the reason I say that
> split-brain is not an issue for us is that the clients can only connect
> to 1 of the active servers at any given time (we have to handle the case
> where all clients move together and do not get distributed). So even if
> two servers become active with the same personality, the clients can only
> connect to 1 of them. (The initial plan was to configure quorum, but later
> I was told that service availability is of utmost importance, and since
> the impact of split-brain is limited, we are thinking of doing away with it.)
> 
> Now the concern I have is: once the split is resolved, I would have 2
> actives, each having its own view of the resource, trying to synchronize
> the CIB. At this point I want the one that has the clients attached to
> it to win.
> I am thinking I can implement a monitor function that can bring down the
> resource if it doesn't find any clients attached to it within a given
> period of time. But to understand the Pacemaker behavior, what exactly
> would happen if the same resource is found to be active on two nodes
> after recovery?
> 
> -Thanks
> Nikhil

In general, monitor actions should not change the state of the service
in any way.

Pacemaker's behavior when finding multiple instances of a resource
running when there should be only one is configurable via the
multiple-active property:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_resource_meta_attributes

By default, it stops all the instances, and then starts one instance.
The alternatives are to stop all the instances and leave them stopped,
or to unmanage the resource (i.e. refuse to stop or start it).
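
For illustration, the meta-attribute can be set like this (a sketch with a
hypothetical resource name "my_ip"; "stop_start" is the default mentioned
above):

  # leave all instances stopped instead of stopping and restarting one
  crm_resource --resource my_ip --meta --set-parameter multiple-active --parameter-value stop_only
  # accepted values: block, stop_only, stop_start (default)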

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Pacemaker 1.1.15 released

2016-06-21 Thread Ken Gaillot
ClusterLabs is proud to announce the latest release of the Pacemaker
cluster resource manager, version 1.1.15. The source code is available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.15

The most significant enhancements since version 1.1.14 are:

* A new "alerts" section of the CIB allows you to configure scripts that
will be called after significant cluster events. Sample scripts are
installed in /usr/share/pacemaker/alerts.

* A new pcmk_action_limit option for fence devices allows multiple fence
actions to be executed concurrently. It defaults to 1 to preserve
existing behavior (i.e. serial execution of fence actions).

* Pacemaker Remote support has been improved. Most noticeably, if
pacemaker_remote is stopped without disabling the remote resource first,
any resources will be moved off the node (previously, the node would get
fenced). This allows easier software updates on remote nodes, since
updates often involve restarting the daemon.

* You may notice some files have moved from the pacemaker package to
pacemaker-cli, including most ocf:pacemaker resource agents, the
logrotate configuration, the XML schemas and the SNMP MIB. This allows
Pacemaker Remote nodes to work better when the full pacemaker package is
not installed.

* Have you ever wondered why a resource is not starting when you think
it should? crm_mon will now show why a resource is stopped, for example,
because it is unmanaged, or disabled in the configuration.

* In 1.1.14, the controld resource agent was modified to return a
monitor error when DLM is in the "wait fencing" state. This turned out
to be too aggressive, resulting in fencing the monitored node
unnecessarily if a slow fencing operation against another node was in
progress. The agent now does additional checking to determine whether to
return an error or not.

* Four significant regressions have been fixed. Compressed CIBs larger
than 1MB are again supported (a regression since 1.1.14), fenced unseen
nodes are properly not marked as unclean (also since 1.1.14),
have-watchdog is detected properly rather than always being true (also
since 1.1.14), and failures of multiple-level monitor checks should again
cause the resource to fail (a regression since 1.1.10).
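
To illustrate the new alerts section mentioned in the first item above,
here is a minimal sketch of what such a configuration might look like in
the CIB. The IDs, the recipient value and the exact sample-script filename
are assumptions; check what is actually installed under
/usr/share/pacemaker/alerts on your system:

  <alerts>
    <alert id="alert_to_file" path="/usr/share/pacemaker/alerts/alert_file.sh.sample">
      <recipient id="alert_to_file-log" value="/var/log/pcmk_alerts.log"/>
    </alert>
  </alerts>

The snippet belongs under the CIB's configuration section and can be
loaded with cibadmin or edited in with a higher-level tool.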

As usual, the release includes many bugfixes and minor enhancements. For
a more detailed list of changes, see the change log:

https://github.com/ClusterLabs/pacemaker/blob/1.1/ChangeLog

Everyone is encouraged to download, compile and test the new release. We
do many regression tests and simulations, but we can't cover all
possible use cases, so your feedback is important and appreciated.

Many thanks to all contributors of source code to this release,
including Andrew Beekhof, Bin Liu, Christian Schneider, Christoph Berg,
David Shane Holden, Ferenc Wágner, Gao Yan, Hideo Yamauchi, Jan Pokorný,
Ken Gaillot, Klaus Wenninger, Kostiantyn Ponomarenko, Kristoffer
Grönlund, Lars Ellenberg, Michal Koutný, Nakahira Kazutomo, Oyvind
Albrigtsen, Ruben Kerkhof, and Yusuke Iida. Apologies if I have
overlooked anyone.
-- 
Ken Gaillot <kgail...@redhat.com>

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] restarting pacemakerd

2016-06-20 Thread Ken Gaillot
On 06/18/2016 05:15 AM, Ferenc Wágner wrote:
> Hi,
> 
> Could somebody please elaborate a little on why the pacemaker systemd
> service file contains "Restart=on-failure"?  I mean that a failed node
> gets fenced anyway, so most of the time this would be a futile effort.
> On the other hand, one could argue that restarting failed services
> should be the default behavior of systemd (or any init system).  Still,
> it is not.  I'd be grateful for some insight into the matter.

To clarify one point, the configuration mentioned here is systemd
configuration, not part of pacemaker configuration or operation. Systemd
monitors the processes it launches. With "Restart=on-failure", systemd
will re-launch pacemaker in situations it considers a "failure"
(exiting nonzero, exiting with a core dump, etc.).

Systemd does have various rate-limiting options, which we leave as
default in the pacemaker unit file. Perhaps one day we could try to come
up with ideal values, but it should be a rare situation, and admins can
always tune them as desired for their system using an override file.
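
As one possible sketch of such an override (the drop-in path is an
example, and the directive names assume an older systemd that still keeps
the StartLimit* settings in [Service]; newer systemd moves them to [Unit]):

  # /etc/systemd/system/pacemaker.service.d/restart.conf
  [Service]
  # restart only on abnormal exits (requires a reasonably recent systemd)
  Restart=on-abnormal
  # allow at most 5 restarts within 10 minutes before giving up
  StartLimitInterval=10min
  StartLimitBurst=5

After adding or changing a drop-in, run systemctl daemon-reload.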

The goal of restart is of course to have a slightly better shot at
recovery. You're right, if fencing is configured and quorum is retained,
the node will almost certainly get fenced anyway, but those conditions
aren't always true.

Systemd upstream recommends Restart=on-failure or Restart=on-abnormal
for all long-running services. on-abnormal would probably be better for
pacemaker, but it's not supported in older systemd versions.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

