On 03/10/2016 09:38 AM, Bernie Jones wrote:
> Hi Ken,
> Thanks for your response. I've now corrected the constraint order, but the
> behaviour is still the same: the IP does not fail over (after the first
> time) unless I issue a "pcs resource cleanup" command on dirsrv-daemon.
>
> Also, I'm not sure why you advise against using is-managed=false in
> production. We are trying to use pacemaker purely to fail over on detection
> of a failure, not to control starting or stopping of the instances. It is
> essential that in normal operation we have both instances up, as we are
> using MMR.
>
> Thanks,
> Bernie
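For reference, the cleanup workaround described above is roughly the sequence below; this is only a sketch, using the resource name from this thread:

    crm_mon -1 -f                              # one-shot status, including fail counts
    pcs resource failcount show dirsrv-daemon  # per-node fail count for the daemon
    pcs resource cleanup dirsrv-daemon         # clear the recorded failures for the daemon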
I think you misunderstand is-managed. It is used to be able to perform
maintenance on a service without pacemaker fencing the node when the service
is stopped/restarted. Failover won't work with is-managed=false, because
failover involves stopping and starting the service.

Your goal is already accomplished by using a clone with master-max=2. With
the clone, pacemaker will run the service on both nodes, and with
master-max=2, it will be master/master.

> -----Original Message-----
> From: Ken Gaillot [mailto:kgail...@redhat.com]
> Sent: 10 March 2016 15:01
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] FLoating IP failing over but not failing back
> with active/active LDAP (dirsrv)
>
> On 03/10/2016 08:48 AM, Bernie Jones wrote:
>> A bit more info..
>>
>> If, after I restart the failed dirsrv instance, I then perform a "pcs
>> resource cleanup dirsrv-daemon" to clear the FAIL messages, then the
>> failover will work OK.
>>
>> So it's as if the cleanup is changing the status in some way..
>>
>> From: Bernie Jones [mailto:ber...@securityconsulting.ltd.uk]
>> Sent: 10 March 2016 08:47
>> To: 'Cluster Labs - All topics related to open-source clustering welcomed'
>> Subject: [ClusterLabs] FLoating IP failing over but not failing back with
>> active/active LDAP (dirsrv)
>>
>> Hi all, could you advise please?
>>
>> I'm trying to configure a floating IP with an active/active deployment of
>> 389 directory server. I don't want pacemaker to manage LDAP but just to
>> monitor and switch the IP as required to provide resilience. I've seen some
>> other similar threads and based my solution on those.
>>
>> I've amended the ocf for slapd to work with 389 DS and this tests out OK
>> (dirsrv).
>>
>> I've then created my resources as below:
>>
>> pcs resource create dirsrv-ip ocf:heartbeat:IPaddr2 ip="192.168.26.100" cidr_netmask="32" op monitor timeout="20s" interval="5s" op start interval="0" timeout="20" op stop interval="0" timeout="20"
>>
>> pcs resource create dirsrv-daemon ocf:heartbeat:dirsrv op monitor interval="10" timeout="5" op start interval="0" timeout="5" op stop interval="0" timeout="5" meta "is-managed=false"
>
> is-managed=false means the cluster will not try to start or stop the
> service. It should never be used in regular production, only when doing
> maintenance on the service.
>
>> pcs resource clone dirsrv-daemon meta globally-unique="false" interleave="true" target-role="Started" "master-max=2"
>>
>> pcs constraint colocation add dirsrv-daemon-clone with dirsrv-ip score=INFINITY
>
> This constraint means that dirsrv is only allowed to run where dirsrv-ip
> is. I suspect you want the reverse, dirsrv-ip with dirsrv-daemon-clone,
> which means keep the IP with a working dirsrv instance.
>
>> pcs property set no-quorum-policy=ignore
>
> If you're using corosync 2, you generally don't need or want this.
> Instead, ensure corosync.conf has two_node: 1 (which will be done
> automatically if you used pcs cluster setup).
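For reference, on a corosync 2 stack that flag lives in the quorum section of /etc/corosync/corosync.conf, roughly as in the sketch below; the cluster shown later in this thread is running on cman, so this only applies after moving to a corosync 2 stack:

    quorum {
        provider: corosync_votequorum
        two_node: 1
    }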
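Likewise, the two changes suggested above (reverse the colocation, and let pacemaker manage the daemon so failover can stop and start it) would look roughly like the sketch below, using the resource names from this thread; the constraint id is only an illustrative guess, so check the real one first:

    # list existing constraints together with their ids
    pcs constraint --full

    # remove the old colocation, using the id reported by the previous command
    # (the id shown here is a guess)
    pcs constraint remove colocation-dirsrv-daemon-clone-dirsrv-ip-INFINITY

    # keep the IP with a working dirsrv instance, not the other way around
    pcs constraint colocation add dirsrv-ip with dirsrv-daemon-clone score=INFINITY

    # failover involves stopping and starting, so the daemon must be managed
    pcs resource manage dirsrv-daemon-clone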
>
>> pcs resource defaults migration-threshold=1
>>
>> pcs property set stonith-enabled=false
>>
>> On startup all looks well:
>>
>> ________________________________________________________________________________________
>>
>> Last updated: Thu Mar 10 08:28:03 2016
>> Last change: Thu Mar 10 08:26:14 2016
>> Stack: cman
>> Current DC: ga2.idam.com - partition with quorum
>> Version: 1.1.11-97629de
>> 2 Nodes configured
>> 3 Resources configured
>>
>> Online: [ ga1.idam.com ga2.idam.com ]
>>
>> dirsrv-ip       (ocf::heartbeat:IPaddr2):       Started ga1.idam.com
>> Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
>>      dirsrv-daemon  (ocf::heartbeat:dirsrv):   Started ga2.idam.com (unmanaged)
>>      dirsrv-daemon  (ocf::heartbeat:dirsrv):   Started ga1.idam.com (unmanaged)
>>
>> ________________________________________________________________________________________
>>
>> Stop dirsrv on ga1:
>>
>> Last updated: Thu Mar 10 08:28:43 2016
>> Last change: Thu Mar 10 08:26:14 2016
>> Stack: cman
>> Current DC: ga2.idam.com - partition with quorum
>> Version: 1.1.11-97629de
>> 2 Nodes configured
>> 3 Resources configured
>>
>> Online: [ ga1.idam.com ga2.idam.com ]
>>
>> dirsrv-ip       (ocf::heartbeat:IPaddr2):       Started ga2.idam.com
>> Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
>>      dirsrv-daemon  (ocf::heartbeat:dirsrv):   Started ga2.idam.com (unmanaged)
>>      dirsrv-daemon  (ocf::heartbeat:dirsrv):   FAILED ga1.idam.com (unmanaged)
>>
>> Failed actions:
>>     dirsrv-daemon_monitor_10000 on ga1.idam.com 'not running' (7): call=12,
>>     status=complete, last-rc-change='Thu Mar 10 08:28:41 2016', queued=0ms, exec=0ms
>>
>> IP fails over to ga2 OK:
>>
>> ________________________________________________________________________________________
>>
>> Restart dirsrv on ga1
>>
>> Last updated: Thu Mar 10 08:30:01 2016
>> Last change: Thu Mar 10 08:26:14 2016
>> Stack: cman
>> Current DC: ga2.idam.com - partition with quorum
>> Version: 1.1.11-97629de
>> 2 Nodes configured
>> 3 Resources configured
>>
>> Online: [ ga1.idam.com ga2.idam.com ]
>>
>> dirsrv-ip       (ocf::heartbeat:IPaddr2):       Started ga2.idam.com
>> Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
>>      dirsrv-daemon  (ocf::heartbeat:dirsrv):   Started ga2.idam.com (unmanaged)
>>      dirsrv-daemon  (ocf::heartbeat:dirsrv):   Started ga1.idam.com (unmanaged)
>>
>> Failed actions:
>>     dirsrv-daemon_monitor_10000 on ga1.idam.com 'not running' (7): call=12,
>>     status=complete, last-rc-change='Thu Mar 10 08:28:41 2016', queued=0ms, exec=0ms
>>
>> ________________________________________________________________________________________
>>
>> Stop dirsrv on ga2:
>>
>> Last updated: Thu Mar 10 08:31:14 2016
>> Last change: Thu Mar 10 08:26:14 2016
>> Stack: cman
>> Current DC: ga2.idam.com - partition with quorum
>> Version: 1.1.11-97629de
>> 2 Nodes configured
>> 3 Resources configured
>>
>> Online: [ ga1.idam.com ga2.idam.com ]
>>
>> dirsrv-ip       (ocf::heartbeat:IPaddr2):       Started ga2.idam.com
>> Clone Set: dirsrv-daemon-clone [dirsrv-daemon]
>>      dirsrv-daemon  (ocf::heartbeat:dirsrv):   FAILED ga2.idam.com (unmanaged)
>>      dirsrv-daemon  (ocf::heartbeat:dirsrv):   Started ga1.idam.com (unmanaged)
>>
>> Failed actions:
>>     dirsrv-daemon_monitor_10000 on ga2.idam.com 'not running' (7): call=11,
>>     status=complete, last-rc-change='Thu Mar 10 08:31:12 2016', queued=0ms, exec=0ms
>>     dirsrv-daemon_monitor_10000 on ga1.idam.com 'not running' (7): call=12,
>>     status=complete, last-rc-change='Thu Mar 10 08:28:41 2016', queued=0ms, exec=0ms
>>
>> But IP stays on failed node.
>>
>> Looking in the logs it seems that the cluster is not aware that ga1 is
>> available even though the status output shows it is.
>>
>> If I repeat the tests but with ga2 started up first the behaviour is similar,
>> i.e. it fails over to ga1 but not back to ga2.
>>
>> Many thanks,
>>
>> Bernie
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org