[ClusterLabs] pcs remove command doesn't work to remove monitor operations
Hello Community,

Need some clarification on a recent Pacemaker version and the "op" operations (remove/add).

About a year back I deployed a Pacemaker cluster using the versions below, and I was/am able to run "pcs resource op remove FSCheck monitor interval=30s":

Pacemaker version:
  pacemaker-cli-1.1.15-11.el7.x86_64
  pacemaker-libs-1.1.15-11.el7.x86_64
  pacemaker-cluster-libs-1.1.15-11.el7.x86_64
  pacemaker-1.1.15-11.el7.x86_64

But now I need to deploy a Pacemaker cluster again on a different setup, using the upgraded versions listed here:

Pacemaker version:
  pacemaker-libs-1.1.16-12.el7.x86_64
  pacemaker-cli-1.1.16-12.el7.x86_64
  pacemaker-1.1.16-12.el7.x86_64
  pacemaker-cluster-libs-1.1.16-12.el7.x86_64

And strangely, the pcs op remove command has stopped working, and I get the error below:

  [root@ha2-105 HA7]# pcs resource op remove FSCheck monitor interval=15s
  Error: Unable to find operation matching: monitor interval=15s

Has anything changed in the newer version of Pacemaker? Any insight will be highly appreciated.

Thanks!!

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
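[Not an authoritative answer, just a sketch of a workaround when pcs cannot match the operation by its textual spec: inspect the resource to see exactly how the operation is stored, then remove it by its operation id instead. The id FSCheck-OP-monitor is taken from the configuration quoted elsewhere in this archive; the exact pcs subcommands may differ between pcs versions.]

```shell
# Show the resource, including its operations and their ids
# (older pcs uses "show", newer pcs uses "config"):
pcs resource show FSCheck

# Try removing the op with every field exactly as pcs printed it;
# the interval must match what is actually configured (30s here,
# not 15s as in the failing command above):
pcs resource op remove FSCheck monitor interval=30s

# Or remove the operation by its id as listed in the output:
pcs resource op remove FSCheck-OP-monitor
```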
Re: [ClusterLabs] Entire Group stop on stopping of single Resource (Jan Pokorný)
Thanks,
Jaspal Singla

On Mon, Aug 22, 2016 at 7:42 PM, <users-requ...@clusterlabs.org> wrote:
>
> Today's Topics:
>
>    1. Re: Mysql slave did not start replication after failure, and
>       read-only IP also remained active on the much outdated slave
>       (Attila Megyeri)
>    2. Re: Entire Group stop on stopping of single Resource (Jan Pokorný)
>    3. Re: Mysql slave did not start replication after failure, and
>       read-only IP also remained active on the much outdated slave
>       (Ken Gaillot)
>
> --
>
> Message: 1
> Date: Mon, 22 Aug 2016 14:24:28 +0200
> From: Attila Megyeri <amegy...@minerva-soft.com>
> To: Cluster Labs - All topics related to open-source clustering
>     welcomed <users@clusterlabs.org>
> Subject: Re: [ClusterLabs] Mysql slave did not start replication after
>     failure, and read-only IP also remained active on the much outdated
>     slave
>
> Hi Andrei,
>
> I waited several hours, and nothing happened.
>
> I assume that the RA does not treat this case properly. Mysql was
> running, but the "show slave status" command returned something that the
> RA was not prepared to parse, and instead of reporting a non-readable
> attribute, it returned some generic error, which did not stop the server.
> Rgds,
> Attila
>
> -----Original Message-----
> From: Andrei Borzenkov [mailto:arvidj...@gmail.com]
> Sent: Monday, August 22, 2016 11:42 AM
> To: Cluster Labs - All topics related to open-source clustering
>     welcomed <users@clusterlabs.org>
> Subject: Re: [ClusterLabs] Mysql slave did not start replication after
>     failure, and read-only IP also remained active on the much outdated
>     slave
>
> On Mon, Aug 22, 2016 at 12:18 PM, Attila Megyeri
> <amegy...@minerva-soft.com> wrote:
> > Dear community,
> >
> > A few days ago we had an issue in our Mysql M/S replication cluster.
> >
> > We have a one R/W master and a one RO slave setup. The RO VIP is
> > supposed to be running on the slave if it is not too far behind the
> > master, and if any error occurs, the RO VIP is moved to the master.
> >
> > Something happened with the slave Mysql (some disk issue, still
> > investigating), but the problem is that the slave VIP remained on the
> > slave device, even though the slave process was not running and the
> > server was much outdated.
> >
> > During the issue the following log entries appeared (just an extract,
> > as it would be too long):
> >
> > Aug 20 02:04:07 ctdb1 corosync[1056]: [MAIN  ] Corosync main process was
> > not scheduled for 14088.5488 ms (threshold is 4000. ms). Consider token
> > timeout increase.
> > Aug 20 02:04:07 ctdb1 corosync[1056]: [TOTEM ] A processor failed,
> > forming new configuration.
> > Aug 20 02:04:34 ctdb1 corosync[1056]: [MAIN  ] Corosync main process was
> > not scheduled for 27065.2559 ms (threshold is 4000. ms). Consider token
> > timeout increase.
> > Aug 20 02:04:34 ctdb1 corosync[1056]: [TOTEM ] A new membership
> > (xxx:6720) was formed. Members left: 168362243 168362281 168362282
> > 168362301 168362302 168362311 168362312 1
> > Aug 20 02:04:34 ctdb1 corosync[1056]: [TOTEM ] A new membership
> > (xxx:6724) was formed. Members
> >
> > ..
> >
> > Aug 20 02:13:28 ctdb1 corosync[1056]: [MAIN  ] Completed service
> > synchronization, ready to provide service.
> >
> > ..
> >
> > Aug 20 02:13:29 ctdb1 attrd[1584]: notice: attrd_trigger_update:
> > Sending flush op to all hosts for: readable (1)
> >
> > ...
> >
> > Aug 20 02:13:32 ctdb1 mysql(db-mysql)[10492]: INFO: post-demote
> > notification
[ClusterLabs] Entire Group stop on stopping of single Resource
Hello Community,

I have a resource group (ctm_service) comprising various resources. The requirement is: when one of its resources stops for some time (10-20 seconds), I want the entire group to be stopped. Is it possible to achieve this in Pacemaker? Please help!

__
 Resource Group: ctm_service
     FSCheck            (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/FsCheckAgent.py):         (target-role:Stopped) Stopped
     NTW_IF             (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/NtwIFAgent.py):           (target-role:Stopped) Stopped
     CTM_RSYNC          (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/RsyncAgent.py):           (target-role:Stopped) Stopped
     REPL_IF            (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ODG_IFAgent.py):          (target-role:Stopped) Stopped
     ORACLE_REPLICATOR  (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ODG_ReplicatorAgent.py):  (target-role:Stopped) Stopped
     CTM_SID            (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/OracleAgent.py):          (target-role:Stopped) Stopped
     CTM_SRV            (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py):             (target-role:Stopped) Stopped
     CTM_APACHE         (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ApacheAgent.py):          (target-role:Stopped) Stopped
_

These are the resource and resource group properties:
___
pcs -f cib.xml.geo resource create FSCheck lsb:../../..//cisco/PrimeOpticalServer/HA/bin/FsCheckAgent.py op monitor id=FSCheck-OP-monitor name=monitor interval=30s
pcs -f cib.xml.geo resource create NTW_IF lsb:../../..//cisco/PrimeOpticalServer/HA/bin/NtwIFAgent.py op monitor id=NtwIFAgent-OP-monitor name=monitor interval=30s
pcs -f cib.xml.geo resource create CTM_RSYNC lsb:../../..//cisco/PrimeOpticalServer/HA/bin/RsyncAgent.py op monitor id=CTM_RSYNC-OP-monitor name=monitor interval=30s on-fail=ignore stop id=CTM_RSYNC-OP-stop interval=0 on-fail=stop
pcs -f cib.xml.geo resource create REPL_IF lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ODG_IFAgent.py op monitor id=REPL_IF-OP-monitor name=monitor interval=30 on-fail=ignore stop id=REPL_IF-OP-stop interval=0 on-fail=stop
pcs -f cib.xml.geo resource create ORACLE_REPLICATOR lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ODG_ReplicatorAgent.py op monitor id=ORACLE_REPLICATOR-OP-monitor name=monitor interval=30s on-fail=ignore stop id=ORACLE_REPLICATOR-OP-stop interval=0 on-fail=stop
pcs -f cib.xml.geo resource create CTM_SID lsb:../../..//cisco/PrimeOpticalServer/HA/bin/OracleAgent.py op monitor id=CTM_SID-OP-monitor name=monitor interval=30s
pcs -f cib.xml.geo resource create CTM_SRV lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py op monitor id=CTM_SRV-OP-monitor name=monitor interval=30s
pcs -f cib.xml.geo resource create CTM_APACHE lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ApacheAgent.py op monitor id=CTM_APACHE-OP-monitor name=monitor interval=30s
pcs -f cib.xml.geo resource create CTM_HEARTBEAT lsb:../../..//cisco/PrimeOpticalServer/HA/bin/HeartBeat.py op monitor id=CTM_HEARTBEAT-OP-monitor name=monitor interval=30s
pcs -f cib.xml.geo resource create FLASHBACK lsb:../../..//cisco/PrimeOpticalServer/HA/bin/FlashBackMonitor.py op monitor id=FLASHBACK-OP-monitor name=monitor interval=30s
pcs -f cib.xml.geo resource group add ctm_service FSCheck NTW_IF CTM_RSYNC REPL_IF ORACLE_REPLICATOR CTM_SID CTM_SRV CTM_APACHE
pcs -f cib.xml.geo resource meta ctm_service migration-threshold=1 failure-timeout=10 target-role=stopped

Any help will be highly appreciated!

Thanks,
Jaspal Singla
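[One approach sometimes suggested for this kind of requirement, offered only as a sketch and not verified against the configuration above: change the monitor's failure response from on-fail=ignore to on-fail=stop. Because group members are implicitly ordered and colocated, stopping an early member also stops the members after it; whether that covers all the members you care about depends on the member's position in the group.]

```shell
# Sketch: make a monitor failure stop CTM_RSYNC instead of being ignored.
# Resources listed after it in the group would then stop as well, due to
# the group's implicit ordering/colocation.
pcs -f cib.xml.geo resource update CTM_RSYNC \
    op monitor id=CTM_RSYNC-OP-monitor name=monitor interval=30s on-fail=stop
```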
[ClusterLabs] Freezing/Unfreezing in Pacemaker ?
Hello,

In CMAN we have clusvcadm -Z and clusvcadm -U to freeze and unfreeze a resource. I would really appreciate it if someone could give some pointers for freezing/unfreezing a resource in Pacemaker (pcs) as well.

Thanks,
Jaspal Singla
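[For what it's worth, the closest Pacemaker analogue of freezing is usually to stop managing the resource, so the cluster leaves it alone until it is managed again. A sketch follows; CTM_SRV is just an example resource name borrowed from elsewhere in this archive.]

```shell
# "Freeze" (roughly clusvcadm -Z): the cluster stops managing the resource
pcs resource unmanage CTM_SRV

# "Unfreeze" (roughly clusvcadm -U): the cluster manages it again
pcs resource manage CTM_SRV

# The same effect via the underlying meta attribute:
pcs resource meta CTM_SRV is-managed=false
pcs resource meta CTM_SRV is-managed=true
```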
Re: [ClusterLabs] Cluster resources migration from CMAN to Pacemaker
> On 27/01/16 19:41 +0100, Jan Pokorný wrote:
> > On 27/01/16 11:04 -0600, Ken Gaillot wrote:
> >> On 01/27/2016 02:34 AM, jaspal singla wrote:
> >>> 1) In CMAN, there was a meta attribute - autostart=0 (this parameter
> >>> disables the start of all services when RGManager starts). Is there
> >>> any way to get such behavior in Pacemaker?
> >
> > Please be more careful about the descriptions; autostart=0 specified
> > at the given resource group ("service" or "vm" tag) means just not to
> > start anything contained in this very one automatically (also upon
> > new resources being defined, IIUIC), definitely not "all services".
> >
> > [...]
> >
> >> I don't think there's any exact replacement for autostart in pacemaker.
> >> Probably the closest is to set target-role=Stopped before stopping the
> >> cluster, and set target-role=Started when services are desired to be
> >> started.
>
> Beside is-managed=false (as currently used in clufter), I also looked
> at downright disabling the "start" action, but this turned out to be a
> naive approach caused by unclear documentation.
>
> Pushing for a bit more clarity (hopefully):
> https://github.com/ClusterLabs/pacemaker/pull/905
>
> >>> 2) Please suggest some alternatives to exclusive=0 and
> >>> __independent_subtree - what do we have in Pacemaker instead of these?
>
> (exclusive property discussed in the other subthread; as a recap,
> no extra effort is needed to achieve exclusive=0, while exclusive=1 is
> currently a show stopper in clufter as neither approach is versatile
> enough)
>
> > For __independent_subtree, each component must be a separate pacemaker
> > resource, and the constraints between them would depend on exactly what
> > you were trying to accomplish. The key concepts here are ordering
> > constraints, colocation constraints, kind=Mandatory/Optional (for
> > ordering constraints), and ordered sets.
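[Ken's target-role suggestion above can be sketched with pcs as follows; the group name ctm_service is borrowed from elsewhere in this archive.]

```shell
# Keep the group from starting when the cluster comes up
# (the closest analogue of CMAN's autostart=0):
pcs resource meta ctm_service target-role=Stopped

# ...later, start it explicitly when services are desired:
pcs resource meta ctm_service target-role=Started

# pcs also offers shorthand commands that toggle the same meta attribute:
pcs resource disable ctm_service
pcs resource enable ctm_service
```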
> Current approach in clufter as of the next branch:
> - __independent_subtree=1 -> do nothing special (hardly can be improved?)
> - __independent_subtree=2 -> for that very resource, set operations
>   as follows:
>     monitor (interval=60s) on-fail=ignore
>     stop interval=0 on-fail=stop
>
> Groups carrying such resources are not unrolled into primitives plus
> constraints, as the above might suggest (also the default kind=Mandatory
> for underlying order constraints should fit well).
>
> Please holler if this is not sound.
>
> So when put together with some other changes/fixes, the currently
> suggested/informative sequence of pcs commands goes like this:
>
> pcs cluster auth ha1-105.test.com
> pcs cluster setup --start --name HA1-105_CLUSTER ha1-105.test.com \
>   --consensus 12000 --token 1 --join 60
> sleep 60
> pcs cluster cib tmp-cib.xml --config
> pcs -f tmp-cib.xml property set stonith-enabled=false
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-FSCheck \
>   lsb:../../..//data/Product/HA/bin/FsCheckAgent.py \
>   op monitor interval=30s
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-NTW_IF \
>   lsb:../../..//data/Product/HA/bin/NtwIFAgent.py \
>   op monitor interval=30s
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-CTM_RSYNC \
>   lsb:../../..//data/Product/HA/bin/RsyncAgent.py \
>   op monitor interval=30s on-fail=ignore stop interval=0 on-fail=stop
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-REPL_IF \
>   lsb:../../..//data/Product/HA/bin/ODG_IFAgent.py \
>   op monitor interval=30s on-fail=ignore stop interval=0 on-fail=stop
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-ORACLE_REPLICATOR \
>   lsb:../../..//data/Product/HA/bin/ODG_ReplicatorAgent.py \
>   op monitor interval=30s on-fail=ignore stop interval=0 on-fail=stop
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-CTM_SID \
>   lsb:../../..//data/Product/HA/bin/OracleAgent.py \
>   op monitor interval=30s
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-CTM_SRV \
>   lsb:../../..//data/Product/HA/bin/CtmAgent.py \
>   op monitor interval=30s
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-CTM_APACHE \
>   lsb:../../..//data/Product/HA/bin/ApacheAgent.py \
>   op monitor interval=30s
> pcs -f tmp-cib.xml \
>   resource create RESOURCE-script-CTM_HEARTBEAT \
>   lsb:../../..//data/Product/HA/bin/HeartBeat.py \
>   op monitor inte
[ClusterLabs] Cluster resources migration from CMAN to Pacemaker
Hello Everyone,

I desperately need some help in order to migrate my cluster configuration from CMAN (RHEL-6.5) to Pacemaker (RHEL-7.1). I have tried to explore a lot but couldn't find how to configure the same resources (created in CMAN's cluster.conf file) in Pacemaker. I'd like to share the cluster.conf of RHEL-6.5 and want to achieve the same thing through Pacemaker. Any help would be greatly appreciated!!

*Cluster.conf file*
## <script ref="CTM_SRV"> <script ref="CTM_APACHE"/> ###

*Queries/concerns:*

-> How can I specifically create the above 10 resources through Pacemaker?
-> The services being used in the section are not init.d services; these services use script references to the resources defined above. How could I do the same thing in Pacemaker?

Couple of concerns I have:

-> How do I create failover domains in Pacemaker and link resources to them?
-> By default there are several pre-defined resource agents in Pacemaker, and we can use them if our requirements match the pre-defined agents, like IPaddr2, apache, etc. But what if I have some Python scripts and want to use those scripts as resources? Is there any way to do that?

Please, please help me get this sorted.

Thanks,
Jaspal Singla
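[A sketch for the last two questions, under assumptions: the Python scripts behave like LSB init scripts (honour start/stop/status and LSB exit codes), the script path is borrowed from elsewhere in this archive, and the node names ha1-105/ha1-106 are made up for illustration. Resource agents can also be written to the OCF standard for finer control, but that requires writing a wrapper agent.]

```shell
# A custom script can be wrapped as an lsb: resource, provided it is
# LSB-compliant (start/stop/status actions, correct exit codes):
pcs resource create CTM_SRV lsb:/data/Product/HA/bin/CtmAgent.py \
    op monitor interval=30s

# The rough Pacemaker equivalent of a CMAN failover domain is a set of
# location constraints with scores; higher score = preferred node:
pcs constraint location CTM_SRV prefers ha1-105=100
pcs constraint location CTM_SRV prefers ha1-106=50
```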