Re: [Pacemaker] Reset failcount for resources
Thanks Alexandre. Changing the cluster-recheck-interval worked for me :)

Regards
Arjun

On Mon, Nov 17, 2014 at 12:44 PM, Alexandre wrote:
>
> On 13 Nov 2014 12:09, "Arjun Pandey" wrote:
> >
> > Hi
> >
> > I am running a 2 node cluster with this config:
> >
> > Master/Slave Set: foo-master [foo]
> >     Masters: [ bharat ]
> >     Slaves: [ ram ]
> > AC_FLT (ocf::pw:IPaddr): Started bharat
> > CR_CP_FLT (ocf::pw:IPaddr): Started bharat
> > CR_UP_FLT (ocf::pw:IPaddr): Started bharat
> > Mgmt_FLT (ocf::pw:IPaddr): Started bharat
> >
> > where the IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a
> > colocation constraint for the IP address to be colocated with the master.
> > I have set the migration-threshold to 2 for the VIP. I have also set the
> > failure-timeout to 15s.
> >
> > Initially I bring down the interface on bharat to force a switch-over to
> > ram. After this I fail the interfaces on bharat again. Now I bring the
> > interface up again on ram. However, the virtual IPs are now in the
> > stopped state.
> >
> > I don't get out of this unless I use crm_resource -C to reset the state
> > of the resources. However, if I check the failcount of the resources
> > after this, it is still set to INFINITY. Based on the documentation, the
> > failcount on a node should have expired after the failure-timeout. That
> > doesn't happen.
>
> Expiration probably does happen, meaning the failure is marked as expired.
> However, expired failures are only removed when the recheck timer pops,
> which is defined by the cluster-recheck-interval (15 minutes by default).
>
> > However, why isn't the count also reset by the crm_resource -C command?
> > Is there any other command to actually reset the failcount?
> >
> > Thanks in advance
> >
> > Regards
> > Arjun

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
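[Editor's note] The property Arjun changed can be adjusted along these lines. This is a sketch only, assuming the stock crm_attribute and pcs tools; the 60s value is an illustration, not a recommendation:

```shell
# Lower the recheck timer so expired failures are cleaned up sooner,
# using the low-level crm_attribute tool:
crm_attribute --type crm_config --name cluster-recheck-interval --update 60s

# Or, on pcs-based distributions:
pcs property set cluster-recheck-interval=60s

# Verify the current value:
crm_attribute --type crm_config --name cluster-recheck-interval --query
```

Note that a shorter recheck interval also makes the cluster re-evaluate its whole configuration more often, so very small values carry overhead.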
Re: [Pacemaker] Reset failcount for resources
On 13 Nov 2014 12:09, "Arjun Pandey" wrote:
>
> Hi
>
> I am running a 2 node cluster with this config:
>
> Master/Slave Set: foo-master [foo]
>     Masters: [ bharat ]
>     Slaves: [ ram ]
> AC_FLT (ocf::pw:IPaddr): Started bharat
> CR_CP_FLT (ocf::pw:IPaddr): Started bharat
> CR_UP_FLT (ocf::pw:IPaddr): Started bharat
> Mgmt_FLT (ocf::pw:IPaddr): Started bharat
>
> where the IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a
> colocation constraint for the IP address to be colocated with the master.
> I have set the migration-threshold to 2 for the VIP. I have also set the
> failure-timeout to 15s.
>
> Initially I bring down the interface on bharat to force a switch-over to
> ram. After this I fail the interfaces on bharat again. Now I bring the
> interface up again on ram. However, the virtual IPs are now in the
> stopped state.
>
> I don't get out of this unless I use crm_resource -C to reset the state
> of the resources. However, if I check the failcount of the resources
> after this, it is still set to INFINITY. Based on the documentation, the
> failcount on a node should have expired after the failure-timeout. That
> doesn't happen.

Expiration probably does happen, meaning the failure is marked as expired.
However, expired failures are only removed when the recheck timer pops,
which is defined by the cluster-recheck-interval (15 minutes by default).

> However, why isn't the count also reset by the crm_resource -C command?
> Is there any other command to actually reset the failcount?
>
> Thanks in advance
>
> Regards
> Arjun
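[Editor's note] The interaction Alexandre describes — a failure that has passed its failure-timeout but still shows up until the recheck timer fires — can be illustrated with a toy model. This is plain Python for illustration only, not Pacemaker code; all names are made up:

```python
# Toy model: a failure "expires" once failure-timeout has passed, but the
# stored failcount is only actually removed when the periodic recheck runs.

FAILURE_TIMEOUT = 15          # seconds, as in Arjun's configuration
RECHECK_INTERVAL = 15 * 60    # seconds, the Pacemaker default

class FailcountModel:
    def __init__(self):
        self.failcount = 0
        self.last_failure = None

    def record_failure(self, now):
        # migration-threshold exceeded: the count is pushed to INFINITY
        self.failcount = float("inf")
        self.last_failure = now

    def visible_failcount(self, now):
        """What a failcount query reports: the stored value, untouched
        until a recheck actually removes it."""
        return self.failcount

    def recheck(self, now):
        # Runs every cluster-recheck-interval: failures whose
        # failure-timeout has elapsed are removed only here.
        if (self.last_failure is not None
                and now - self.last_failure >= FAILURE_TIMEOUT):
            self.failcount = 0
            self.last_failure = None

m = FailcountModel()
m.record_failure(now=0)
# 60s later the failure-timeout (15s) has long passed, but no recheck has
# run yet, so a query still reports INFINITY:
assert m.visible_failcount(60) == float("inf")
# The recheck timer pops at 900s and clears the expired failure:
m.recheck(900)
assert m.visible_failcount(901) == 0
```

This matches what Arjun observed: the documentation's "expires after failure-timeout" is about eligibility for removal, while the removal itself waits for the recheck.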
Re: [Pacemaker] Reset failcount for resources
> On 13 Nov 2014, at 10:08 pm, Arjun Pandey wrote:
>
> Hi
>
> I am running a 2 node cluster with this config:
>
> Master/Slave Set: foo-master [foo]
>     Masters: [ bharat ]
>     Slaves: [ ram ]
> AC_FLT (ocf::pw:IPaddr): Started bharat
> CR_CP_FLT (ocf::pw:IPaddr): Started bharat
> CR_UP_FLT (ocf::pw:IPaddr): Started bharat
> Mgmt_FLT (ocf::pw:IPaddr): Started bharat
>
> where the IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a
> colocation constraint for the IP address to be colocated with the master.
> I have set the migration-threshold to 2 for the VIP. I have also set the
> failure-timeout to 15s.
>
> Initially I bring down the interface on bharat to force a switch-over to
> ram. After this I fail the interfaces on bharat again. Now I bring the
> interface up again on ram. However, the virtual IPs are now in the
> stopped state.
>
> I don't get out of this unless I use crm_resource -C to reset the state
> of the resources. However, if I check the failcount of the resources
> after this, it is still set to INFINITY.

crm_resource didn't always reset the failcount. I'd encourage you to
upgrade your Pacemaker packages.

> Based on the documentation, the failcount on a node should have expired
> after the failure-timeout. That doesn't happen. However, why isn't the
> count also reset by the crm_resource -C command? Is there any other
> command to actually reset the failcount?

There should be a 'crm_failcount' tool that will do this.

>
> Thanks in advance
>
> Regards
> Arjun
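[Editor's note] The tool Andrew mentions is used along these lines. A sketch only: the long option names below are from recent Pacemaker releases (older ones use different switches), and the resource/node names are the ones from this thread:

```shell
# Show the current failcount for resource AC_FLT on node bharat:
crm_failcount --query --resource=AC_FLT --node=bharat

# Clear it explicitly:
crm_failcount --delete --resource=AC_FLT --node=bharat
```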
[Pacemaker] Reset failcount for resources
Hi

I am running a 2 node cluster with this config:

Master/Slave Set: foo-master [foo]
    Masters: [ bharat ]
    Slaves: [ ram ]
AC_FLT (ocf::pw:IPaddr): Started bharat
CR_CP_FLT (ocf::pw:IPaddr): Started bharat
CR_UP_FLT (ocf::pw:IPaddr): Started bharat
Mgmt_FLT (ocf::pw:IPaddr): Started bharat

where the IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a
colocation constraint for the IP address to be colocated with the master.
I have set the migration-threshold to 2 for the VIP. I have also set the
failure-timeout to 15s.

Initially I bring down the interface on bharat to force a switch-over to
ram. After this I fail the interfaces on bharat again. Now I bring the
interface up again on ram. However, the virtual IPs are now in the stopped
state.

I don't get out of this unless I use crm_resource -C to reset the state of
the resources. However, if I check the failcount of the resources after
this, it is still set to INFINITY. Based on the documentation, the
failcount on a node should have expired after the failure-timeout. That
doesn't happen. However, why isn't the count also reset by the
crm_resource -C command? Is there any other command to actually reset the
failcount?

Thanks in advance

Regards
Arjun
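[Editor's note] The meta attributes Arjun describes are typically applied along these lines. A sketch, assuming the pcs shell (the crm shell offers equivalents) and using one of the VIP resources from this thread:

```shell
# Move the resource away after 2 failures on a node, and let each failure
# expire after 15 seconds (expired failures are still only cleaned up at
# the next cluster recheck):
pcs resource update AC_FLT meta migration-threshold=2 failure-timeout=15s
```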