I have a much clearer idea of the problem you're seeing now, thank you.
Could you attach /var/lib/pacemaker/pengine/pe-input-1.bz2 from CSE-1?
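
For anyone following along, a saved policy-engine input like this one can usually be replayed offline with crm_simulate. Below is a minimal sketch, assuming crm_simulate is installed on the node and the file at the path above is readable. The function name replay_pe_input is made up for illustration and is not part of pacemaker.

```shell
#!/bin/sh
# Hypothetical helper (not part of pacemaker): replay a saved policy-engine
# input with crm_simulate to see which actions the cluster computed.
replay_pe_input() {
    pe_file="${1:-/var/lib/pacemaker/pengine/pe-input-1.bz2}"

    # Assumption: crm_simulate is on PATH and the file is readable.
    if [ ! -r "$pe_file" ] || ! command -v crm_simulate >/dev/null 2>&1; then
        echo "cannot replay $pe_file (file unreadable or crm_simulate missing)"
        return 1
    fi

    # --simulate runs the transition; --xml-file selects the saved input.
    crm_simulate --simulate --xml-file "$pe_file"
}
```

Calling replay_pe_input with no argument on CSE-1 would use the path requested above; passing another pe-input file as the first argument replays that one instead.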

On 03/05/2013, at 10:40 PM, Johan Huysmans <johan.huysm...@inuits.be> wrote:

> Hi,
>
> Below you can see my setup and my test, this shows that my cloned resource
> with on-fail=block does not recover automatically.
>
> My Setup:
>
> # rpm -aq | grep -i pacemaker
> pacemaker-libs-1.1.9-1512.el6.i686
> pacemaker-cluster-libs-1.1.9-1512.el6.i686
> pacemaker-cli-1.1.9-1512.el6.i686
> pacemaker-1.1.9-1512.el6.i686
>
> # crm configure show
> node CSE-1
> node CSE-2
> primitive d_tomcat ocf:ntc:tomcat \
>     op monitor interval="15s" timeout="510s" on-fail="block" \
>     op start interval="0" timeout="510s" \
>     params instance_name="NMS" monitor_use_ssl="no" monitor_urls="/cse/health" monitor_timeout="120" \
>     meta migration-threshold="1"
> primitive ip_11 ocf:heartbeat:IPaddr2 \
>     op monitor interval="10s" \
>     params broadcast="172.16.11.31" ip="172.16.11.31" nic="bond0.111" iflabel="ha" \
>     meta migration-threshold="1" failure-timeout="10"
> primitive ip_19 ocf:heartbeat:IPaddr2 \
>     op monitor interval="10s" \
>     params broadcast="172.18.19.31" ip="172.18.19.31" nic="bond0.119" iflabel="ha" \
>     meta migration-threshold="1" failure-timeout="10"
> group svc-cse ip_19 ip_11
> clone cl_tomcat d_tomcat
> colocation colo_tomcat inf: svc-cse cl_tomcat
> order order_tomcat inf: cl_tomcat svc-cse
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.9-1512.el6-2a917dd" \
>     cluster-infrastructure="cman" \
>     pe-warn-series-max="9" \
>     no-quorum-policy="ignore" \
>     stonith-enabled="false" \
>     pe-input-series-max="9" \
>     pe-error-series-max="9" \
>     last-lrm-refresh="1367582088"
>
> Currently only 1 node is available, CSE-1.
>
>
> This is how I am currently testing my setup:
>
> => Starting point: Everything up and running
>
> # crm resource status
>  Resource Group: svc-cse
>      ip_19 (ocf::heartbeat:IPaddr2): Started
>      ip_11 (ocf::heartbeat:IPaddr2): Started
>  Clone Set: cl_tomcat [d_tomcat]
>      Started: [ CSE-1 ]
>      Stopped: [ d_tomcat:1 ]
>
> => Causing failure: Change system so tomcat is running but has a failure (in
> attachment step_2.log)
>
> # crm resource status
>  Resource Group: svc-cse
>      ip_19 (ocf::heartbeat:IPaddr2): Stopped
>      ip_11 (ocf::heartbeat:IPaddr2): Stopped
>  Clone Set: cl_tomcat [d_tomcat]
>      d_tomcat:0 (ocf::ntc:tomcat): Started (unmanaged) FAILED
>      Stopped: [ d_tomcat:1 ]
>
> => Fixing failure: Revert system so tomcat is running without failure (in
> attachment step_3.log)
>
> # crm resource status
>  Resource Group: svc-cse
>      ip_19 (ocf::heartbeat:IPaddr2): Stopped
>      ip_11 (ocf::heartbeat:IPaddr2): Stopped
>  Clone Set: cl_tomcat [d_tomcat]
>      d_tomcat:0 (ocf::ntc:tomcat): Started (unmanaged) FAILED
>      Stopped: [ d_tomcat:1 ]
>
> As you can see in the logs the OCF script doesn't return any failure. This is
> noticed by pacemaker, however it doesn't reflect in crm_mon and it doesn't
> start the depending resources.
>
> Gr.
> Johan
>
> On 2013-05-03 03:04, Andrew Beekhof wrote:
>> On 02/05/2013, at 5:45 PM, Johan Huysmans <johan.huysm...@inuits.be> wrote:
>>
>>> On 2013-05-01 05:48, Andrew Beekhof wrote:
>>>> On 17/04/2013, at 9:54 PM, Johan Huysmans <johan.huysm...@inuits.be> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I'm trying to setup a specific configuration in our cluster, however I'm
>>>>> struggling with my configuration.
>>>>>
>>>>> This is what I'm trying to achieve:
>>>>> On both nodes of the cluster a daemon must be running (tomcat).
>>>>> Some failover addresses are configured and must be running on the node
>>>>> with a correctly running tomcat.
>>>>>
>>>>> I have this achieved with a cloned tomcat resource and a colocation
>>>>> between the cloned tomcat and the failover addresses.
>>>>> When I cause a failure in the tomcat on the node running the failover
>>>>> addresses, the failover addresses will failover to the other node as
>>>>> expected.
>>>>> crm_mon shows that this tomcat has a failure.
>>>>> When I configure the tomcat resource with failure-timeout=0, the failure
>>>>> alarm in crm_mon isn't cleared whenever the tomcat failure is fixed.
>>>> All sounds right so far.
>>> If my broken tomcat is automatically fixed, I expect this to be noticed by
>>> pacemaker and that that node will be able to run my failover addresses,
>>> however I don't see this happening.
>> This is very hard to discuss without seeing logs.
>>
>> So you created a tomcat error, waited for pacemaker to notice, fixed the
>> error and observed that pacemaker did not re-notice?
>> How long did you wait? More than the 15s repeat interval I assume? Did at
>> least the resource agent notice?
>>
>>>>> When I configure the tomcat resource with failure-timeout=30, the failure
>>>>> alarm in crm_mon is cleared after 30 seconds however the tomcat is still
>>>>> having a failure.
>>>> Can you define "still having a failure"?
>>>> You mean it still shows up in crm_mon?
>>>> Have you read this link?
>>>>
>>>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/s-rules-recheck.html
>>> "Still having a failure" means that the tomcat is still broken and my OCF
>>> script reports it as a failure.
>>>>> What I expect is that pacemaker reports the failure as the failure exists
>>>>> and as long as it exists and that pacemaker reports that everything is ok
>>>>> once everything is back ok.
>>>>>
>>>>> Do I do something wrong with my configuration?
>>>>> Or how can I achieve my wanted setup?
>>>>>
>>>>> Here is my configuration:
>>>>>
>>>>> node CSE-1
>>>>> node CSE-2
>>>>> primitive d_tomcat ocf:custom:tomcat \
>>>>>     op monitor interval="15s" timeout="510s" on-fail="block" \
>>>>>     op start interval="0" timeout="510s" \
>>>>>     params instance_name="NMS" monitor_use_ssl="no" monitor_urls="/cse/health" monitor_timeout="120" \
>>>>>     meta migration-threshold="1" failure-timeout="0"
>>>>> primitive ip_1 ocf:heartbeat:IPaddr2 \
>>>>>     op monitor interval="10s" \
>>>>>     params nic="bond0" broadcast="10.1.1.1" iflabel="ha" ip="10.1.1.1"
>>>>> primitive ip_2 ocf:heartbeat:IPaddr2 \
>>>>>     op monitor interval="10s" \
>>>>>     params nic="bond0" broadcast="10.1.1.2" iflabel="ha" ip="10.1.1.2"
>>>>> group svc-cse ip_1 ip_2
>>>>> clone cl_tomcat d_tomcat
>>>>> colocation colo_tomcat inf: svc-cse cl_tomcat
>>>>> order order_tomcat inf: cl_tomcat svc-cse
>>>>> property $id="cib-bootstrap-options" \
>>>>>     dc-version="1.1.8-7.el6-394e906" \
>>>>>     cluster-infrastructure="cman" \
>>>>>     no-quorum-policy="ignore" \
>>>>>     stonith-enabled="false"
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Greetings,
>>>>> Johan Huysmans
>>>>>
>>>>> _______________________________________________
>>>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://bugs.clusterlabs.org
>
> <step_2.log><step_3.log>