Please ignore this — some of my assumptions were wrong; I found some more info :)
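For anyone hitting this in the archives: the part I had misunderstood is that the exit code the monitor returns is what drives Pacemaker's reaction. Below is a minimal sketch of the kind of monitor my script does (process check, then HTTP 200 check). The process name and status URL are made-up placeholders — this is not the actual yb:proxy agent:

```shell
#!/bin/sh
# Sketch of an OCF-style monitor action. Hypothetical names throughout.
# The OCF exit codes Pacemaker reacts to:
OCF_SUCCESS=0        # resource is healthy
OCF_ERR_GENERIC=1    # soft error: recovery per on-fail / migration-threshold
OCF_NOT_RUNNING=7    # resource is (cleanly) not running

PROXY_PROC="${PROXY_PROC:-proxy}"                         # assumed process name
STATUS_URL="${STATUS_URL:-http://localhost:8080/status}"  # assumed status page

proxy_monitor() {
    # 1) Is the proxy process running at all?
    if ! pgrep -x "$PROXY_PROC" >/dev/null 2>&1; then
        return $OCF_NOT_RUNNING
    fi
    # 2) Does the status page answer 200?
    code=$(curl -s -o /dev/null -w '%{http_code}' "$STATUS_URL" 2>/dev/null)
    if [ "$code" = "200" ]; then
        return $OCF_SUCCESS
    fi
    # Process is up but unhealthy: report a recoverable error so the
    # cluster restarts the resource rather than treating it as stopped.
    return $OCF_ERR_GENERIC
}
```

Returning 1 (generic error) instead of 7 for a transient failure lets on-fail=restart and migration-threshold drive retries. As I understand it, the hard ban in the logs below doesn't come from the monitor at all: a failed *start* operation counts as INFINITY failures by default, and 1000000 is just how Pacemaker prints INFINITY.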
> -----Original Message-----
> From: Alex Samad - Yieldbroker [mailto:alex.sa...@yieldbroker.com]
> Sent: Wednesday, 9 July 2014 6:11 PM
> To: pacemaker@oss.clusterlabs.org
> Subject: [Pacemaker] Help with config please
>
> Hi
>
> Configuring pacemaker on CentOS 6.5:
>
>  pacemaker-cli-1.1.10-14.el6_5.3.x86_64
>  pacemaker-1.1.10-14.el6_5.3.x86_64
>  pacemaker-libs-1.1.10-14.el6_5.3.x86_64
>  pacemaker-cluster-libs-1.1.10-14.el6_5.3.x86_64
>
> This is my config:
>
> Cluster Name: ybrp
> Corosync Nodes:
>
> Pacemaker Nodes:
>  devrp1 devrp2
>
> Resources:
>  Resource: ybrpip (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=10.172.214.50 cidr_netmask=24 nic=eth0 clusterip_hash=sourceip-sourceport
>   Meta Attrs: stickiness=0,migration-threshold=3,failure-timeout=600s
>   Operations: monitor on-fail=restart interval=5s timeout=20s (ybrpip-monitor-interval-5s)
>  Clone: ybrpstat-clone
>   Meta Attrs: globally-unique=false clone-max=2 clone-node-max=1
>   Resource: ybrpstat (class=ocf provider=yb type=proxy)
>    Operations: monitor on-fail=restart interval=5s timeout=20s (ybrpstat-monitor-interval-5s)
>
> Stonith Devices:
> Fencing Levels:
>
> Location Constraints:
> Ordering Constraints:
>   start ybrpstat-clone then start ybrpip (Mandatory) (id:order-ybrpstat-clone-ybrpip-mandatory)
> Colocation Constraints:
>   ybrpip with ybrpstat-clone (INFINITY) (id:colocation-ybrpip-ybrpstat-clone-INFINITY)
>
> Cluster Properties:
>  cluster-infrastructure: cman
>  dc-version: 1.1.10-14.el6_5.3-368c726
>  last-lrm-refresh: 1404892739
>  no-quorum-policy: ignore
>  stonith-enabled: false
>
> I have my own resource agent file, and I start/stop the proxy service outside of pacemaker!
>
> I had an interesting problem: I did a VMware update on the Linux box, which interrupted network activity.
> Part of the monitor function in my script is to 1) test whether the proxy process is running, and 2) get a status page from the proxy and confirm it returns 200.
>
> This is what I got in /var/log/messages:
>
> Jul 9 06:16:13 devrp1 crmd[6849]: warning: update_failcount: Updating failcount for ybrpstat on devrp2 after failed monitor: rc=7 (update=value++, time=1404850573)
> Jul 9 06:16:13 devrp1 crmd[6849]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Jul 9 06:16:13 devrp1 pengine[6848]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:0 on devrp2: not running (7)
> Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:1 on devrp1: unknown error (1)
> Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
> Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
> Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Restart ybrpip#011(Started devrp2)
> Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Recover ybrpstat:0#011(Started devrp2)
> Jul 9 06:16:13 devrp1 pengine[6848]: notice: process_pe_message: Calculated Transition 1054: /var/lib/pacemaker/pengine/pe-input-235.bz2
> Jul 9 06:16:13 devrp1 pengine[6848]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:0 on devrp2: not running (7)
> Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:1 on devrp1: unknown error (1)
> Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
> Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
> Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Restart ybrpip#011(Started devrp2)
> Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Recover ybrpstat:0#011(Started devrp2)
> Jul 9 06:16:13 devrp1 pengine[6848]: notice: process_pe_message: Calculated Transition 1055: /var/lib/pacemaker/pengine/pe-input-236.bz2
> Jul 9 06:16:13 devrp1 pengine[6848]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing failed op monitor for ybrpstat:0 on devrp2: not running (7)
> Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing failed op start for ybrpstat:1 on devrp1: unknown error (1)
> Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
> Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness: Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
> Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Restart ybrpip#011(Started devrp2)
> Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Recover ybrpstat:0#011(Started devrp2)
>
> And it stayed this way for the next 12 hours, until I got on.
>
> I poked around, and to fix it I ran:
>
>  /usr/sbin/pcs resource cleanup ybrpip
>  /usr/sbin/pcs resource cleanup ybrpstat
>
> Basically, I cleaned up the errors and off it went all by itself.
>
> So my question is: how do I configure it, or what do I need to change in the resource script file, to send a temporary error back to pacemaker so that it keeps trying to check the status of the proxy?
>
> It seems to me it tried once and then failed...
> although the log says it failed after 1000000 failures. How can I change that to infinite, and where is the interval setting for this? Because in the config above it looks to me like it should be infinite?
>
> Thanks
> Alex
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
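PS, for the archives — on my own "how do I make it infinite" question: migration-threshold is a per-resource meta attribute, so it can be adjusted on the clone itself. Treat the following as a sketch against the pcs shipped with CentOS 6.5 (0.9.x) and check your man page before running it:

```shell
# Hypothetical commands -- verify against your pcs version's documentation.
# Raise/remove the retry cap on the clone (INFINITY disables the ban):
pcs resource meta ybrpstat-clone migration-threshold=INFINITY

# Let old failures expire automatically (like failure-timeout on ybrpip above):
pcs resource meta ybrpstat-clone failure-timeout=600s

# Or clear accumulated failcounts by hand, as I ended up doing:
pcs resource cleanup ybrpstat
```

Note that none of this would have helped with the failed *start* on devrp1: a start failure counts as INFINITY failures by default, regardless of migration-threshold, unless the cluster property start-failure-is-fatal is set to false.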