Actually, there was (also?) a bug here causing re-probe loops. Fix in: http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/3df83ce5c974
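The probe failures quoted below report `failed (target: 8 vs. rc: 0)`. That line says the result of a probe (`monitor_0`) did not match the rc the cluster expected. The numbers are standard OCF return codes: 8 is OCF_RUNNING_MASTER, and 0 is OCF_SUCCESS, which for a master/slave resource conventionally means "running, but not as master". The sketch below decodes such a pair. The helper names `ocf_rc_name` and `check_probe_rc` are hypothetical and not part of Pacemaker. It only illustrates the comparison the log line reports and does not reproduce how crmd implements it.

```sh
#!/bin/sh
# Hypothetical helper (not part of Pacemaker): map an OCF resource agent
# exit code to its symbolic name, following the OCF RA return-code conventions.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo "UNKNOWN($1)" ;;
    esac
}

# Hypothetical helper: compare the rc the cluster expected ("target") with
# the rc actually returned, in the spirit of the "target: X vs. rc: Y" lines.
# Returns 0 on a match and 1 on a mismatch.
check_probe_rc() {
    target=$1
    rc=$2
    if [ "$target" -eq "$rc" ]; then
        echo "ok: $(ocf_rc_name "$rc")"
        return 0
    fi
    echo "mismatch: expected $(ocf_rc_name "$target"), got $(ocf_rc_name "$rc")"
    return 1
}

# The drbd probes on wc02 expected 8 but got 0:
check_probe_rc 8 0 || :
# prints "mismatch: expected OCF_RUNNING_MASTER, got OCF_SUCCESS"
```

The `|| :` only keeps the demonstration line from ending the script with a non-zero status. Drop it when you want the mismatch to propagate.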
On Wed, Nov 19, 2008 at 14:25, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> My suspicion here is that the RA is messing up the monitoring action.
> I'd suggest trying with just one of the drbd clones and seeing if that works.
>
> On Wed, Nov 12, 2008 at 13:19, Raoul Bhatia [IPAX] <[EMAIL PROTECTED]> wrote:
>> hi,
>>
>> i have a cluster with several resources.
>>
>> i issued crm_resource -P and now have got the cluster in some strange
>> state, which it cannot resolve by itself:
>>
>>> Node: wc01 (31de4ab3-2d05-476e-8f9a-627ad6cd94ca): standby
>>> Node: wc02 (f36760d8-d84a-46b2-b452-4c8cac8b3396): standby
>> ...
>>> Master/Slave Set: ms_drbd_www
>>> drbd_www:0 (ocf::heartbeat:drbd) Master [ wc01 wc02 ]
>>> drbd_www:1 (ocf::heartbeat:drbd) Master [ wc01 wc02 ]
>> ...
>>> Master/Slave Set: ms_drbd_mysql
>>> drbd_mysql:0 (ocf::heartbeat:drbd) Master [ wc01 wc02 ]
>>> drbd_mysql:1 (ocf::heartbeat:drbd) Master [ wc01 wc02 ]
>>
>> failed actions:
>>> Failed actions:
>>> drbd_www:1_monitor_0 (node=wc02, call=13666, rc=0): complete
>>> drbd_www:0_monitor_0 (node=wc02, call=13665, rc=0): complete
>>> drbd_mysql:1_monitor_0 (node=wc02, call=13672, rc=0): complete
>>> drbd_mysql:0_monitor_0 (node=wc02, call=13671, rc=0): complete
>>
>> those monitoring failures repeat continuously. in the logfiles i find:
>> ...
>>> crmd[14105]: 2008/11/12_13:14:19 WARN: status_from_rc: Action 16
>>> (drbd_www:0_monitor_0) on wc02 failed (target: 8 vs. rc: 0): Error
>>> crmd[14105]: 2008/11/12_13:14:19 info: abort_transition_graph:
>>> __FUNCTION__:385 - Triggered transition abort (complete=0, tag=lrm_rsc_op,
>>> id=drbd_www:0_monitor_0,
>>> magic=0:0;16:670:8:d3f15030-d3f0-421d-a477-ce19a2cae321) : Event failed
>>> crmd[14105]: 2008/11/12_13:14:19 info: update_abort_priority: Abort
>>> priority upgraded from 0 to 1
>>> crmd[14105]: 2008/11/12_13:14:19 info: update_abort_priority: Abort action
>>> done superceeded by restart
>>> crmd[14105]: 2008/11/12_13:14:19 info: match_graph_event: Action
>>> drbd_www:0_monitor_0 (16) confirmed on wc02 (rc=4)
>>> crmd[14105]: 2008/11/12_13:14:19 WARN: status_from_rc: Action 17
>>> (drbd_www:1_monitor_0) on wc02 failed (target: 8 vs. rc: 0): Error
>>> crmd[14105]: 2008/11/12_13:14:19 info: abort_transition_graph:
>>> __FUNCTION__:385 - Triggered transition abort (complete=0, tag=lrm_rsc_op,
>>> id=drbd_www:1_monitor_0,
>>> magic=0:0;17:670:8:d3f15030-d3f0-421d-a477-ce19a2cae321) : Event failed
>>> crmd[14105]: 2008/11/12_13:14:19 info: match_graph_event: Action
>>> drbd_www:1_monitor_0 (17) confirmed on wc02 (rc=4)
>> ...
>>
>> i put some debug information into the drbd ocf ra:
>>> #!/bin/sh
>>> echo "----" >> /tmp/lalala
>>
>> but /tmp/lalala stays empty. if i manually call the drbd ra with
>> all parameters i get the expected rc 8.
>>
>> hb_report http://ip52.ipax.at/~raoul/cluster/no_monitor_action.tar.gz
>> (it's kinda big, as a lot of actions failed)
>>
>> cheers,
>> raoul
>>
>> ps: i already tried to revoke the crm_standby, but this does not
>> resolve the error messages and does not call the drbd ocf ra.
>>
>> _______________________________________________
>> Pacemaker mailing list
>> [email protected]
>> http://list.clusterlabs.org/mailman/listinfo/pacemaker
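Raoul's one-line `echo "----" >> /tmp/lalala` answers only whether the agent ran at all. If that line sits directly after the shebang and the file stays empty, the script was most likely never executed for those operations. The sketch below extends the same idea: it also records when the agent ran, its PID, the arguments it got (the action), and every `OCF_RESKEY_*` parameter it received. That makes it easier to compare a cluster-driven probe with a manual call that returns rc 8. `ra_trace` and `TRACE_FILE` are hypothetical names. As an assumption worth checking on your version, Pacemaker passes meta attributes as `OCF_RESKEY_CRM_meta_*`, so a probe should show up as `monitor` with `OCF_RESKEY_CRM_meta_interval=0`.

```sh
#!/bin/sh
# Hypothetical trace helper for debugging an OCF resource agent.
# TRACE_FILE is an assumed location; override it from the environment if needed.
TRACE_FILE=${TRACE_FILE:-/tmp/ra-trace.log}

ra_trace() {
    # One record per invocation: a timestamp, the PID, and the arguments the
    # agent was called with (normally the action, e.g. "monitor"), followed by
    # every OCF_RESKEY_* variable in the environment, sorted.
    {
        echo "---- $(date '+%Y/%m/%d_%H:%M:%S') pid=$$ args: $*"
        env | grep '^OCF_RESKEY_' | sort
    } >> "$TRACE_FILE"
}

# In the agent, call it first, before dispatching on the action:
ra_trace "$@"
```

Run the agent once by hand with the same parameters you believe the cluster uses, then diff that record against the records from the cluster-driven calls, if any appear.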
