Hi,

On Wed, Jan 12, 2011 at 02:41:31PM -0700, Patrick H. wrote:
>
> >>Oh, and it's not waiting for the resource to stop on the other
> >>node before it starts it up either.
> >>Here's the lrmd log for resource vip_55.63 from the 'ha02' node
> >>(the node I put into standby)
> >>Jan 12 16:10:24 ha02 lrmd: [5180]: info: rsc:vip_55.63:1444: stop
> >>Jan 12 16:10:24 ha02 lrmd: [5180]: info: Managed vip_55.63:stop process 19063 exited with return code 0.
> >>
> >>
> >>And here's the lrmd log for the same resource on 'ha01'
> >>Jan 12 16:10:50 ha01 lrmd: [4707]: info: rsc:vip_55.63:1390: start
> >>Jan 12 16:10:50 ha01 lrmd: [4707]: info: Managed vip_55.63:start process 8826 exited with return code 0.
> >>
> >>
> >>Notice that it stopped it a full 26 seconds before it tried to
> >>start it on the other node. The times on both boxes are in
> >>sync, so it's not that either.
> >
> >Is this the case when you wanted to fail over a single resource
> >or was it part of the node standby process?
> >
> >Thanks,
> >
> >Dejan
> In that case I put the node in standby.
>
>
> While digging around a bit more, I noticed this:
> Jan 12 17:24:56 ha01 crmd: [4710]: info: te_rsc_command: Initiating action 966: stop vip_55.236_stop_0 on ha01 (local)
> Jan 12 17:24:56 ha01 crmd: [4710]: info: do_lrm_rsc_op: Performing key=966:14345:0:0e860f83-8611-4873-829f-2a0c6fcf6667 op=vip_55.236_stop_0 )
> Jan 12 17:24:56 ha01 lrmd: [4707]: info: rsc:vip_55.236:1714: stop
> Jan 12 17:24:56 ha01 lrmd: [4707]: info: Managed vip_55.236:stop process 11414 exited with return code 0.
> Jan 12 17:24:56 ha01 crmd: [4710]: info: process_lrm_event: LRM operation vip_55.236_stop_0 (call=1714, rc=0, cib-update=19621, confirmed=true) ok
> Jan 12 17:25:04 ha01 crmd: [4710]: info: match_graph_event: Action vip_55.236_stop_0 (966) confirmed on ha01 (rc=0)
> Jan 12 17:25:04 ha01 crmd: [4710]: info: te_rsc_command: Initiating action 967: start vip_55.236_start_0 on ha02
> Jan 12 17:25:28 ha01 crmd: [4710]: info: match_graph_event: Action vip_55.236_start_0 (967) confirmed on ha02 (rc=0)
>
> Notice the huge delays before the match_graph_event on both stop and
> start. So it seems everything is waiting on match_graph_event. What
> is this?
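
The gaps can be measured straight from the syslog timestamps quoted above. What follows is a minimal Python sketch, not part of Pacemaker or its tools. It assumes the usual "Mon DD HH:MM:SS host" syslog prefix, which has no year field, so the year 2011 is assumed from the mail date. The helper names timestamp() and gap_seconds() are hypothetical.

```python
from datetime import datetime

# Log lines copied from the thread; only the leading timestamp is parsed.
LINES = [
    "Jan 12 16:10:24 ha02 lrmd: [5180]: info: rsc:vip_55.63:1444: stop",
    "Jan 12 16:10:50 ha01 lrmd: [4707]: info: rsc:vip_55.63:1390: start",
    "Jan 12 17:24:56 ha01 crmd: [4710]: info: process_lrm_event: LRM operation "
    "vip_55.236_stop_0 (call=1714, rc=0, cib-update=19621, confirmed=true) ok",
    "Jan 12 17:25:04 ha01 crmd: [4710]: info: match_graph_event: Action "
    "vip_55.236_stop_0 (966) confirmed on ha01 (rc=0)",
    "Jan 12 17:25:04 ha01 crmd: [4710]: info: te_rsc_command: Initiating "
    "action 967: start vip_55.236_start_0 on ha02",
    "Jan 12 17:25:28 ha01 crmd: [4710]: info: match_graph_event: Action "
    "vip_55.236_start_0 (967) confirmed on ha02 (rc=0)",
]


def timestamp(line: str, year: int = 2011) -> datetime:
    """Parse the syslog prefix; the year is an assumption, syslog omits it."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def gap_seconds(earlier: str, later: str) -> int:
    """Whole seconds between the timestamps of two log lines."""
    return int((timestamp(later) - timestamp(earlier)).total_seconds())


if __name__ == "__main__":
    print("vip_55.63 stop on ha02 -> start on ha01:", gap_seconds(LINES[0], LINES[1]), "s")
    print("vip_55.236 stop done -> stop confirmed:", gap_seconds(LINES[2], LINES[3]), "s")
    print("vip_55.236 start initiated -> start confirmed:", gap_seconds(LINES[4], LINES[5]), "s")
```

Run against these lines, it reports gaps of 26, 8 and 24 seconds: the standby stop-to-start gap for vip_55.63, then the time until match_graph_event for the vip_55.236 stop and start.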

Can't say, but perhaps Andrew would know, though I'm not sure
there's enough information here. Best to open a Bugzilla report and
attach an hb_report.

Thanks,

Dejan

> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker