Hi Andrew,

> Pushed as:
>    http://hg.clusterlabs.org/pacemaker/1.1/rev/8433015faf18
>
> Not sure about applying to 1.0 though, it's a dramatic change in behavior.

I could not find the change at this link.
Where did you push it?

Best Regards,
Hideo Yamauchi.

--- Andrew Beekhof <and...@beekhof.net> wrote:

> Pushed as:
>    http://hg.clusterlabs.org/pacemaker/1.1/rev/8433015faf18
>
> Not sure about applying to 1.0 though, it's a dramatic change in behavior.
>
> On Wed, Sep 22, 2010 at 11:18 AM,  <renayama19661...@ybb.ne.jp> wrote:
> > Hi Andrew,
> >
> > Thank you for your comment.
> >
> >> A long time ago in a galaxy far away, some messaging layers used to
> >> lose quite a few actions, including stops.
> >> About the same time, we decided that fencing because a stop action was
> >> lost wasn't a good idea.
> >>
> >> The rationale was that if the operation eventually completed, it would
> >> end up in the CIB anyway.
> >> And even if it didn't, the PE would continue to try the operation
> >> again until the whole node fell over at which point it would get shot
> >> anyway.
> >
> > Sorry...
> > I did not know that there had been such a discussion in the old days.
> >
> >> Now, having said that, things have improved since then and perhaps,
> >> in the interest of speeding up recovery in these situations, it is time
> >> to stop treating stop operations differently.
> >> Would you agree?
> >
> > Does that mean you will change it so that stonith is carried out when
> > "Action Lost" occurs for a stop?
> > If my understanding is right, I agree too.
> >
> >     if(timer->action->type != action_type_rsc) {
> >         send_update = FALSE;
> >     } else if(safe_str_eq(task, "cancel")) {
> >         /* we don't need to update the CIB with these */
> >         send_update = FALSE;
> >     }
> >     ---> delete "else if(safe_str_eq(task, "stop")){..}" ?
> >
> >     if(send_update) {
> >         /* cib_action_update(timer->action, LRM_OP_PENDING, EXECRA_STATUS_UNKNOWN); */
> >         cib_action_update(timer->action, LRM_OP_TIMEOUT, EXECRA_UNKNOWN_ERROR);
> >     }
> >
> > Best Regards,
> > Hideo Yamauchi.
> >
> > --- Andrew Beekhof <and...@beekhof.net> wrote:
> >
> >> On Tue, Sep 21, 2010 at 8:59 AM,  <renayama19661...@ybb.ne.jp> wrote:
> >> > Hi,
> >> >
> >> > The node was under very high load, and we checked the monitor
> >> > behavior of Pacemaker.
> >> > After a monitor error occurred, "Action lost" occurred for the stop
> >> > operation.
> >> >
> >> > Sep  8 20:02:22 cgl54 crmd: [3507]: ERROR: print_elem: Aborting transition, action lost: [Action 9]: In-flight (id: prmApPostgreSQLDB1_stop_0, loc: cgl49, priority: 0)
> >> > Sep  8 20:02:22 cgl54 crmd: [3507]: info: abort_transition_graph: action_timer_callback:486 - Triggered transition abort (complete=0) : Action lost
> >> >
> >> > We think the stop operation did not go well because of the load on
> >> > the node.
> >> > But can't stonith be executed in this case?
> >>
> >> A long time ago in a galaxy far away, some messaging layers used to
> >> lose quite a few actions, including stops.
> >> About the same time, we decided that fencing because a stop action was
> >> lost wasn't a good idea.
> >>
> >> The rationale was that if the operation eventually completed, it would
> >> end up in the CIB anyway.
> >> And even if it didn't, the PE would continue to try the operation
> >> again until the whole node fell over at which point it would get shot
> >> anyway.
> >>
> >> Now, having said that, things have improved since then and perhaps,
> >> in the interest of speeding up recovery in these situations, it is time
> >> to stop treating stop operations differently.
> >> Would you agree?
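
For reference, the check discussed above can be sketched in isolation. This is only a minimal sketch, assuming the "stop" branch is simply removed as proposed in the quoted snippet. The enum and the function name sketch_should_send_update are hypothetical and are not the crmd definitions, and plain strcmp stands in for safe_str_eq.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the crmd action types (assumption, not the
 * real Pacemaker enum). */
enum sketch_action_type {
    SKETCH_ACTION_RSC,     /* a resource operation such as start/stop/monitor */
    SKETCH_ACTION_PSEUDO,  /* anything that is not a resource operation */
    SKETCH_ACTION_CRM
};

/* Decide whether a lost (timed-out) action should be written to the CIB.
 * Mirrors the quoted if/else chain with the "stop" special case removed. */
bool sketch_should_send_update(enum sketch_action_type type, const char *task)
{
    assert(task != NULL);

    if (type != SKETCH_ACTION_RSC) {
        /* only resource operations are recorded */
        return false;
    }
    if (strcmp(task, "cancel") == 0) {
        /* cancellations do not need a CIB update */
        return false;
    }
    /* no special case for "stop": it is handled like any other
     * lost resource operation */
    return true;
}
```

With the "stop" branch gone, a lost stop takes the same path as any other lost resource action, so the caller would go on to record it with cib_action_update(timer->action, LRM_OP_TIMEOUT, EXECRA_UNKNOWN_ERROR) as in the snippet quoted above.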

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker