Thanks for the explanation. I think you're right that we shouldn't be showing these failed actions. I think we want to do it in the PE though, i.e. stop them from making it into the failed_ops list in the first place.
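To make the idea concrete, here is a minimal, untested sketch of filtering at collection time rather than at display time. The struct and function names are hypothetical stand-ins invented for illustration, not the actual PE data structures or API; the only thing taken from the patch is the "shutdown and not online" condition.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the PE's node and failed-op
 * records; the real Pacemaker structures carry far more state. */
struct node_info {
    const char *uname;
    bool online;
    bool shutdown;
};

struct failed_op {
    const char *id;
    const char *node;   /* uname of the node where the operation failed */
};

/* A failure is worth recording unless its node shut down cleanly
 * and is now offline (the same condition the patch tests in crm_mon). */
static bool should_record_failure(const struct node_info *n)
{
    return !(n->shutdown && !n->online);
}

/* Copy into 'out' only the failures that should reach the failed-ops
 * list and return how many were kept. 'out' needs room for n_ops entries. */
static size_t collect_failed_ops(const struct failed_op *ops, size_t n_ops,
                                 const struct node_info *nodes, size_t n_nodes,
                                 struct failed_op *out)
{
    size_t kept = 0;

    for (size_t i = 0; i < n_ops; i++) {
        const struct node_info *owner = NULL;

        for (size_t j = 0; j < n_nodes; j++) {
            if (strcmp(nodes[j].uname, ops[i].node) == 0) {
                owner = &nodes[j];
                break;
            }
        }

        /* Unknown node: keep the failure rather than silently hide it. */
        if (owner == NULL || should_record_failure(owner)) {
            out[kept++] = ops[i];
        }
    }
    return kept;
}
```

With the filter applied where the list is built, crm_mon would not need to know about node state at all; it would simply print whatever ends up in the list.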

On Mon, Sep 13, 2010 at 10:37 AM, <renayama19661...@ybb.ne.jp> wrote:
> Hi Andrew,
>
> Thank you for comment.
>
>> I assume this is for the stonith-enabled=true case, since offline
>> nodes are ignored for stonith-enabled=false.
>> Once the node is shot, then its status section is erased and no failed
>> actions will be shown... so why do we need this patch?
>
> I know that trouble information disappears when I succeeded in shooting a
> node.
> In addition, in the case of stonith-enabled=false, I know that it is not
> displayed if a node becomes the offline.
>
> (snip)
>     if(this_node->details->online || is_set(data_set->flags, pe_flag_stonith_enabled)) {
>         /* offline nodes run no resources...
>          * unless stonith is enabled in which case we need to
>          * make sure rsc start events happen after the stonith
>          */
>         crm_debug_3("Processing lrm resource entries");
>         unpack_lrm_resources(this_node, lrm_rsc, data_set);
>     }
>     );
> (snip)
>
> But, the failed action information is displayed in crm_mon though a node is
> shutdown when it is not necessary to shoot a node.
> (The failed count of times disappears then, but the failed action stays.)
>
> # srv01 was monitor error.
>
> Migration summary:
> * Node srv04:
> * Node srv02:
> * Node srv01:
>    prmApPostgreSQLDB1: migration-threshold=1 fail-count=1
> * Node srv03:
>
> Failed actions:
>     prmApPostgreSQLDB1_monitor_10000 (node=srv01, call=81, rc=7, status=complete): not running
>
> # Next....srv01 was service stop.
>
> Migration summary: ---> The failed count of srv01 disappears
> * Node srv04:
> * Node srv02:
> * Node srv03:
>
> Failed actions: ---> The failed action stays
>     prmApPostgreSQLDB1_monitor_10000 (node=srv01, call=81, rc=7, status=complete): not running
>
> Our user does not expect the trouble information of the node that stopped
> normally.
>
> In the case of stonith-enabled=true, should the node that trouble happened
> display failed action information till it is shot?
> When the trouble information of the node that stopped normally is displayed,
> is not the user confused?
>
> Best Regards,
> Hideo Yamauchi.
>
> --- Andrew Beekhof <and...@beekhof.net> wrote:
>
>> 2010/9/13 <renayama19661...@ybb.ne.jp>:
>> > Hi,
>> >
>> > I contribute the patch of the crm_mon command.
>> >
>> > A node was offline and, in the case of the shutdown, revised it not to
>> > display a trouble action.
>> >
>> > Please confirm a patch.
>> > And, without a problem, please take this revision in a development version.
>>
>> Hmmm.
>> I'm not sure about this patch.
>>
>> I assume this is for the stonith-enabled=true case, since offline
>> nodes are ignored for stonith-enabled=false.
>> Once the node is shot, then its status section is erased and no failed
>> actions will be shown... so why do we need this patch?
>>
>> >
>> > diff -r 9b95463fde99 tools/crm_mon.c
>> > --- a/tools/crm_mon.c   Mon Sep 13 13:07:16 2010 +0900
>> > +++ b/tools/crm_mon.c   Mon Sep 13 13:07:59 2010 +0900
>> > @@ -829,6 +829,7 @@
>> >      int configured_resources = 0;
>> >      int print_opts = pe_print_ncurses;
>> >      const char *quorum_votes = "unknown";
>> > +    gboolean is_failed_first_disp = TRUE;
>> >
>> >      if(as_console) {
>> >         blank_screen();
>> > @@ -989,16 +990,28 @@
>> >      }
>> >
>> >      if(xml_has_children(data_set->failed)) {
>> > -       print_as("\nFailed actions:\n");
>> >         xml_child_iter(data_set->failed, xml_op,
>> >                        int val = 0;
>> > +                      node_t *failed_node = NULL;
>> >                        const char *id = ID(xml_op);
>> >                        const char *last = crm_element_value(xml_op, "last_run");
>> >                        const char *node = crm_element_value(xml_op, XML_ATTR_UNAME);
>> >                        const char *call = crm_element_value(xml_op, XML_LRM_ATTR_CALLID);
>> >                        const char *rc   = crm_element_value(xml_op, XML_LRM_ATTR_RC);
>> >                        const char *status = crm_element_value(xml_op, XML_LRM_ATTR_OPSTATUS);
>> > -
>> > +
>> > +                      failed_node = pe_find_node(data_set->nodes, node);
>> > +                      if (failed_node != NULL) {
>> > +                          if ((failed_node->details->shutdown == TRUE) && (failed_node->details->online == FALSE)) {
>> > +                              continue;
>> > +                          }
>> > +                      }
>> > +
>> > +                      if (is_failed_first_disp){
>> > +                          is_failed_first_disp = FALSE;
>> > +                          print_as("\nFailed actions:\n");
>> > +                      }
>> > +
>> >                        val = crm_parse_int(status, "0");
>> >                        print_as("    %s (node=%s, call=%s, rc=%s, status=%s",
>> >                                 id, node, call, rc, op_status2text(val));
>> >
>> >
>> > Best Regards,
>> > Hideo Yamauchi.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker