Thanks for the explanation. I think you're right that we shouldn't be
showing these failed actions.
I think we want to do it in the PE though, e.g. stop them from making
it into the failed_ops list in the first place.
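
A minimal sketch of that idea, filtering at the point where failed
operations are collected rather than at display time. The struct and
function names below are hypothetical stand-ins, not the real PE data
structures; the condition itself (shutdown set and node not online) is
the one the posted crm_mon patch tests.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-node state; NOT Pacemaker's real node_t. */
struct node_state {
    bool online;    /* node is currently online */
    bool shutdown;  /* node was shut down cleanly */
};

/* Hypothetical failed-operation record, for illustration only. */
struct failed_op {
    const char *id;                 /* e.g. "prmApPostgreSQLDB1_monitor_10000" */
    const struct node_state *node;  /* node the operation ran on, may be NULL */
};

/* Decide whether a failed operation should be recorded at all.
 * Operations from a node that is offline because of a clean shutdown
 * are skipped; everything else is kept. */
bool should_record_failed_op(const struct node_state *node)
{
    if (node == NULL) {
        return true;  /* unknown node: keep the record rather than hide it */
    }
    return !(node->shutdown && !node->online);
}

/* Copy only the recordable operations from in[] to out[].
 * out[] must have room for n entries. Returns the number copied. */
size_t collect_failed_ops(const struct failed_op *in, size_t n,
                          struct failed_op *out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (should_record_failed_op(in[i].node)) {
            out[kept++] = in[i];
        }
    }
    return kept;
}
```

With the filter applied where the list is built, every consumer of the
list sees the same result and crm_mon needs no special case of its own.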

On Mon, Sep 13, 2010 at 10:37 AM,  <renayama19661...@ybb.ne.jp> wrote:
> Hi Andrew,
>
> Thank you for your comment.
>
>> I assume this is for the stonith-enabled=true case, since offline
>> nodes are ignored for stonith-enabled=false.
>> Once the node is shot, then its status section is erased and no failed
>> actions will be shown... so why do we need this patch?
>
> I know that the failure information disappears once a node has been
> shot successfully.
> I also know that, in the case of stonith-enabled=false, it is not
> displayed once a node goes offline.
>
> (snip)
>                if(this_node->details->online || is_set(data_set->flags, pe_flag_stonith_enabled)) {
>                        /* offline nodes run no resources...
>                         * unless stonith is enabled in which case we need to
>                         *   make sure rsc start events happen after the stonith
>                         */
>                        crm_debug_3("Processing lrm resource entries");
>                        unpack_lrm_resources(this_node, lrm_rsc, data_set);
>                }
>                );
> (snip)
>
> However, when a node is shut down and does not need to be shot, its
> failed action information is still displayed in crm_mon.
> (The fail-count disappears at that point, but the failed action stays.)
>
>  # A monitor error occurred on srv01.
>
> Migration summary:
> * Node srv04:
> * Node srv02:
> * Node srv01:
>   prmApPostgreSQLDB1: migration-threshold=1 fail-count=1
> * Node srv03:
>
> Failed actions:
>    prmApPostgreSQLDB1_monitor_10000 (node=srv01, call=81, rc=7, status=complete): not running
>
>  # Next, the service on srv01 was stopped.
>
> Migration summary: ---> The fail-count of srv01 disappears
> * Node srv04:
> * Node srv02:
> * Node srv03:
>
> Failed actions: ---> The failed action stays
>    prmApPostgreSQLDB1_monitor_10000 (node=srv01, call=81, rc=7, status=complete): not running
>
> Our users do not expect to see failure information for a node that
> was stopped normally.
>
> In the case of stonith-enabled=true, should a node on which a failure
> occurred keep displaying its failed action information until it is shot?
> Won't users be confused if failure information is displayed for a node
> that was stopped normally?
>
> Best Regards,
> Hideo Yamauchi.
>
> --- Andrew Beekhof <and...@beekhof.net> wrote:
>
>> 2010/9/13  <renayama19661...@ybb.ne.jp>:
>> > Hi,
>> >
>> > I am contributing a patch for the crm_mon command.
>> >
>> > It changes crm_mon so that failed actions are not displayed for a
>> > node that is offline because it was shut down.
>> >
>> > Please review the patch and, if there are no problems, include it
>> > in the development version.
>>
>> Hmmm.
>> I'm not sure about this patch.
>>
>> I assume this is for the stonith-enabled=true case, since offline
>> nodes are ignored for stonith-enabled=false.
>> Once the node is shot, then its status section is erased and no failed
>> actions will be shown... so why do we need this patch?
>>
>> >
>> >
>> > diff -r 9b95463fde99 tools/crm_mon.c
>> > --- a/tools/crm_mon.c   Mon Sep 13 13:07:16 2010 +0900
>> > +++ b/tools/crm_mon.c   Mon Sep 13 13:07:59 2010 +0900
>> > @@ -829,6 +829,7 @@
>> >      int configured_resources = 0;
>> >      int print_opts = pe_print_ncurses;
>> >      const char *quorum_votes = "unknown";
>> > +    gboolean is_failed_first_disp = TRUE;
>> >
>> >      if(as_console) {
>> >         blank_screen();
>> > @@ -989,16 +990,28 @@
>> >      }
>> >
>> >      if(xml_has_children(data_set->failed)) {
>> > -       print_as("\nFailed actions:\n");
>> >         xml_child_iter(data_set->failed, xml_op,
>> >                        int val = 0;
>> > +                      node_t *failed_node = NULL;
>> >                        const char *id = ID(xml_op);
>> >                        const char *last = crm_element_value(xml_op, "last_run");
>> >                        const char *node = crm_element_value(xml_op, XML_ATTR_UNAME);
>> >                        const char *call = crm_element_value(xml_op, XML_LRM_ATTR_CALLID);
>> >                        const char *rc   = crm_element_value(xml_op, XML_LRM_ATTR_RC);
>> >                        const char *status = crm_element_value(xml_op, XML_LRM_ATTR_OPSTATUS);
>> > -
>> > +
>> > +                      failed_node = pe_find_node(data_set->nodes, node);
>> > +                      if (failed_node != NULL) {
>> > +                          if ((failed_node->details->shutdown == TRUE) && (failed_node->details->online == FALSE)) {
>> > +                              continue;
>> > +                          }
>> > +                      }
>> > +
>> > +                      if (is_failed_first_disp){
>> > +                          is_failed_first_disp = FALSE;
>> > +                          print_as("\nFailed actions:\n");
>> > +                      }
>> > +
>> >                        val = crm_parse_int(status, "0");
>> >                        print_as("    %s (node=%s, call=%s, rc=%s, status=%s",
>> >                                 id, node, call, rc, op_status2text(val));
>> >
>> >
>> >
>> > Best Regards,
>> > Hideo Yamauchi.
>> >
>> >
>> > _______________________________________________
>> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: 
>> > http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>> >
>> >
