On the controller I get an extra, ugly "send failed" log message:

Apr 3 13:40:01 SC-1 local0.notice osafclmd[417]: NO CLM NodeName: 'PL-6' is not a configured cluster node.
Apr 3 13:40:01 SC-1 local0.notice osafclmd[417]: NO /etc/opensaf/node_name should contain the rdn value of configured CLM node object name
Apr 3 13:40:02 SC-1 local0.notice osafclmd[417]: NO proc_initialize_msg: send failed. dest:2060f19cd6007
Apr 3 13:40:17 SC-1 local0.notice osafclmd[417]: NO CLM NodeName: 'PL-6' is not a configured cluster node.
Apr 3 13:40:17 SC-1 local0.notice osafclmd[417]: NO /etc/opensaf/node_name should contain the rdn value of configured CLM node object name
Apr 3 13:40:17 SC-1 local0.notice osafclmd[417]: NO proc_initialize_msg: send failed. dest:2060f19cda006
Apr 3 13:40:32 SC-1 local0.notice osafclmd[417]: NO CLM NodeName: 'PL-6' is not a configured cluster node.
Apr 3 13:40:32 SC-1 local0.notice osafclmd[417]: NO /etc/opensaf/node_name should contain the rdn value of configured CLM node object name
Apr 3 13:40:33 SC-1 local0.notice osafclmd[417]: NO proc_initialize_msg: send failed. dest:2060f19cda009
Apr 3 13:40:34 SC-1 local0.notice osafimmnd[388]: NO Global discard node received for nodeId:2060f pid:359
Apr 3 13:40:39 SC-1 user.warn kernel: tipc: Resetting link <1.1.1:eth0-1.1.6:eth0>, peer not responding

I would like to have a retry loop inside clmna instead. Then nid could supervise clmna and, every now and then, do a node reboot. There is not much point in trying three times and giving up. I think clmna should retry forever and let nodeinit handle the bigger loop. The retry interval needs to be configurable but should have a good default, such as every 30 seconds.
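To make the proposal concrete, here is a minimal C sketch of a retry-forever join loop with a configurable interval that defaults to 30 seconds. Everything in it is hypothetical and not taken from the OpenSAF sources: the join attempt is passed in as a callback (`try_join`) standing in for clmna's real node join request to clmd, `join_result_t` is a local stand-in for the SaAisErrorT codes clmna gets back, and the interval is read from an assumed `CLMNA_JOIN_RETRY_INTERVAL` environment variable.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Default retry interval suggested in this mail: 30 seconds. */
#define DEFAULT_JOIN_RETRY_INTERVAL_SEC 30u

/* Hypothetical stand-in for the SaAisErrorT values returned by clmd. */
typedef enum { JOIN_OK, JOIN_ERR_NOT_EXIST, JOIN_ERR_EXIST } join_result_t;

/* Hypothetical: read the interval (seconds) from CLMNA_JOIN_RETRY_INTERVAL,
 * falling back to the default if unset, empty, negative or not a number. */
static unsigned int join_retry_interval(void)
{
	const char *s = getenv("CLMNA_JOIN_RETRY_INTERVAL");
	if (s == NULL || *s == '\0')
		return DEFAULT_JOIN_RETRY_INTERVAL_SEC;

	char *end;
	errno = 0;
	unsigned long v = strtoul(s, &end, 10);
	if (*s == '-' || errno != 0 || *end != '\0' || v > UINT_MAX)
		return DEFAULT_JOIN_RETRY_INTERVAL_SEC;
	return (unsigned int)v;
}

/* Keep sending the join request until clmd accepts it, instead of exiting
 * and letting nid respawn clmna a fixed number of times. Only returns on
 * success; the return value is the number of attempts it took. */
static unsigned long join_until_accepted(join_result_t (*try_join)(void),
					 unsigned int interval_sec)
{
	unsigned long attempt = 0;

	for (;;) {
		join_result_t rc = try_join();
		attempt++;
		if (rc == JOIN_OK)
			return attempt;

		fprintf(stderr, "node join attempt %lu rejected (%s), retrying in %u s\n",
			attempt,
			rc == JOIN_ERR_NOT_EXIST ? "not a configured node"
						 : "node name already in use",
			interval_sec);
		sleep(interval_sec); /* may return early if a signal arrives */
	}
}

/* Illustration only: pretend clmd rejects the first two join requests,
 * e.g. until /etc/opensaf/node_name has been corrected. */
static join_result_t simulated_join(void)
{
	static int calls;
	return ++calls <= 2 ? JOIN_ERR_NOT_EXIST : JOIN_OK;
}
```

With `simulated_join` and an interval of 0, `join_until_accepted(simulated_join, 0)` logs two rejections and returns 3. Passing the attempt as a callback keeps the loop testable on its own; in clmna it would presumably wrap the existing join send/receive, while nid/nodeinit keeps only the outer supervision and any eventual node reboot.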
Thanks,
Hans

On 03/27/2014 02:13 AM, [email protected] wrote:
>  osaf/services/saf/clmsv/nodeagent/main.c |  8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
>
>
> When a node join request for an unconfigured/misconfigured node, or
> a node join request with a duplicate node_name, is attempted, then
> clmna should report those errors to NID such that NID attempts to
> respawn clmna.
> With the introduction of this change, the following happens (as can be seen
> in the syslog) in the case of an unconfigured/misconfigured node join request:
> At the ACTIVE controller syslog:
> Mar 26 20:23:32 SC-1 local0.notice osafclmd[420]: NO CLM NodeName: 'PL-8' is not a configured cluster node.
> Mar 26 20:23:32 SC-1 local0.notice osafclmd[420]: NO /etc/opensaf/node_name should contain the rdn value of a configured CLM node object name
>
> At the unconfigured/misconfigured node, the syslog will look as below:
> Mar 26 19:03:54 PL-3 local0.notice osafclmna[871]: Started
> Mar 26 19:03:54 PL-3 local0.err osafclmna[871]: ER PL-8 is not a configured node
> Mar 26 19:03:54 PL-3 local0.err opensafd[837]: ER Failed DESC:CLMNA
> Mar 26 19:03:54 PL-3 local0.err opensafd[837]: ER Going for recovery
> Mar 26 19:03:54 PL-3 local0.err opensafd[837]: ER Trying To RESPAWN /usr/local/lib/opensaf/clc-cli/osaf-clmna attempt #1
> Mar 26 19:03:54 PL-3 local0.err opensafd[837]: ER Sending SIGKILL to CLMNA, pid=868
> Mar 26 19:03:54 PL-3 local0.err osafclmna[871]: ER Exiting
> Mar 26 19:03:54 PL-3 local0.notice osafclmna[871]: exiting for shutdown
> Mar 26 19:04:09 PL-3 local0.notice osafclmna[895]: Started
> Mar 26 19:04:09 PL-3 local0.err osafclmna[895]: ER PL-8 is not a configured node
> Mar 26 19:04:09 PL-3 local0.err osafclmna[895]: ER Exiting
> Mar 26 19:04:09 PL-3 local0.err opensafd[837]: ER Could Not RESPAWN CLMNA
> Mar 26 19:04:09 PL-3 local0.err opensafd[837]: ER Failed DESC:CLMNA
> Mar 26 19:04:09 PL-3 local0.err opensafd[837]: ER Trying To RESPAWN /usr/local/lib/opensaf/clc-cli/osaf-clmna attempt #2
> Mar 26 19:04:09 PL-3 local0.err opensafd[837]: ER Sending SIGKILL to CLMNA, pid=892
> Mar 26 19:04:09 PL-3 local0.notice osafclmna[895]: exiting for shutdown
> Mar 26 19:04:24 PL-3 local0.notice osafclmna[919]: Started
> Mar 26 19:04:25 PL-3 local0.err osafclmna[919]: ER PL-8 is not a configured node
> Mar 26 19:04:25 PL-3 local0.err osafclmna[919]: ER Exiting
> Mar 26 19:04:25 PL-3 local0.err opensafd[837]: ER Could Not RESPAWN CLMNA
> Mar 26 19:04:25 PL-3 local0.err opensafd[837]: ER Failed DESC:CLMNA
> Mar 26 19:04:25 PL-3 local0.err opensafd[837]: ER FAILED TO RESPAWN
> Mar 26 19:04:27 PL-3 local0.notice osafclmna[919]: exiting for shutdown
> Mar 26 19:04:28 PL-3 local0.notice osafimmnd[864]: exiting for shutdown
>
>
> When a duplicate node join request comes in, the following syslog
> message will be seen at the ACTIVE controller:
> Mar 26 19:07:43 SC-1 local0.err osafclmd[418]: ER Duplicate node join request for CLM node: 'SC-2'. Specify a unique node name in/etc/opensaf/node_name
> Mar 26 19:07:59 SC-1 local0.err osafclmd[418]: ER Duplicate node join request for CLM node: 'SC-2'. Specify a unique node name in/etc/opensaf/node_name
>
> And the following will be seen at the node on which the duplicate request is
> attempted:
> Mar 26 19:07:43 PL-3 local0.err osafclmna[1459]: ER SC-2 is already up. Specify a unique name in/etc/opensaf/node_name
> Mar 26 19:07:43 PL-3 local0.err opensafd[1425]: ER Failed DESC:CLMNA
> Mar 26 19:07:43 PL-3 local0.err opensafd[1425]: ER Going for recovery
> Mar 26 19:07:43 PL-3 local0.err opensafd[1425]: ER Trying To RESPAWN /usr/local/lib/opensaf/clc-cli/osaf-clmna attempt #1
> Mar 26 19:07:43 PL-3 local0.err opensafd[1425]: ER Sending SIGKILL to CLMNA, pid=1456
> Mar 26 19:07:43 PL-3 local0.err osafclmna[1459]: ER Exiting
> Mar 26 19:07:44 PL-3 local0.notice osafclmna[1459]: exiting for shutdown
> Mar 26 19:07:59 PL-3 local0.notice osafclmna[1483]: Started
> Mar 26 19:07:59 PL-3 local0.err osafclmna[1483]: ER SC-2 is already up. Specify a unique name in/etc/opensaf/node_name
> Mar 26 19:07:59 PL-3 local0.err osafclmna[1483]: ER Exiting
> Mar 26 19:07:59 PL-3 local0.err opensafd[1425]: ER Could Not RESPAWN CLMNA
> Mar 26 19:07:59 PL-3 local0.err opensafd[1425]: ER Failed DESC:CLMNA
> Mar 26 19:07:59 PL-3 local0.err opensafd[1425]: ER Trying To RESPAWN /usr/local/lib/opensaf/clc-cli/osaf-clmna attempt #2
> Mar 26 19:07:59 PL-3 local0.err opensafd[1425]: ER Sending SIGKILL to CLMNA, pid=1480
> Mar 26 19:07:59 PL-3 local0.notice osafclmna[1483]: exiting for shutdown
>
> diff --git a/osaf/services/saf/clmsv/nodeagent/main.c b/osaf/services/saf/clmsv/nodeagent/main.c
> --- a/osaf/services/saf/clmsv/nodeagent/main.c
> +++ b/osaf/services/saf/clmsv/nodeagent/main.c
> @@ -458,11 +458,19 @@ SaAisErrorT clmna_process_dummyup_msg(vo
>  			LOG_ER("%s is not a configured node",
>  				o_msg->info.api_resp_info.param.node_name.value);
>  			free(o_msg);
> +			rc = error; /* For now, just pass on the error to nid.
> +				     * This is not needed in future when node local
> +				     * cluster management policy based decisions can be made.
> +				     */
>  			goto done;
>  		} else if (error == SA_AIS_ERR_EXIST) {
>  			LOG_ER("%s is already up. Specify a unique name in"
>  				PKGSYSCONFDIR "/node_name",
>  				o_msg->info.api_resp_info.param.node_name.value);
>  			free(o_msg);
> +			rc = error; /* This is not needed in future when node local
> +				     * cluster management policy based decisions can be made.
> +				     * For now, just pass on the error to nid.
> +				     */
>  			goto done;
>  		}
