Hi Santosh,
If there is only one controller, configuring escalation up to node failover
is a bad choice. If node failover does happen, the AMF specification requires
the node to reboot as the recovery action. Even if you disable the reboot, the
node is no longer functional, so it may not proceed without a repair. The
repair needed here is a node repair, which has to come before you try anything
else, but the repair admin command is not supported for nodes.
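
In case it helps while debugging, below is a minimal sketch for confirming
which reboot timeout a node would pick up from nid.conf. The function names
reboot_timeout and reboot_disabled are hypothetical (they are not part of
OpenSAF), and the sketch assumes the setting is written as
"export OPENSAF_REBOOT_TIMEOUT=<number>", as in Mathi's earlier mail. It only
reads the file; it does not change how AMF escalates.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with OpenSAF.

# Print the last uncommented OPENSAF_REBOOT_TIMEOUT value found in a
# nid.conf-style file (default path taken from the earlier mail).
# Prints nothing if the setting is absent or commented out.
reboot_timeout() {
    conf="${1:-/etc/opensaf/nid.conf}"
    sed -n 's/^[[:space:]]*export[[:space:]]\{1,\}OPENSAF_REBOOT_TIMEOUT=\([0-9]\{1,\}\).*/\1/p' "$conf" | tail -n 1
}

# Succeed only when the configured timeout is exactly 0 (reboot disabled).
reboot_disabled() {
    [ "$(reboot_timeout "$1")" = "0" ]
}

# Example use on a node:
#   if reboot_disabled; then echo "node reboot is disabled in nid.conf"; fi
```

The last matching line is used because nid.conf is sourced as a shell
fragment, so a later export would override an earlier one.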

Thanks
-Nagu

> -----Original Message-----
> From: santosh satapathy [mailto:[email protected]]
> Sent: 26 February 2015 07:12
> To: Mathivanan Naickan Palanivelu
> Cc: [email protected]
> Subject: Re: [users] Overriding node reboot behavior
> 
> I disabled the node reboot by setting the timeout to 0, as mentioned above.
> I have only one node running and no failover node is configured. So when a
> persistent error occurs at the node level and one of the components in the
> SU detects it, the escalation cycle goes all the way up to node reboot,
> which is then prevented because I have set the timeout to 0. Now the SU is
> in an unusable state and refuses any further admin commands. All the
> components in the SU are terminated.
> 
> Is this behavior correct? Since the error is persistent, the intent was to
> stop the continuous node reboot cycle so that the user can fix the problem
> and restart the SU. But it is not accepting any further admin commands.
> 
> *SU state:*
> [root@node1 ~]# amf-state su all
> safSu=node1.SU,safSg=node1.SU,safApp=AmberApp
> safSu=node1.SU,safSg=node1.SU,safApp=AmberApp
>         saAmfSUAdminState=LOCKED(2)
>         saAmfSUOperState=DISABLED(2)
>         saAmfSUPresenceState=UNINSTANTIATED(1)
>         saAmfSUReadinessState=OUT-OF-SERVICE(1)
> 
> 
> 
> */var/log/messages at node1:*
> [root@node1 ~]# tail -f /var/log/messages
> Feb 25 19:14:33 node1 osafamfnd[2232]: IN 'safComp=node1.comp1,safSu=node1.SU,safSg=node1.SU,safApp=TestApp' Presence State TERMINATING => UNINSTANTIATED
> Feb 25 19:14:33 node1 startSAScript: killproc retval 0
> Feb 25 19:14:33 node1 startSAScript: killproc retval 0
> Feb 25 19:14:33 node1 osafamfnd[2232]: IN 'safComp=node1.comp2,safSu=node1.SU,safSg=node1.SU,safApp=TestApp' Presence State TERMINATING => UNINSTANTIATED
> Feb 25 19:14:33 node1 osafamfnd[2232]: IN 'safComp=node1.comp3,safSu=node1.SU,safSg=node1.SU,safApp=TestApp' Presence State TERMINATING => UNINSTANTIATED
> Feb 25 19:14:33 node1 osafamfnd[2232]: NO Terminated all application components
> Feb 25 19:14:33 node1 osafamfnd[2232]: NO Informing director of node fail-over
> Feb 25 19:14:34 node1 osafamfnd[2232]: NO Received reboot order, ordering reboot now!
> Feb 25 19:14:34 node1 osafamfnd[2232]: Rebooting OpenSAF NodeId = 131855 EE Name = , Reason: Received reboot order, OwnNodeId = 131855, SupervisionTime = 0
> Feb 25 19:14:34 node1 osafamfnd[2232]: node reboot failure: exit code 512
> 
> */var/log/messages at controller node:*
> Feb 25 20:25:42 mgt-a osafamfd[2944]: WA Admin operation is already going on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
> Feb 25 20:25:43 mgt-a osafamfd[2944]: WA Admin operation is already going on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
> Feb 25 20:25:44 mgt-a osafamfd[2944]: WA Admin operation is already going on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
> Feb 25 20:25:45 mgt-a osafamfd[2944]: WA Admin operation is already going on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
> Feb 25 20:25:46 mgt-a osafamfd[2944]: WA Admin operation is already going on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
> Feb 25 20:25:47 mgt-a osafamfd[2944]: WA Admin operation is already going on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
> Feb 25 20:25:48 mgt-a osafamfd[2944]: WA Admin operation is already going on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
> Feb 25 20:25:49 mgt-a osafamfd[2944]: WA Admin operation is already going on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
> Feb 25 20:25:50 mgt-a osafamfd[2944]: WA Admin operation is already going on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
> Feb 25 20:25:51 mgt-a osafamfd[2944]: WA Admin operation is already going on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
> Feb 25 20:25:52 mgt-a osafamfd[2944]: WA Admin operation is already going on (su'safSu=node1.SU,safSg=node1.SU,safApp=TestApp')
> 
> Best regards,
> Santosh
> 
> 
> 
> On Mon, Feb 9, 2015 at 2:06 AM, Mathivanan Naickan Palanivelu <
> [email protected]> wrote:
> 
> > Hi Santosh,
> >
> > Yes, the reboot can be controlled by tuning the OPENSAF_REBOOT_TIMEOUT
> > configuration attribute in /etc/opensaf/nid.conf:
> >
> > Set it to zero to disable reboot, i.e. export OPENSAF_REBOOT_TIMEOUT=0
> >
> > Mathi.
> >
> >
> > ----- [email protected] wrote:
> >
> > > Hi,
> > >
> > > Can we control the node reboot behavior by changing the opensaf_reboot
> > > script at the node level so that it performs some additional action
> > > instead of a reboot?
> > >
> > > I tried commenting out the /sbin/reboot part of the opensaf_reboot
> > > script, but the node eventually rebooted anyway. Does the controller
> > > node reboot the payload node when a node reboot is declared?
> > >
> > > Any help is much appreciated,
> > >
> > > --
> > > Best Regards,
> > > Santosh
> > >
> > ------------------------------------------------------------------------------
> > > Dive into the World of Parallel Programming. The Go Parallel Website,
> > > sponsored by Intel and developed in partnership with Slashdot Media,
> > > is your
> > > hub for all things parallel software development, from weekly thought
> > > leadership blogs to news, videos, case studies, tutorials and more.
> > > Take a
> > > look and join the conversation now.
> > > http://goparallel.sourceforge.net/
> > > _______________________________________________
> > > Opensaf-users mailing list
> > > [email protected]
> > > https://lists.sourceforge.net/lists/listinfo/opensaf-users
> >
> 
> 
> 
> --
> Best Regards,
> Santosh
