Hi

Please see inline with [Praveen].

Thanks
Praveen

On 29-Aug-17 2:37 PM, Dheeroj Ram wrote:


Hi All,

I am new to OpenSAF and need your help.

My OpenSAF setup is as follows:

I am using OpenSAF version 4.4.2, and below is my opensaf status output:

atcafs-n10s2:~# /etc/init.d/opensafd status
safSISU=safSu=n10s2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed10,safApp=OpenSAF
         saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=n10s2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
         saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU-n10s2\,safSg=HenbGw-SG\,safApp=HenbGwApp,safSi=HenbGw,safApp=HenbGwApp
         saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=n10s1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
         saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=n10s1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
         saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SU-n10s1\,safSg=HenbGw-SG\,safApp=HenbGwApp,safSi=HenbGw,safApp=HenbGwApp
         saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SU-n10s5\,safSg=HenbGw-SG\,safApp=HenbGwApp_PL_n10s5,safSi=HenbGw,safApp=HenbGwApp_PL_n10s5
         saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU-n10s4\,safSg=HenbGw-SG\,safApp=HenbGwApp_PL_n10s4,safSi=HenbGw,safApp=HenbGwApp_PL_n10s4
         saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=n10s5\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF
         saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=n10s4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
         saAmfSISUHAState=ACTIVE(1)
atcafs-n10s2:~#

Here n10s1 and n10s2 are my controllers, and n10s4 and n10s5 are payloads.

The following applications are running on the payloads:

atcafs-n10s4:~# ps -aef | grep ins
root      3379     1 21 11:34 ?        00:21:36 /hegw/gsw/bin/hms instantiate
root      3396     1 11 11:34 ?        00:11:49 /hegw/gsw/bin/mms instantiate
root      3410     1  2 11:34 ?        00:02:05 /hegw/gsw/bin/dra instantiate
root      3424     1  2 11:34 ?        00:02:15 /hegw/gsw/bin/bcm instantiate

Problem Detail:

When I kill the application (hms) with signal 11 ("kill -11 3379"), it
generates a core dump (about 7 GB in size). OpenSAF tries to restart the process
within 60 s, but at that point the process is still busy writing the core, so its
PID remains active. OpenSAF then fails with the error below:

Aug 29 13:26:12 localhost kernel: grsec: From 172.16.10.1: signal 11 sent to 
/hegw/gsw/bin/hms[hms:11902] uid/euid:0/0 gid/egid:0/0, parent 
/sbin/init[init:1] uid/euid:0/0 gid/egid:0/0 by /bin/bash[bash:10442] 
uid/euid:0/0 gid/egid:0/0, parent /bin/login[login:10441] uid/euid:0/0 
gid/egid:0/0
Aug 29 13:26:27 localhost osafamfnd[11779]: 
'safComp=HMSComp_n10s4,safSu=SU-n10s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n10s4'
 faulted due to 'healthCheckcallbackTimeout' : Recovery is 'componentRestart'
Aug 29 13:26:27 localhost AMF_DEMO: CMD=cleanup
Aug 29 13:26:27 localhost AMF_DEMO_VAR: AMF_DEMO_VAR4=COMP1_VALUE4
Aug 29 13:26:27 localhost AMF_DEMO_VAR: AMF_DEMO_VAR1=CT_VALUE1
Aug 29 13:26:27 localhost AMF_DEMO_VAR: AMF_DEMO_VAR2=COMP1_OVERLOAD_VALUE2
Aug 29 13:26:27 localhost AMF_DEMO_VAR: AMF_DEMO_VAR3=COMP1_VALUE3
Aug 29 13:26:37 localhost osafamfnd[11779]: Cleanup of 
'safComp=HMSComp_n10s4,safSu=SU-n10s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n10s4'
 failed
Aug 29 13:26:37 localhost osafamfnd[11779]: Reason:'Script did not exit within 
time'
Aug 29 13:26:37 localhost osafamfnd[11779]: SU Failover trigerred for 
'safSu=SU-n10s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n10s4': Failed component: 
'safComp=HMSComp_n10s4,safSu=SU-n10s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n10s4'
Aug 29 13:26:37 localhost osafamfnd[11779]: 
'safSu=SU-n10s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n10s4' Presence State 
INSTANTIATED => TERMINATION_FAILED
Aug 29 13:26:37 localhost osafamfnd[11779]: Assigning 
'safSi=HenbGw,safApp=HenbGwApp_PL_n10s4' QUIESCED to 
'safSu=SU-n10s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n10s4'
Aug 29 13:26:37 localhost osafamfnd[11779]: Assigned 
'safSi=HenbGw,safApp=HenbGwApp_PL_n10s4' QUIESCED to 
'safSu=SU-n10s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n10s4'
Aug 29 13:26:37 localhost osafamfnd[11779]: Removing 
'safSi=HenbGw,safApp=HenbGwApp_PL_n10s4' from 
'safSu=SU-n10s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n10s4'
Aug 29 13:26:37 localhost osafamfnd[11779]: Removed 
'safSi=HenbGw,safApp=HenbGwApp_PL_n10s4' from 
'safSu=SU-n10s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n10s4'
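
For illustration only, here is a minimal sketch of how a cleanup step might wait for a process that is still writing its core. It assumes the cleanup script waits for the PID to disappear before returning, which may not match the real script; the function name wait_for_exit and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: wait until PID $1 is gone, or give up after $2 seconds.
# Returns 0 if the process has exited, 1 if it is still present at the deadline.
wait_for_exit() {
    pid=$1
    limit=$2
    waited=0
    # "kill -0" sends no signal; it only checks that the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Under that assumption, a process that needs longer than the configured cleanup timeout to finish dumping core would still be present when the timeout expires, which would be consistent with the "Script did not exit within time" message above.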


I tried setting "OPENSAF_TERMTIMEOUT=1000" in the nid.conf file.

[Praveen] I guess the intention is to increase the timeout for the cleanup script. This can be done by changing saAmfCompCleanupTimeout in the component (an object of class "SaAmfComp"), or by changing saAmfCtDefClcCliTimeout in the component's comptype (class "SaAmfCompType"). If it is changed in the comptype, it applies to every component of that comptype, unless a component overrides it by configuring its own saAmfCompCleanupTimeout.
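
As a rough sketch of the per-component change, the snippet below builds an immcfg command for the HMS component DN taken from the log above. It assumes the attribute is an SaTimeT value in nanoseconds; the 120 s value and the helper secs_to_ns are illustrative assumptions, not recommendations. The command is only printed so that it can be reviewed before being run on a controller.

```shell
#!/bin/sh
# Hypothetical helper: convert seconds to nanoseconds
# (assumption: AMF timeout attributes are SaTimeT values in nanoseconds).
secs_to_ns() {
    echo $(( $1 * 1000000000 ))
}

# Component DN copied from the syslog output in this thread.
COMP_DN='safComp=HMSComp_n10s4,safSu=SU-n10s4,safSg=HenbGw-SG,safApp=HenbGwApp_PL_n10s4'

# 120 s is an arbitrary example; choose a value longer than the core dump takes.
TIMEOUT_NS=$(secs_to_ns 120)

# Print the command instead of executing it.
CMD="immcfg -a saAmfCompCleanupTimeout=${TIMEOUT_NS} ${COMP_DN}"
echo "$CMD"
```

The current value could be inspected first with "immlist -a saAmfCompCleanupTimeout" followed by the same DN. Setting saAmfCtDefClcCliTimeout on the comptype object would follow the same pattern with the comptype's DN, which does not appear in this thread.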

Thanks
Praveen
But it didn't work. The issue still exists.

Please let me know if you need any more detail.

Thanks
Dheeraj







------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users
