[tickets] [opensaf:tickets] #2441 smf: coredump and syslog flood after immnd crash

2017-10-19 Thread elunlen via Opensaf-tickets
- **status**: review --> fixed
- **assigned_to**: elunlen -->  nobody 
- **Comment**:

commit 1c58a2106a55ad212a8e296424b1f20508eeb9cd
Author: Lennart Lund 
Date:   Thu Oct 19 15:17:27 2017 +0200

smf: coredump and syslog flood after immnd crash [#2441]

When reinitializing the OI handle (done in a separate thread), keep the
new handle in a local variable until the whole OI initialization, including
OI set, is done. When finished, the new handle can be published in the
global cb structure. Also protect the global variable change with the imm
lock mutex.
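
The idea in the commit message can be illustrated roughly as follows. This is only a minimal sketch and not the actual SMF code: `ControlBlock`, `OiHandle`, `create_handle` and `set_implementer` are hypothetical stand-ins for the global cb structure, the IMM OI handle, and the initialization steps.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-ins; these are not the real SMF or IMM definitions.
using OiHandle = std::uint64_t;

struct ControlBlock {
  std::mutex imm_lock;     // plays the role of the "imm lock mutex"
  OiHandle oi_handle = 0;  // the handle that other threads read
};

// Pretend initialization steps. In a real service each would be an IMM OI call.
OiHandle create_handle() {
  static std::atomic<OiHandle> next{100};
  return ++next;
}
bool set_implementer(OiHandle h) { return h != 0; }

// Build the new handle completely in a local variable, then publish it
// in a single step while holding the lock.
bool reinit_oi_handle(ControlBlock& cb) {
  OiHandle new_handle = create_handle();
  if (!set_implementer(new_handle)) {
    return false;  // cb.oi_handle is left untouched on failure
  }
  std::lock_guard<std::mutex> guard(cb.imm_lock);
  cb.oi_handle = new_handle;
  return true;
}

// Readers take the same lock, so they see either the old or the new handle,
// never a handle whose setup is still in progress.
OiHandle current_handle(ControlBlock& cb) {
  std::lock_guard<std::mutex> guard(cb.imm_lock);
  return cb.oi_handle;
}
```

In this sketch `reinit_oi_handle` would be the function run by the separate reinitialization thread, while other threads only go through `current_handle`.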




---

** [tickets:#2441] smf: coredump and syslog flood after immnd crash**

**Status:** fixed
**Milestone:** 5.17.10
**Created:** Thu Apr 27, 2017 09:05 AM UTC by Rafael Odzakow
**Last Updated:** Wed Oct 18, 2017 01:41 PM UTC
**Owner:** nobody


Seen in opensaf version: 183d7c379a8f
short ID: 8190

SMF shall handle the return code ERR_BAD_HANDLE more gracefully, probably by
reinitializing and creating a new handle. ERR_BAD_HANDLE can occur when IMMND
has crashed and is still reinitializing.
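
A rough sketch of what "reinitialize and retry" could look like is given here. It is an assumption about one possible approach, not the SMF implementation: the `Rc` enum is a hypothetical stand-in loosely modelled on the SA_AIS_* return codes, and `op` and `reinit` are placeholders for the IMM update call and the handle reinitialization.

```cpp
#include <cassert>
#include <functional>

// Hypothetical result codes; not the real SaAisErrorT enum.
enum class Rc { Ok, BadHandle, OtherError };

// Runs `op`. On BadHandle it calls `reinit` and tries again, up to
// `max_attempts` calls of `op` in total. The last result is returned
// instead of asserting, so the caller decides how to react.
Rc call_with_reinit(const std::function<Rc()>& op,
                    const std::function<bool()>& reinit,
                    int max_attempts) {
  Rc rc = Rc::OtherError;
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    rc = op();
    if (rc != Rc::BadHandle) {
      return rc;  // success, or an error that a new handle would not fix
    }
    if (!reinit()) {
      return rc;  // could not obtain a new handle; give up
    }
  }
  return rc;
}
```

Bounding the number of attempts keeps a caller from spinning forever while IMMND is still down.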


These lines are flooding the system and trace log:
~~~
5:09:43.862034 osafsmfd [27207:../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/smfd_main.c:0107] WA Lock failed eith EBUSY pthread_mutex_trylock for imm 16
5:09:43.862042 osafsmfd [27207:../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/smfd_main.c:0101] >> smfd_imm_trylock
~~~
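
The timestamps above are only microseconds apart, which suggests the trylock is retried in a tight loop with a warning on every failure. One generic way to avoid that kind of flood is to pause between attempts and log only every Nth failure. The sketch below illustrates that idea under stated assumptions; it is not what the fix for this ticket does, and `acquire_with_backoff` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <functional>
#include <thread>

// Calls `try_lock` up to `max_tries` times, sleeping `pause` after each
// failure. A warning is printed on the first failure and then on every
// `log_every`-th failure after it, and `warnings` reports how many were
// printed. Returns true if the lock was acquired (the caller must unlock).
bool acquire_with_backoff(const std::function<bool()>& try_lock,
                          int max_tries,
                          std::chrono::milliseconds pause,
                          int log_every,
                          int& warnings) {
  if (log_every < 1) {
    log_every = 1;
  }
  warnings = 0;
  for (int failures = 0; failures < max_tries;) {
    if (try_lock()) {
      return true;
    }
    ++failures;
    if ((failures - 1) % log_every == 0) {
      ++warnings;
      std::fprintf(stderr, "lock busy, %d failed attempt(s)\n", failures);
    }
    std::this_thread::sleep_for(pause);
  }
  return false;
}
```

With a `std::mutex m` the first argument could be `[&] { return m.try_lock(); }`.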


SMF backtrace

~~~
### BT FULL ###
#0 0x7f04a27e50c7 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x7f04a27e6478 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x7f04a4d1afee in __osafassert_fail (__file=, 
__line=, __func=, __assertion=) at 
../../../../../../opensaf/osaf/libs/core/leap/sysf_def.c:281
No locals.
#3 0x00411f8f in updateImmAttr (dn=, 
attributeName=0x47db5b "saSmfCmpgElapsedTime", 
attrValueType=SA_IMM_ATTR_SATIMET, value=0x1d89cb8) at 
../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/smfd_campaign_oi.cc:773
rc = SA_AIS_ERR_BAD_OPERATION
__FUNCTION__ = "updateImmAttr"
#4 0x0040f129 in SmfCampaign::updateElapsedTime 
(this=this@entry=0x1d89c90) at 
../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/SmfCampaign.cc:930
updateTime = 
diffTime = 
timeStamp = {
tv_sec = 1490843664,
tv_usec = 109102
}
#5 0x0040f169 in SmfCampaign::stopElapsedTime (this=0x1d89c90) at 
../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/SmfCampaign.cc:958
No locals.
#6 0x00440bee in SmfCampState::changeState 
(this=this@entry=0x7f048c014b70, i_camp=i_camp@entry=0x7f048c001220, 
i_state=0x7f048c00d3d0) at 
../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/SmfCampState.cc:224
__FUNCTION__ = "changeState"
newState = {
static npos = ,
_M_dataplus = {
 = {
<__gnu_cxx::new_allocator> = {}, },
members of std::basic_string::_Alloc_hider:
_M_p = 0x7f048c12ae48 "SmfCampStateExecFailed"
}
}
oldState = {
static npos = ,
_M_dataplus = {
 = {
<__gnu_cxx::new_allocator> = {}, },
members of std::basic_string::_Alloc_hider:
_M_p = 0x7f048c02f108 "SmfCampStateExecuting"
}
}
#7 0x00443faa in SmfCampStateExecuting::procResult 
(this=0x7f048c014b70, i_camp=0x7f048c001220, i_procedure=, 
i_result=) at 
../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/SmfCampState.cc:947
error = {
static npos = ,
_M_dataplus = {
 = {
<__gnu_cxx::new_allocator> = {}, },
members of std::basic_string::_Alloc_hider:
_M_p = 0x7f048c010f48 "Procedure safSmfProc=SingleStep_upgrade_SCs failed"
}
}
__FUNCTION__ = "procResult"
result = 
#8 0x0042500a in SmfUpgradeCampaign::procResult (this=0x7f048c001220, 
i_procedure=0x7f048c10afe0, i_result=SMF_PROC_FAILED) at 
../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/SmfUpgradeCampaign.cc:955
__FUNCTION__ = "procResult"
campResult = 
#9 0x0040cdb1 in SmfCampaignThread::processEvt (this=0x1d8d310) at 
../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/SmfCampaignThread.cc:653
evt = 0x7f0490001b40
#10 0x0040cf48 in SmfCampaignThread::handleEvents 
(this=this@entry=0x1d8d310) at 
../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/SmfCampaignThread.cc:699
ret = 
__FUNCTION__ = "handleEvents"
fds = {{
fd = 25,
events = 1,
revents = 1
}}
#11 0x00408253 in SmfCampaignThread::main (this=this@entry=0x1d8d310) 
at 
../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/SmfCampaignThread.cc:760
__FUNCTION__ = "main"
#12 0x00408352 in SmfCampaignThread::main (info=0x1d8d310) at 
../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/SmfCampaignThread.cc:109
__FUNCTION__ = "main"
self = 0x1d8d310
#13 0x7f04a3a110a4 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#14 0x7f04a289502d in clone () from 
~~~

[tickets] [opensaf:tickets] #2441 smf: coredump and syslog flood after immnd crash

2017-10-18 Thread elunlen via Opensaf-tickets
- **status**: accepted --> review



---

** [tickets:#2441] smf: coredump and syslog flood after immnd crash**

**Status:** review
**Milestone:** 5.17.10
**Created:** Thu Apr 27, 2017 09:05 AM UTC by Rafael Odzakow
**Last Updated:** Fri Oct 06, 2017 12:20 PM UTC
**Owner:** elunlen



The following lines flooded all syslog messages of SC-1
Mar 30 5:09:43.862001 osafsmfd 
[27207:../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/smfd_main.c:0101]
 >> smfd_imm_trylock
Mar 30 5:09:43.862034 osafsmfd 
[27207:../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/smfd_main.c:0107]
 WA Lock failed eith EBUSY pthread_mutex_trylock for imm 16

Coredump happened:
Mar 30 5:14:24.109139 osafsmfd 
[27207:../../../../../../../opensaf/osaf/libs/agents/saf/imma/imma_oi_api.c:2519]
 ER 

[tickets] [opensaf:tickets] #2441 smf: coredump and syslog flood after immnd crash

2017-10-06 Thread elunlen via Opensaf-tickets
- **Priority**: minor --> major



---

** [tickets:#2441] smf: coredump and syslog flood after immnd crash**

**Status:** accepted
**Milestone:** 5.17.10
**Created:** Thu Apr 27, 2017 09:05 AM UTC by Rafael Odzakow
**Last Updated:** Mon Sep 11, 2017 09:00 AM UTC
**Owner:** elunlen



[tickets] [opensaf:tickets] #2441 smf: coredump and syslog flood after immnd crash

2017-09-11 Thread elunlen via Opensaf-tickets
- **status**: unassigned --> accepted
- **assigned_to**: Rafael Odzakow --> elunlen



---

** [tickets:#2441] smf: coredump and syslog flood after immnd crash**

**Status:** accepted
**Milestone:** 5.17.10
**Created:** Thu Apr 27, 2017 09:05 AM UTC by Rafael Odzakow
**Last Updated:** Mon Aug 14, 2017 11:24 AM UTC
**Owner:** elunlen



[tickets] [opensaf:tickets] #2441 smf: coredump and syslog flood after immnd crash

2017-08-14 Thread Rafael Odzakow via Opensaf-tickets
- **Comment**:

Setting it to minor until it shows up again.



---

** [tickets:#2441] smf: coredump and syslog flood after immnd crash**

**Status:** unassigned
**Milestone:** 5.17.10
**Created:** Thu Apr 27, 2017 09:05 AM UTC by Rafael Odzakow
**Last Updated:** Mon Aug 14, 2017 11:23 AM UTC
**Owner:** Rafael Odzakow



[tickets] [opensaf:tickets] #2441 smf: coredump and syslog flood after immnd crash

2017-08-14 Thread Rafael Odzakow via Opensaf-tickets
- **Priority**: major --> minor



---

** [tickets:#2441] smf: coredump and syslog flood after immnd crash**

**Status:** unassigned
**Milestone:** 5.17.10
**Created:** Thu Apr 27, 2017 09:05 AM UTC by Rafael Odzakow
**Last Updated:** Fri Jul 28, 2017 08:25 AM UTC
**Owner:** Rafael Odzakow



[tickets] [opensaf:tickets] #2441 smf: coredump and syslog flood after immnd crash

2017-07-28 Thread Anders Widell via Opensaf-tickets
- **Milestone**: 5.17.07 --> 5.17.10



---

** [tickets:#2441] smf: coredump and syslog flood after immnd crash**

**Status:** unassigned
**Milestone:** 5.17.10
**Created:** Thu Apr 27, 2017 09:05 AM UTC by Rafael Odzakow
**Last Updated:** Thu Jul 27, 2017 02:41 PM UTC
**Owner:** Rafael Odzakow

