[tickets] [opensaf:tickets] #2395 CKPT: Performance degradation ~100% (Time taken is almost double than previous)

2017-04-05 Thread A V Mahesh (AVM)
I verified that the performance degradation is not caused by feature [#2202].
With the [#2202] feature enabled and disabled, the difference in degradation is 
negligible.

Since [#2202] is not the major root cause of the performance degradation, no 
immediate changes are required on top of [#2202] for now.

Independent of the [#2202] feature, we still observe a 70% to 100% performance 
degradation. This could be caused by some other change, such as `cpnd: use shared 
memory based on ckpt name length [#2108]`, where the SHM changes were made to 
support long DNs. I am currently isolating the change that is causing the 
performance degradation and will update as soon as possible.


-AVM


---

** [tickets:#2395] CKPT: Performance degradation ~100% (Time taken is almost 
double than previous)**

**Status:** assigned
**Milestone:** 5.2.0
**Created:** Thu Mar 23, 2017 10:26 AM UTC by Chani Srivastava
**Last Updated:** Thu Apr 06, 2017 03:11 AM UTC
**Owner:** A V Mahesh (AVM)


Environment details

OS : Suse 11, 64bit Physical machine 
Changeset : 8634 ( 5.2.FC)
Setup : 4 nodes

There is considerable degradation in CKPT performance in 5.2 when compared to 
5.1. The times are measured immediately before and after each API call, and the 
difference is taken as the time spent in the API.

-> For write operations, the checkpoint write API takes 2x the time taken in the 
earlier release 5.1. The issue is observed in both synchronous and asynchronous 
mode.
( synchronous -- checkpoint create flags used : SA_CKPT_WR_ALL_REPLICAS
asynchronous -- checkpoint create flags used : SA_CKPT_WR_ACTIVE_REPLICA | 
SA_CKPT_CHECKPOINT_COLLOCATED ) Both local and remote replica.

-> For section create operations in asynchronous mode with a local replica, the 
checkpoint section create API takes more than 70% longer than in 5.1.

-> For read operations in asynchronous mode with a local replica, the checkpoint 
read API takes twice as long as in 5.1.

Please check the tickets pushed as part of 4.7 to 5.0 that may have affected API 
performance.
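
As a reference for the per-call timing described above, a minimal measurement 
sketch, assuming the standard SA Forum `<saCkpt.h>` C API; the handle, section id 
and data buffer are placeholders, and this is not the actual test code that 
produced the reported numbers:

~~~
#include <saCkpt.h>   // SA Forum Checkpoint Service C API (SA_CKPT_WR_* flags etc.)
#include <chrono>
#include <cstdio>

// Measure the wall-clock latency of one saCkptCheckpointWrite() call.
// 'hdl' is assumed to come from saCkptCheckpointOpen() with either
// SA_CKPT_WR_ALL_REPLICAS (synchronous case) or
// SA_CKPT_WR_ACTIVE_REPLICA | SA_CKPT_CHECKPOINT_COLLOCATED (asynchronous case).
static double timed_write_us(SaCkptCheckpointHandleT hdl,
                             const SaCkptSectionIdT& section,
                             void* data, SaSizeT size) {
  SaCkptIOVectorElementT iov = {section, data, size,
                                /*dataOffset=*/0, /*readSize=*/0};
  SaUint32T err_idx = 0;

  auto t0 = std::chrono::steady_clock::now();
  SaAisErrorT rc = saCkptCheckpointWrite(hdl, &iov, 1, &err_idx);
  auto t1 = std::chrono::steady_clock::now();

  if (rc != SA_AIS_OK)
    std::fprintf(stderr, "saCkptCheckpointWrite failed: %d\n", (int)rc);
  return std::chrono::duration<double, std::micro>(t1 - t0).count();
}
~~~

Comparing the values returned by such a helper on 5.1 and 5.2 builds, with the 
same flags and payload sizes, reproduces the write-path comparison above; the 
section create and read cases can be timed the same way around 
saCkptSectionCreate() and saCkptCheckpointRead().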



---



[tickets] [opensaf:tickets] #2395 CKPT: Performance degradation ~100% (Time taken is almost double than previous)

2017-04-05 Thread A V Mahesh (AVM)
I verified that the performance degradation is not caused by the [#2202] feature.
With the [#2202] feature enabled and disabled, the difference in degradation is 
negligible.

Independent of the [#2202] feature, we still observe a 70% to 100% performance 
degradation. This could be caused by some other change, such as `cpnd: use shared 
memory based on ckpt name length [#2108]`, where the SHM changes were made to 
support long DNs. I am currently isolating the change that is causing the 
performance degradation and will update as soon as possible.


---

** [tickets:#2395] CKPT: Performance degradation ~100% (Time taken is almost 
double than previous)**

**Status:** assigned
**Milestone:** 5.2.0
**Created:** Thu Mar 23, 2017 10:26 AM UTC by Chani Srivastava
**Last Updated:** Wed Apr 05, 2017 02:53 PM UTC
**Owner:** A V Mahesh (AVM)


Environment details

OS : Suse 11, 64bit Physical machine 
Changeset : 8634 ( 5.2.FC)
Setup : 4 nodes

There is considerable degradation in CKPT performance in 5.2 when compared to 
5.1. The times are measured immediately before and after each API call, and the 
difference is taken as the time spent in the API.

-> For write operations, the checkpoint write API takes 2x the time taken in the 
earlier release 5.1. The issue is observed in both synchronous and asynchronous 
mode.
( synchronous -- checkpoint create flags used : SA_CKPT_WR_ALL_REPLICAS
asynchronous -- checkpoint create flags used : SA_CKPT_WR_ACTIVE_REPLICA | 
SA_CKPT_CHECKPOINT_COLLOCATED ) Both local and remote replica.

-> For section create operations in asynchronous mode with a local replica, the 
checkpoint section create API takes more than 70% longer than in 5.1.

-> For read operations in asynchronous mode with a local replica, the checkpoint 
read API takes twice as long as in 5.1.

Please check the tickets pushed as part of 4.7 to 5.0 that may have affected API 
performance.



---



[tickets] [opensaf:tickets] #2395 CKPT: Performance degradation ~100% (Time taken is almost double than previous)

2017-04-05 Thread Anders Widell
Could you measure the performance of CKPT after applying the attached patch? 
Also, make sure to set OSAF_CKPT_SHM_ALLOC_GUARANTEE=2.


Attachments:

- 
[ckpt_performance.diff.gz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/9bf0cece/6405/attachment/ckpt_performance.diff.gz)
 (1.6 kB; application/gzip)


---

** [tickets:#2395] CKPT: Performance degradation ~100% (Time taken is almost 
double than previous)**

**Status:** assigned
**Milestone:** 5.2.0
**Created:** Thu Mar 23, 2017 10:26 AM UTC by Chani Srivastava
**Last Updated:** Wed Apr 05, 2017 09:56 AM UTC
**Owner:** A V Mahesh (AVM)


Environment details

OS : Suse 11, 64bit Physical machine 
Changeset : 8634 ( 5.2.FC)
Setup : 4 nodes

There is considerable degradation in CKPT performance in 5.2 when compared to 
5.1. The times are measured immediately before and after each API call, and the 
difference is taken as the time spent in the API.

-> For write operations, the checkpoint write API takes 2x the time taken in the 
earlier release 5.1. The issue is observed in both synchronous and asynchronous 
mode.
( synchronous -- checkpoint create flags used : SA_CKPT_WR_ALL_REPLICAS
asynchronous -- checkpoint create flags used : SA_CKPT_WR_ACTIVE_REPLICA | 
SA_CKPT_CHECKPOINT_COLLOCATED ) Both local and remote replica.

-> For section create operations in asynchronous mode with a local replica, the 
checkpoint section create API takes more than 70% longer than in 5.1.

-> For read operations in asynchronous mode with a local replica, the checkpoint 
read API takes twice as long as in 5.1.

Please check the tickets pushed as part of 4.7 to 5.0 that may have affected API 
performance.



---



[tickets] [opensaf:tickets] #2412 log: refactor handling log client database in log agent

2017-04-05 Thread Vu Minh Nguyen
- Description has changed:

Diff:



--- old
+++ new
@@ -5,3 +5,7 @@
 So, this ticket intends to remove that concern by doing:
 1) Centralizing read/write accesses to the database to one place with its 
private mutex
 2) Use C++ containters to contain and handle databases
+
+And will push the ticket in 02 increments:
+1) Convert agent code to C++ without touching any existing logic (looks like 
what AMF has done it in [#1673])
+2) Do #1 and #2 above






---

** [tickets:#2412] log: refactor handling log client database in log agent**

**Status:** accepted
**Milestone:** future
**Created:** Tue Apr 04, 2017 12:08 PM UTC by Canh Truong
**Last Updated:** Wed Apr 05, 2017 01:36 PM UTC
**Owner:** Vu Minh Nguyen


In the log agent, there is a linked list holding all log clients of an 
application process. Also, in each log client, there is an additional linked 
list holding all log streams belonging to that client.

Adding, modifying or deleting the lists' elements, or sub-items of the client 
databases, is distributed across a lot of places. This could easily cause 
trouble with race conditions, deadlocks, or risks when adding code that changes 
the databases.

So, this ticket intends to remove that concern by doing:
1) Centralizing read/write accesses to the database in one place, protected by 
its own private mutex
2) Using C++ containers to contain and handle the databases (a sketch is given 
below)

The ticket will be pushed in 2 increments:
1) Convert the agent code to C++ without touching any existing logic (similar to 
what AMF has done in [#1673])
2) Do items 1 and 2 above
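
Illustration only: a minimal sketch of what points 1) and 2) above could look 
like. The class, member and type names (`ClientDatabase`, `LogClient`, `Add`, 
`Remove`) are hypothetical placeholders, not the actual agent code:

~~~
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>

struct LogClient;  // hypothetical stand-in for the agent's client record

// Hypothetical centralized client database: every read/write goes through this
// class, guarded by a single private mutex, and the clients are kept in a C++
// container instead of hand-rolled linked lists.
class ClientDatabase {
 public:
  bool Add(uint32_t client_id, std::shared_ptr<LogClient> client) {
    std::lock_guard<std::mutex> lock(mutex_);
    return clients_.emplace(client_id, std::move(client)).second;
  }

  // Returns the removed client (or nullptr). The lookup and the erase happen
  // under the same lock, so no caller can observe a node that another thread
  // is about to delete.
  std::shared_ptr<LogClient> Remove(uint32_t client_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = clients_.find(client_id);
    if (it == clients_.end()) return nullptr;
    auto client = std::move(it->second);
    clients_.erase(it);
    return client;
  }

 private:
  std::mutex mutex_;  // the only lock protecting the container
  std::unordered_map<uint32_t, std::shared_ptr<LogClient>> clients_;
};
~~~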


---



[tickets] [opensaf:tickets] #2412 log: refactor handling log client database in log agent

2017-04-05 Thread Vu Minh Nguyen
- Description has changed:

Diff:



--- old
+++ new
@@ -1,4 +1,4 @@
-In log agent, there is a link list holding all log clients of an pplication 
process. Also, in each log client, there is an additional link list holding all 
log streams which belongs to each log client.
+In log agent, there is a link list holding all log clients of an application 
process. Also, in each log client, there is an additional link list holding all 
log streams which belongs to each log client.
 
 Adding, modifying or deleing the link lists' elements or on sub-items of the 
client dabases are distrubuted in a lot of places, this could easily cause 
troubles regarding race condition, deadlock, or risks when adding code that do 
changes the databases.
 






---

** [tickets:#2412] log: refactor handling log client database in log agent**

**Status:** accepted
**Milestone:** future
**Created:** Tue Apr 04, 2017 12:08 PM UTC by Canh Truong
**Last Updated:** Wed Apr 05, 2017 01:33 PM UTC
**Owner:** Vu Minh Nguyen


In the log agent, there is a linked list holding all log clients of an 
application process. Also, in each log client, there is an additional linked 
list holding all log streams belonging to that client.

Adding, modifying or deleting the lists' elements, or sub-items of the client 
databases, is distributed across a lot of places. This could easily cause 
trouble with race conditions, deadlocks, or risks when adding code that changes 
the databases.

So, this ticket intends to remove that concern by doing:
1) Centralizing read/write accesses to the database in one place, protected by 
its own private mutex
2) Using C++ containers to contain and handle the databases


---



[tickets] [opensaf:tickets] #2412 log: refactor handling log client database in log agent

2017-04-05 Thread Vu Minh Nguyen
- **summary**: log: coredump while finalize client in parallel --> log: 
refactor handling log client database in log agent
- Description has changed:

Diff:



--- old
+++ new
@@ -1,4 +1,7 @@
-Program terminated with signal SIGSEGV, Segmentation fault.
-#0 0x7f5bb0159d68 in lga_hdl_rec_del 
(list_head=list_head@entry=0x7f5bb0360288 , 
rm_node=rm_node@entry=0x7f5b740029e0) at src/log/agent/lga_util.c:651
+In log agent, there is a link list holding all log clients of an pplication 
process. Also, in each log client, there is an additional link list holding all 
log streams which belongs to each log client.
 
-When 2 threads do the same action finalize client, log agent remove the client 
in the list. in one thread, it access to the node in the list and this node 
maybe deleted by another thread. it causes the coredump happen.
+Adding, modifying or deleing the link lists' elements or on sub-items of the 
client dabases are distrubuted in a lot of places, this could easily cause 
troubles regarding race condition, deadlock, or risks when adding code that do 
changes the databases.
+
+So, this ticket intends to remove that concern by doing:
+1) Centralizing read/write accesses to the database to one place with its 
private mutex
+2) Use C++ containters to contain and handle databases



- **assigned_to**: Canh Truong --> Vu Minh Nguyen
- **Type**: defect --> enhancement
- **Milestone**: 5.2.0 --> future



---

** [tickets:#2412] log: refactor handling log client database in log agent**

**Status:** accepted
**Milestone:** future
**Created:** Tue Apr 04, 2017 12:08 PM UTC by Canh Truong
**Last Updated:** Wed Apr 05, 2017 07:21 AM UTC
**Owner:** Vu Minh Nguyen


In log agent, there is a link list holding all log clients of an pplication 
process. Also, in each log client, there is an additional link list holding all 
log streams which belongs to each log client.

Adding, modifying or deleing the link lists' elements or on sub-items of the 
client dabases are distrubuted in a lot of places, this could easily cause 
troubles regarding race condition, deadlock, or risks when adding code that do 
changes the databases.

So, this ticket intends to remove that concern by doing:
1) Centralizing read/write accesses to the database to one place with its 
private mutex
2) Use C++ containters to contain and handle databases


---



[tickets] [opensaf:tickets] #2413 smf: coredump, suspend is issued at completed state

2017-04-05 Thread Rafael



---

** [tickets:#2413] smf: coredump, suspend is issued at completed state**

**Status:** unassigned
**Milestone:** 5.2.0
**Created:** Wed Apr 05, 2017 12:39 PM UTC by Rafael
**Last Updated:** Wed Apr 05, 2017 12:39 PM UTC
**Owner:** nobody
**Attachments:**

- 
[osafsmfd.9276.SC-2.core.txt](https://sourceforge.net/p/opensaf/tickets/2413/attachment/osafsmfd.9276.SC-2.core.txt)
 (15.4 kB; text/plain)


Ticket #2145 looks to be causing this issue.

The coredump printout is attached.

Steps to reproduce: run a campaign and have an AMF component fail when the 
campaign is in the completed state. This triggers an event in SMF which tries to 
suspend a completed campaign.

handleAmfObjectStateChangeNotification will try to call asyncFailure(), which is 
the same as suspend(). Because the campaign is completed and committed, this is 
not a valid transition. The campaign state instance is most likely already 
deleted, therefore we get a coredump.

For reference, see figures 5, 6 and 7 in the SMF AIS specification, starting from 
section 5.1.3.
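
To make the invalid transition concrete, a hedged sketch of the kind of state 
guard that could avoid it. The types and names (`SmfCampaignState`, 
`SmfUpgradeCampaign`, `onAmfObjectStateChange`) are hypothetical stand-ins for 
the real SMF classes and for handleAmfObjectStateChangeNotification; the actual 
fix may look different:

~~~
// Hypothetical stand-ins for the real SMF campaign state machine.
enum class SmfCampaignState { kExecuting, kSuspending, kCompleted, kCommitted };

struct SmfUpgradeCampaign {
  SmfCampaignState state;
  void asyncFailure() { /* same effect as suspend() in the real code */ }
};

// Guard sketch: only suspend when the campaign is in a state where suspend is a
// valid transition (see the SMF state machines, figures 5-7 of the AIS spec).
// A completed/committed campaign is ignored; in the real code its state
// instance may already be deleted, which is what produced the coredump.
void onAmfObjectStateChange(SmfUpgradeCampaign* campaign) {
  if (campaign == nullptr) return;
  if (campaign->state == SmfCampaignState::kExecuting ||
      campaign->state == SmfCampaignState::kSuspending) {
    campaign->asyncFailure();
  }
  // Any other state: drop the event instead of attempting an invalid transition.
}
~~~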


---



[tickets] [opensaf:tickets] #2394 clm: add clm tool commands for admin op and state check.

2017-04-05 Thread Praveen
- **status**: accepted --> review



---

** [tickets:#2394] clm: add clm tool commands for admin op and state check.**

**Status:** review
**Milestone:** next
**Created:** Thu Mar 23, 2017 06:17 AM UTC by Praveen
**Last Updated:** Thu Mar 23, 2017 06:17 AM UTC
**Owner:** Praveen


The intention is to add CLM tool commands:
- to perform an admin operation on a node or on the cluster, something like
clm-adm

- to check CLM nodes' admin state and membership status, like
clm-state

- to find the CLM cluster and nodes, like
clm-find


---



[tickets] [opensaf:tickets] #2395 CKPT: Performance degradation ~100% (Time taken is almost double than previous)

2017-04-05 Thread Chani Srivastava
- **summary**: CKPT: Performance degradation upto 200% --> CKPT: Performance 
degradation ~100% (Time taken is almost double than previous)



---

** [tickets:#2395] CKPT: Performance degradation ~100% (Time taken is almost 
double than previous)**

**Status:** assigned
**Milestone:** 5.2.0
**Created:** Thu Mar 23, 2017 10:26 AM UTC by Chani Srivastava
**Last Updated:** Wed Apr 05, 2017 06:13 AM UTC
**Owner:** A V Mahesh (AVM)


Environment details

OS : Suse 11, 64bit Physical machine 
Changeset : 8634 ( 5.2.FC)
Setup : 4 nodes

There is considerable degradation in CKPT performance in 5.2 when compared to 
5.1. The times are measured immediately before and after each API call, and the 
difference is taken as the time spent in the API.

-> For write operations, the checkpoint write API takes 2x the time taken in the 
earlier release 5.1. The issue is observed in both synchronous and asynchronous 
mode.
( synchronous -- checkpoint create flags used : SA_CKPT_WR_ALL_REPLICAS
asynchronous -- checkpoint create flags used : SA_CKPT_WR_ACTIVE_REPLICA | 
SA_CKPT_CHECKPOINT_COLLOCATED ) Both local and remote replica.

-> For section create operations in asynchronous mode with a local replica, the 
checkpoint section create API takes more than 70% longer than in 5.1.

-> For read operations in asynchronous mode with a local replica, the checkpoint 
read API takes twice as long as in 5.1.

Please check the tickets pushed as part of 4.7 to 5.0 that may have affected API 
performance.



---



[tickets] [opensaf:tickets] #2404 Amf : amfd crashed on active controller when executing the campaign for application upgrade.

2017-04-05 Thread Nagendra Kumar
- **status**: review --> fixed
- **Comment**:

changeset:   8751:e036951b5168
tag: tip
parent:  8749:cc3ae4601faf
user:Nagendra Kumar
date:Wed Apr 05 13:46:58 2017 +0530
summary: amfd: correct loop variable initialization [#2404]

[staging:e03695]




---

** [tickets:#2404] Amf : amfd crashed on active controller when executing the 
campaign for application upgrade.**

**Status:** fixed
**Milestone:** 5.0.2
**Created:** Thu Mar 30, 2017 09:49 AM UTC by Madhurika Koppula
**Last Updated:** Wed Apr 05, 2017 06:53 AM UTC
**Owner:** Nagendra Kumar
**Attachments:**

- 
[amfd_crash.tgz](https://sourceforge.net/p/opensaf/tickets/2404/attachment/amfd_crash.tgz)
 (739.4 kB; application/octet-stream)


**Environment Details:**

OS : Suse 64bit
GCC Version: 6.1
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
PBE disabled ).
Changeset : 8701 ( 5.2.RC1) 

**Summary**: amfd crashed on the active controller when executing the campaign 
modeled for testing SG upgrade of the no-redundancy model.

**Steps followed & observed behaviour:**

1) Brought up the four-node cluster successfully.
2) Brought up the No Redundancy model.
3) When the campaign for testing SG upgrade was executed, an amfd crash was 
observed on the active controller at the moment the SU hosted on PL-3 went to 
the instantiation-failed state after the upgrade (the script exited with a 
non-zero status).
4) amfd got aborted when invoking saImmOiDispatch.

**Below is the timestamp on active controller SC-1:**

Apr 27 14:36:08 SLES-M-SLOT-1 osafsmfd[4199]: NO adminOperation: 
immUtil.callAdminOperation() Fail SA_AIS_ERR_REPAIR_PENDING (29), Failed unit 
is 'safComp=Proxy1,safSu=dummy_Proxy_1,safSg=SG_dummy_Proxy,safApp=PxyApp'
Apr 27 14:36:08 SLES-M-SLOT-1 osafsmfd[4199]: ER Failed to Restart activation 
units
Apr 27 14:36:08 SLES-M-SLOT-1 osafsmfd[4199]: ER Step execution failed, Try 
undoing the step
Apr 27 14:36:08 SLES-M-SLOT-1 osafsmfd[4199]: NO SmfStepStateUndoing::execute 
start undoing step.
Apr 27 14:36:08 SLES-M-SLOT-1 osafsmfd[4199]: NO STEP: Rolling back AU restart 
step 
safSmfStep=0001,safSmfProc=amfClusterProc-1,safSmfCampaign=Campaign_1,safApp=safSmfService
Apr 27 14:36:08 SLES-M-SLOT-1 osafsmfd[4199]: NO STEP: Online installation of 
old software
Apr 27 14:36:08 SLES-M-SLOT-1 osafsmfd[4199]: WA SU: 
safSu=dummy_Proxy_1,safSg=SG_dummy_Proxy,safApp=PxyApp failed after upgrade in 
campaign
Apr 27 14:36:08 SLES-M-SLOT-1 osafsmfd[4199]: NO STEP: Create old 
SaAmfNodeSwBundle objects
Apr 27 14:36:08 SLES-M-SLOT-1 osafimmnd[4115]: NO Ccb 56 COMMITTED (SMFSERVICE)
Apr 27 14:36:08 SLES-M-SLOT-1 osafsmfd[4199]: NO STEP: Reverse information 
model and set maintenance status for deactivation units
Apr 27 14:36:08 SLES-M-SLOT-1 osafimmnd[4115]: NO Ccb 57 COMMITTED (SMFSERVICE)

**Apr 27 14:36:08 SLES-M-SLOT-1 osafamfnd[4181]: ER AMFD has unexpectedly 
crashed. Rebooting node

Apr 27 14:36:08 SLES-M-SLOT-1 osafamfnd[4181]: Rebooting OpenSAF NodeId = 
131343 EE **
Name = , Reason: AMFD has unexpectedly crashed. Rebooting node, OwnNodeId = 
131343, SupervisionTime = 60


**Timestamp on PL-3:**

Apr  4 18:32:55 SLES-M-SLOT-3 osafamfnd[26567]: NO saAmfCompType changed to 
'safVersion=5.0.0,safCompType=Comp_PxyApp_Proxy1_1_1' for 
'safComp=Proxy1,safSu=dummy_Proxy_1,safSg=SG_dummy_Proxy,safApp=PxyApp'
Apr  4 18:32:55 SLES-M-SLOT-3 osafimmnd[26556]: NO Ccb 54 COMMITTED (SMFSERVICE)
Apr  4 18:32:55 SLES-M-SLOT-3 osafamfnd[26567]: NO Admin restart requested for 
'safComp=Proxy1,safSu=dummy_Proxy_1,safSg=SG_dummy_Proxy,safApp=PxyApp'
Apr  4 18:32:56 SLES-M-SLOT-3 osafamfnd[26567]: NO 
saAmfCtDefQuiescingCompleteTimeout for 
'safVersion=5.0.0,safCompType=Comp_PxyApp_Proxy1_1_1' initialized with 
saAmfCtDefCallbackTimeout
Apr  4 18:32:56 SLES-M-SLOT-3 osafamfnd[26567]: NO Instantiation of 
'safComp=Proxy1,safSu=dummy_Proxy_1,safSg=SG_dummy_Proxy,safApp=PxyApp' failed
Apr  4 18:32:56 SLES-M-SLOT-3 osafamfnd[26567]: NO Reason:'Exec of script 
success, but script exits with non-zero status'
Apr  4 18:32:56 SLES-M-SLOT-3 osafamfnd[26567]: NO Exit code: 1
Apr  4 18:32:59 SLES-M-SLOT-3 osafamfnd[26567]: NO Instantiation of 
'safComp=Proxy1,safSu=dummy_Proxy_1,safSg=SG_dummy_Proxy,safApp=PxyApp' failed
Apr  4 18:32:59 SLES-M-SLOT-3 osafamfnd[26567]: NO Reason:'Exec of script 
success, but script exits with non-zero status'
Apr  4 18:32:59 SLES-M-SLOT-3 osafamfnd[26567]: NO Exit code: 1
**Apr  4 18:33:02 SLES-M-SLOT-3 osafamfnd[26567]: NO Instantiation of 
'safComp=Proxy1,safSu=dummy_Proxy_1,safSg=SG_dummy_Proxy,safApp=PxyApp' failed
Apr  4 18:33:02 SLES-M-SLOT-3 osafamfnd[26567]: NO Reason:'Exec of script 
success, but script exits with non-zero status'**
Apr  4 18:33:02 SLES-M-SLOT-3 osafamfnd[26567]: NO Exit code: 1
Apr  4 18:33:05 SLES-M-SLOT-3 osafamfnd[26567]: WA 
'safComp=Proxy1,safSu=dummy_Proxy_1,safSg=SG_dummy_Proxy,safApp=PxyApp' 
Presence State RESTARTING => INSTANTIATION_FAILED
Apr  4 18:33:05 SLES-M-SLOT-3 osafamfnd[26567]: NO Com

[tickets] [opensaf:tickets] #1889 Immnd crashed on Payload during headless operation

2017-04-05 Thread Neelakanta Reddy
- **status**: needinfo --> wontfix
- **Comment**:

Re-open the defect with sufficient logs.



---

** [tickets:#1889] Immnd crashed on Payload during headless operation**

**Status:** wontfix
**Milestone:** future
**Created:** Tue Jun 21, 2016 09:50 AM UTC by Ritu Raj
**Last Updated:** Tue Nov 08, 2016 07:11 AM UTC
**Owner:** nobody
**Attachments:**

- 
[SC-1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1889/attachment/SC-1.tar.bz2)
 (7.6 MB; application/x-bzip)
- 
[SCALE_SLOT-75.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1889/attachment/SCALE_SLOT-75.tar.bz2)
 (4.8 MB; application/x-bzip)


setup:
Version - opensaf 5.0.GA
6-Node cluster(SC-1:Active, SC-2:Standby, SC-3:Spare PL:4,PL-5&PL-6: Payloads)

* Issue Observed:
Immnd crashed on Payload during headless operation

* Steps performed: 
(1). Invoke headless 
(2). Created logsv application stream after headless
(3). Closed the stream after performing write operation
(4). While reverting back to the default configuration, one of the CCB operations 
failed

>> SCALE_SLOT-75 osafimmnd[18906]: WA ERR_FAILED_OPERATION: ccb 1 is not in an 
>> expected state: 11 rejecting ccbObjectModify operation
>>
immcfg -a saLogStreamLogFullAction=3 
safLgStrCfg=saLogNotification,safApp=safLogService
error - saImmOmCcbObjectModify_2 FAILED: SA_AIS_ERR_FAILED_OPERATION (21)
OI reports: IMM: Resource abort: CCB is not in an expected state
error - saImmOmCcbApply FAILED: SA_AIS_ERR_FAILED_OPERATION (21)

(5). On invoking the second headless state, immnd crashed on both payloads

>>
Jun 21 14:27:53 SCALE_SLOT-75 osafimmnd[18906]: ImmModel.cc:648: 
immModel_abortNonCriticalCcbs: **Assertion 'immModel_ccbAbort(cb, (*i3)->mId, 
&arrSize, &implConnArr, &clientArr, &clientArrSize, &nodeId, &pbeNodeId)' 
failed**.
Jun 21 14:27:53 SCALE_SLOT-75 osafamfnd[18925]: NO 
'safSu=PL-5,safSg=NoRed,safApp=OpenSAF' component restart probation timer 
started (timeout: 600 ns)
Jun 21 14:27:53 SCALE_SLOT-75 osafamfnd[18925]: NO Restarting a component of 
'safSu=PL-5,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Jun 21 14:27:53 SCALE_SLOT-75 osafamfnd[18925]: NO 
'**safComp=IMMND,safSu=PL-5,safSg=NoRed,safApp=OpenSAF' faulted** due to 
'avaDown' : Recovery is 'componentRestart'
Jun 21 14:27:53 SCALE_SLOT-75 osafimmnd[19167]: Started


* Syslog and immnd trace files are attached


---



[tickets] [opensaf:tickets] #1868 Headless: IMM: Cluster reset happened due to 'avaDown' while killing immd

2017-04-05 Thread Neelakanta Reddy
- **status**: unassigned --> wontfix
- **Comment**:

The problem and the shared logs do not match; please share the correct logs.



---

** [tickets:#1868] Headless: IMM: Cluster reset happened due to 'avaDown' while 
killing immd**

**Status:** wontfix
**Milestone:** future
**Created:** Wed Jun 08, 2016 12:45 PM UTC by Chani Srivastava
**Last Updated:** Tue Nov 08, 2016 09:04 AM UTC
**Owner:** nobody
**Attachments:**

- 
[syslog_PL5](https://sourceforge.net/p/opensaf/tickets/1868/attachment/syslog_PL5)
 (153.5 kB; application/octet-stream)
- 
[syslog_SC1](https://sourceforge.net/p/opensaf/tickets/1868/attachment/syslog_SC1)
 (173.4 kB; application/octet-stream)
- 
[syslog_SC2](https://sourceforge.net/p/opensaf/tickets/1868/attachment/syslog_SC2)
 (147.6 kB; application/octet-stream)
- 
[syslog_SC3](https://sourceforge.net/p/opensaf/tickets/1868/attachment/syslog_SC3)
 (124.9 kB; application/octet-stream)


setup:
Version - opensaf 5.0.GA
6-Node cluster(SC-1:Active, SC-2:Standby, SC-3:Spare PL:4,PL-5&PL-6: Payloads)

Steps to reproduce:
1. Install and bring up OpenSAF on 6 nodes in the cluster, with Active, Standby, 
Spare and 3 payloads.
2. Take the cluster into the headless state by killing immd on the Active 
controller first, followed by the Standby and Spare controllers.
3. IMMD crashed due to avaDown and a cluster reset happened.

> Jun  8 15:35:53 SCALE_SLOT-81 osafimmnd[1806]: NO SERVER STATE: 
> IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
Jun  8 15:35:53 SCALE_SLOT-81 osafimmd[1756]: NO ACT: New Epoch for IMMND 
process at node 2060f old epoch: 0  new epoch:104
Jun  8 15:35:54 SCALE_SLOT-81 osafamfd[1852]: NO Received node_up from 2060f: 
msg_id 1
Jun  8 15:35:54 SCALE_SLOT-81 osafamfd[1852]: NO Node 'PL-6' joined the cluster
Jun  8 15:35:56 SCALE_SLOT-81 osafimmnd[1806]: NO Implementer connected: 748 
(MsgQueueService132623) <0, 2060f>
Jun  8 15:43:50 SCALE_SLOT-81 osafimmnd[1806]: NO ERR_BAD_OPERATION: parent 
object not owned by 'SetUp_Ccb'
Jun  8 15:43:50 SCALE_SLOT-81 osafimmnd[1806]: NO ERR_BAD_OPERATION: parent 
object not owned by 'SetUp_Ccb'
Jun  8 15:43:52 SCALE_SLOT-81 osafimmnd[1806]: NO Implementer connected: 749 
(RUNTIMEIMPL) <0, 2050f>
Jun  8 15:44:06 SCALE_SLOT-81 sshd[3213]: Accepted keyboard-interactive/pam for 
root from 192.2.8.94 port 37187 ssh2
Jun  8 15:44:07 SCALE_SLOT-81 root: killing osafimmd from run_headless.sh on 
spare controller
Jun  8 15:44:07 SCALE_SLOT-81 osafamfnd[1863]: NO 
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Jun  8 15:44:07 SCALE_SLOT-81 osafamfnd[1863]: **ER 
safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Jun  8 15:44:07 SCALE_SLOT-81 osafamfnd[1863]: Rebooting OpenSAF NodeId = 
131343 EE Name = , Reason: Component faulted: recovery is node failfast, 
OwnNodeId = 131343, SupervisionTime = 60**
Jun  8 15:44:07 SCALE_SLOT-81 osafimmnd[1806]: WA DISCARD DUPLICATE FEVS 
message:67683
Jun  8 15:44:07 SCALE_SLOT-81 osafimmnd[1806]: WA Error code 2 returned for 
message type 82 - ignoring
Jun  8 15:44:07 SCALE_SLOT-81 osafimmnd[1806]: WA DISCARD DUPLICATE FEVS 
message:67684
Jun  8 15:44:07 SCALE_SLOT-81 osafimmnd[1806]: WA Error code 2 returned for 
message type 82 - ignoring
Jun  8 15:44:07 SCALE_SLOT-81 opensaf_reboot: Rebooting local node; timeout=60

> 
Attaching syslogs for the controllers and the payload in action.
Traces are huge in size and will be shared separately.

Note: The machines' clocks are not in sync. The current logs are the ones after June 8.


---



[tickets] [opensaf:tickets] #2343 IMM: immnd failed to spawn while starting opensaf on controller

2017-04-05 Thread Neelakanta Reddy
- **status**: assigned --> wontfix
- **Comment**:

Check if there is any link loss in the cluster.
Re-open the ticket if the problem is not related to link loss, and provide the 
requested information.



---

** [tickets:#2343] IMM: immnd failed to spawn while starting opensaf on 
controller**

**Status:** wontfix
**Milestone:** future
**Created:** Fri Mar 03, 2017 11:46 AM UTC by Chani Srivastava
**Last Updated:** Mon Mar 13, 2017 10:06 AM UTC
**Owner:** Neelakanta Reddy


**Environment details**

OS : Suse 64bit
Changeset : 8634 ( 5.2.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with 1PBE enabled )

Summary

immnd failed to spawn a number of times while starting OpenSAF on the controller.
This issue is observed in various situations:

1. While resetting the cluster and starting OpenSAF again
2. While invoking continuous failovers
3. While stopping and starting OpenSAF on the standby controller



Mar  3 15:45:49 OSAF-SC1 opensafd: Starting OpenSAF Services(5.2.FC - ) (Using 
TIPC)
Mar  3 15:45:49 OSAF-SC1 kernel: [   43.828240] TIPC: Activated (version 2.0.0)
Mar  3 15:45:49 OSAF-SC1 kernel: [   43.828391] NET: Registered protocol family 
30
Mar  3 15:45:49 OSAF-SC1 kernel: [   43.828393] TIPC: Started in single node 
mode
Mar  3 15:45:49 OSAF-SC1 kernel: [   43.834836] TIPC: Started in network mode
Mar  3 15:45:49 OSAF-SC1 kernel: [   43.834839] TIPC: Own node address <1.1.1>, 
network identity 4141
Mar  3 15:45:49 OSAF-SC1 kernel: [   43.838982] TIPC: Enabled bearer 
, discovery domain <1.1.0>, priority 10
Mar  3 15:45:49 OSAF-SC1 kernel: [   43.840611] TIPC: Established link 
<1.1.1:eth1-1.1.2:eth1> on network plane A
Mar  3 15:45:49 OSAF-SC1 kernel: [   43.840688] TIPC: Established link 
<1.1.1:eth1-1.1.3:eth1> on network plane A
Mar  3 15:45:49 OSAF-SC1 osaftransportd[3854]: mkfifo already exists: 
/var/lib/opensaf/osaftransportd.fifo File exists
Mar  3 15:45:49 OSAF-SC1 osaftransportd[3854]: Started
Mar  3 15:45:49 OSAF-SC1 opensafd[3830]: NO Monitoring of TRANSPORT started
Mar  3 15:45:50 OSAF-SC1 osafclmna[3861]: mkfifo already exists: 
/var/lib/opensaf/osafclmna.fifo File exists
Mar  3 15:45:50 OSAF-SC1 osafclmna[3861]: Started
Mar  3 15:45:50 OSAF-SC1 opensafd[3830]: NO Monitoring of CLMNA started
Mar  3 15:45:50 OSAF-SC1 osafclmna[3861]: NO 
safNode=SC-1,safCluster=myClmCluster Joined cluster, nodeid=2010f
Mar  3 15:45:50 OSAF-SC1 osafrded[3870]: mkfifo already exists: 
/var/lib/opensaf/osafrded.fifo File exists
Mar  3 15:45:50 OSAF-SC1 osafrded[3870]: Started
Mar  3 15:45:50 OSAF-SC1 osaffmd[3879]: mkfifo already exists: 
/var/lib/opensaf/osaffmd.fifo File exists
Mar  3 15:45:50 OSAF-SC1 osaffmd[3879]: Started
Mar  3 15:45:50 OSAF-SC1 osaffmd[3879]: NO Remote fencing is disabled
Mar  3 15:45:50 OSAF-SC1 opensafd[3830]: NO Monitoring of HLFM started
Mar  3 15:45:50 OSAF-SC1 osafimmd[3889]: mkfifo already exists: 
/var/lib/opensaf/osafimmd.fifo File exists
Mar  3 15:45:50 OSAF-SC1 osafimmd[3889]: Started
Mar  3 15:45:50 OSAF-SC1 opensafd[3830]: NO Monitoring of IMMD started
Mar  3 15:45:50 OSAF-SC1 osafimmnd[3900]: mkfifo already exists: 
/var/lib/opensaf/osafimmnd.fifo File exists
Mar  3 15:45:50 OSAF-SC1 osafimmnd[3900]: Started
Mar  3 15:45:50 OSAF-SC1 osafimmnd[3900]: NO Persistent Back-End capability 
configured, Pbe file:imm.db (suffix may get added)
Mar  3 15:45:50 OSAF-SC1 osafimmnd[3900]: NO IMMD service is UP ... 
ScAbsenseAllowed?:0 introduced?:0
Mar  3 15:45:50 OSAF-SC1 osafimmnd[3900]: NO SERVER STATE: IMM_SERVER_ANONYMOUS 
--> IMM_SERVER_CLUSTER_WAITING
Mar  3 15:45:50 OSAF-SC1 osafimmnd[3900]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Mar  3 15:45:50 OSAF-SC1 osafimmnd[3900]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Mar  3 15:45:50 OSAF-SC1 osafimmnd[3900]: NO NODE STATE-> IMM_NODE_ISOLATED
Mar  3 15:45:51 OSAF-SC1 osafimmnd[3900]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
Mar  3 15:45:51 OSAF-SC1 osafimmnd[3900]: NO SERVER STATE: 
IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
Mar  3 15:51:01 OSAF-SC1 osafimmnd[3900]: WA Global ABORT SYNC received for 
epoch 508
Mar  3 15:51:01 OSAF-SC1 osafimmnd[3900]: WA SERVER STATE: 
IMM_SERVER_SYNC_CLIENT --> IMM_SERVER_LOADING_PENDING (sync aborted)
Mar  3 15:51:01 OSAF-SC1 osafimmnd[3900]: NO NODE STATE-> IMM_NODE_UNKNOW 2827
Mar  3 15:51:01 OSAF-SC1 osafimmnd[3900]: NO Abort sync: Discarding synced 
objects
Mar  3 15:51:04 OSAF-SC1 osafimmnd[3900]: NO Abort sync: Discarding synced 
classes
Mar  3 15:51:04 OSAF-SC1 osafimmnd[3900]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Mar  3 15:51:05 OSAF-SC1 osafimmnd[3900]: NO NODE STATE-> IMM_NODE_ISOLATED
Mar  3 15:51:06 OSAF-SC1 osafimmnd[3900]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
Mar  3 15:51:06 OSAF-SC1 osafimmnd[3900]: NO SERVER STATE: 
IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
Mar  3 15:51:41 OSAF-SC1 osafimmnd[3900]: NO Implementer connected: 1223 
(RUNTIMEIMPL) <0, 2030f>
Mar  3 15:53:50 OSA

[tickets] [opensaf:tickets] #2388 imm: active node rebooted due immd assertion failure

2017-04-05 Thread Neelakanta Reddy
- **status**: assigned --> invalid
- **Comment**:

4.7 is not supported from an OpenSAF perspective.
Closing this defect as invalid.



---

** [tickets:#2388] imm: active node rebooted due immd assertion failure**

**Status:** invalid
**Milestone:** 5.2.0
**Created:** Tue Mar 21, 2017 07:18 AM UTC by M Chandrasekhar
**Last Updated:** Fri Mar 24, 2017 11:38 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- 
[logs.tar](https://sourceforge.net/p/opensaf/tickets/2388/attachment/logs.tar) 
(38.0 MB; application/octet-stream)


###Environment details

OS : Suse 64bit
Setup : 4 nodes ( 2 controllers and 2 payloads with 1PBE enabled )
SC-1 and PL-3 installed with 4.7GA
SC-2 and PL-4 installed with 5.2RC1

###Summary
The active controller got rebooted because immd hit an assertion failure after a 
few immnd restarts.

Steps followed:
1. Bring up SC-1 and PL-3 with the 4.7 GA version
2. Bring up SC-2 and PL-4 with the 5.2 RC version
3. Do si-swap and make SC-2 active
4. Run a few regression tests with immnd restarts; the issue was noticed.


Mar 20 23:38:02 fos2 osafimmnd[27544]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 
2927
Mar 20 23:38:02 fos2 osafimmd[17384]: NO ACT: New Epoch for IMMND process at 
node 2010f old epoch: 29  new epoch:30
Mar 20 23:38:02 fos2 osafimmnd[27544]: NO RepositoryInitModeT is 
SA_IMM_KEEP_REPOSITORY
Mar 20 23:38:02 fos2 osafimmnd[27544]: WA IMM Access Control mode is DISABLED!
Mar 20 23:38:02 fos2 osafimmnd[27544]: NO Epoch set to 30 in ImmModel
Mar 20 23:38:02 fos2 test_immsv: IN Received PROC_STALE_CLIENTS
Mar 20 23:38:02 fos2 osafimmnd[27544]: NO SERVER STATE: IMM_SERVER_SYNC_CLIENT 
--> IMM_SERVER_READY
Mar 20 23:38:02 fos2 osafimmnd[27544]: NO ImmModel received scAbsenceAllowed 0
Mar 20 23:38:02 fos2 osafimmd[17384]: NO ACT: New Epoch for IMMND process at 
node 2030f old epoch: 29  new epoch:30
Mar 20 23:38:02 fos2 osafimmd[17384]: NO ACT: New Epoch for IMMND process at 
node 2040f old epoch: 29  new epoch:30
Mar 20 23:38:02 fos2 osafimmd[17384]: NO ACT: New Epoch for IMMND process at 
node 2020f old epoch: 0  new epoch:30
Mar 20 23:38:02 fos2 osafimmnd[27544]: NO Implementer connected: 944 
(safSmfService) <315, 2020f>
Mar 20 23:38:02 fos2 osafimmnd[27544]: NO Implementer connected: 945 
(safEvtService) <123, 2020f>
Mar 20 23:38:02 fos2 osafimmnd[27544]: NO Implementer connected: 946 
(safLogService) <127, 2020f>
Mar 20 23:38:02 fos2 osafimmnd[27544]: NO Implementer connected: 947 
(safCheckPointService) <134, 2020f>
Mar 20 23:38:02 fos2 osafimmnd[27544]: NO Implementer connected: 948 
(safClmService) <131, 2020f>
Mar 20 23:38:02 fos2 osafimmnd[27544]: NO Implementer connected: 949 
(safLckService) <135, 2020f>
Mar 20 23:38:02 fos2 osafimmnd[27544]: NO Implementer connected: 950 
(MsgQueueService131599) <12777, 2020f>
Mar 20 23:38:02 fos2 osafimmnd[27544]: NO Implementer connected: 951 
(safAmfService) <129, 2020f>
Mar 20 23:38:03 fos2 osafimmnd[27544]: NO Implementer (applier) connected: 952 
(@OpenSafImmReplicatorB) <13770, 2020f>
Mar 20 23:38:03 fos2 osafntfimcnd[27526]: NO Started
Mar 20 23:38:03 fos2 osafimmnd[27544]: NO PBE-OI established on other SC. 
Dumping incrementally to file imm.db
Mar 20 23:38:08 fos2 sudo:  tet : TTY=unknown ; PWD=/tmp/26815aa ; 
USER=root ; COMMAND=/bin/kill -9 27544
Mar 20 23:38:08 fos2 osafimmd[17384]: NO MDS event from svc_id 25 (change:4, 
dest:565217221926950)
Mar 20 23:38:08 fos2 osafamfnd[17445]: NO Restarting a component of 
'safSu=SC-2,safSg=NoRed,safApp=OpenSAF' (comp restart count: 10)
Mar 20 23:38:08 fos2 osafamfnd[17445]: NO 
'safComp=IMMND,safSu=SC-2,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' 
: Recovery is 'componentRestart'
Mar 20 23:38:08 fos2 osafntfimcnd[27526]: NO saImmOiDispatch() Fail 
SA_AIS_ERR_BAD_HANDLE (9)
Mar 20 23:38:08 fos2 osafimmnd[27586]: mkfifo already exists: 
/var/lib/opensaf/osafimmnd.fifo File exists
Mar 20 23:38:08 fos2 osafimmnd[27586]: Started
Mar 20 23:38:08 fos2 osafimmnd[27586]: NO Persistent Back-End capability 
configured, Pbe file:imm.db (suffix may get added)
Mar 20 23:38:08 fos2 osafimmd[17384]: NO MDS event from svc_id 25 (change:3, 
dest:565217221935144)
Mar 20 23:38:08 fos2 osafimmnd[27586]: NO IMMD service is UP ... 
ScAbsenseAllowed?:0 introduced?:0
Mar 20 23:38:08 fos2 osafimmnd[27586]: NO SERVER STATE: IMM_SERVER_ANONYMOUS 
--> IMM_SERVER_CLUSTER_WAITING
Mar 20 23:38:08 fos2 osafimmnd[27586]: NO Fevs count adjusted to 64649 
preLoadPid: 0
Mar 20 23:38:08 fos2 osafimmnd[27586]: src/imm/immnd/immnd_evt.c:9125: 
immnd_evt_proc_fevs_rcv: Assertion '!reply_dest || (reply_dest == 
cb->immnd_mdest_id) || isObjSync' failed.
Mar 20 23:38:08 fos2 osafimmd[17384]: NO MDS event from svc_id 25 (change:4, 
dest:565217221935144)
Mar 20 23:38:08 fos2 osafamfnd[17445]: NO Restarting a component of 
'safSu=SC-2,safSg=NoRed,safApp=OpenSAF' (comp restart count: 11)
Mar 20 23:38:08 fos2 osafamfnd[17445]: NO 
'safComp=IMMND,safSu=SC-2,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' 
: Recovery is 'componentRestart'
M

[tickets] [opensaf:tickets] #2401 imm: Check for response when using MDS SNDRSP

2017-04-05 Thread Neelakanta Reddy
- **Milestone**: 5.2.0 --> 5.0.2
- **Comment**:

Changing the milestone to 5.0.2 as the patch is published for all branches.



---

** [tickets:#2401] imm: Check for response when using MDS SNDRSP**

**Status:** review
**Milestone:** 5.0.2
**Created:** Wed Mar 29, 2017 09:02 AM UTC by Hung Nguyen
**Last Updated:** Mon Apr 03, 2017 10:01 AM UTC
**Owner:** Hung Nguyen


Sometimes ncsmds_api() returns NCSCC_RC_SUCCESS even when 
NCSMDS_INFO.info.svc_send.info.sndrsp.o_rsp is NULL.

The library may crash when that happens (see the backtrace below).
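
A rough sketch of the kind of defensive check the ticket title asks for, assuming 
the OpenSAF-internal MDS types named in the description; the header paths and the 
wrapper name are assumptions, not the actual patch:

~~~
#include "base/ncsgl_defs.h"  // assumed location of NCSCC_RC_SUCCESS/FAILURE
#include "mds/mds_papi.h"     // assumed location of NCSMDS_INFO / ncsmds_api()

// Even when ncsmds_api() reports NCSCC_RC_SUCCESS for a SNDRSP send, the
// response pointer can still be NULL, so treat that as a failure instead of
// letting callers dereference it (as in the saImmOmSearchNext_2 crash below).
static uint32_t sndrsp_checked(NCSMDS_INFO* info) {
  uint32_t rc = ncsmds_api(info);
  if (rc != NCSCC_RC_SUCCESS) return rc;
  if (info->info.svc_send.info.sndrsp.o_rsp == nullptr) {
    return NCSCC_RC_FAILURE;  // success reported, but no response message
  }
  return NCSCC_RC_SUCCESS;
}
~~~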

~~~
[New LWP 478]
[New LWP 480]
[New LWP 481]
[New LWP 482]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/local/lib/opensaf/osafamfd'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  strlen () at ../sysdeps/x86_64/strlen.S:106

Thread 1 (Thread 0x7f00cb1b5780 (LWP 478)):
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
No locals.
#1  0x7f00ca2e8ef1 in osaf_extended_name_lend (value=0x0, 
name=0x7ffc65188f50) at src/base/osaf_extended_name.c:82
length = 
#2  0x7f00c909a166 in saImmOmSearchNext_2 
(searchHandle=searchHandle@entry=1490679334504883525, 
objectName=objectName@entry=0x7ffc65188f50, 
attributes=attributes@entry=0x7ffc65188ea0) at src/imm/agent/imma_om_api.cc:7580
objName = 0x0
rc = 
#3  0x7f00cab8a7dc in immutil_saImmOmSearchNext_2 
(searchHandle=1490679334504883525, objectName=0x7ffc65188f50, 
attributes=0x7ffc65188ea0) at src/osaf/immutil/immutil.c:1817
rc = 
nTries = 
#4  0x5619eccab268 in avd_su_config_get 
(sg_name="safSg=AmfDemo,safApp=AmfDemo2", sg=sg@entry=0x5619ed8e5b40) at 
src/amf/amfd/su.cc:704
searchHandle = 1490679334504883525
su_name = "safSu=SU1,safSg=AmfDemo,safApp=AmfDemo2"
className = 0x5619eccc1a33 "SaAmfSU"
su = 
configAttributes = {0x5619ecccebde "saAmfSUType", 0x5619eccced2c 
"saAmfSURank", 0x5619eccc1913 "saAmfSUHostedByNode", 0x5619ecccebfd 
"saAmfSUHostNodeOrNodeGroup", 0x5619ecccec29 "saAmfSUFailover", 0x5619eccced11 
"saAmfSUMaintenanceCampaign", 0x5619eccbb477 "saAmfSUAdminState", 0x0}
t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
searchParam = {searchOneAttr = {attrName = 0x5619eccb998c 
"SaImmAttrClassName", attrValueType = SA_IMM_ATTR_SASTRINGT, attrValue = 
0x7ffc65188ea8}}
__FUNCTION__ = "avd_su_config_get"
error = SA_AIS_OK
rc = 
tmp_su_name = {_opaque = {0 }}
attributes = 0x5619ed8e5c70
#5  0x5619ecc61711 in avd_sg_config_get (app_dn="safApp=AmfDemo2", 
app=app@entry=0x5619ed8abc40) at src/amf/amfd/sg.cc:470
searchHandle = 1490679334503167364
dn = {_opaque = {29, 24947, 21350, 15719, 27969, 17510, 28005, 11375, 
24947, 16742, 28784, 16701, 26221, 25924, 28525, 50, 0 }}
className = 0x5619eccc1a23 "SaAmfSG"
configAttributes = {0x5619eccc84e6 "saAmfSGType", 0x5619eccc8516 
"saAmfSGSuHostNodeGroup", 0x5619eccc84f2 "saAmfSGAutoRepair", 0x5619eccc8504 
"saAmfSGAutoAdjust", 0x5619eccc857c "saAmfSGNumPrefActiveSUs", 0x5619eccc8594 
"saAmfSGNumPrefStandbySUs", 0x5619eccc85ad "saAmfSGNumPrefInserviceSUs", 
0x5619eccc85c8 "saAmfSGNumPrefAssignedSUs", 0x5619eccc85e2 
"saAmfSGMaxActiveSIsperSU", 0x5619eccc85fb "saAmfSGMaxStandbySIsperSU", 
0x5619eccc8615 "saAmfSGAutoAdjustProb", 0x5619eccc862b 
"saAmfSGCompRestartProb", 0x5619eccc8642 "saAmfSGCompRestartMax", 
0x5619eccc8658 "saAmfSGSuRestartProb", 0x5619eccc866d "saAmfSGSuRestartMax", 
0x5619eccc8313 "saAmfSGAdminState", 0x5619eccc833e "osafAmfSGFsmState", 0x0}
t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
sg = 0x5619ed8e5b40
searchParam = {searchOneAttr = {attrName = 0x5619eccb998c 
"SaImmAttrClassName", attrValueType = SA_IMM_ATTR_SASTRINGT, attrValue = 
0x7ffc65189108}}
__FUNCTION__ = "avd_sg_config_get"
error = SA_AIS_OK
rc = 
attributes = 0x5619ed8e4370
#6  0x5619ecbf8981 in avd_app_config_get () at src/amf/amfd/app.cc:460
searchHandle = 1490679334315192083
dn = {_opaque = {15, 24947, 16742, 28784, 16701, 26221, 25924, 28525, 
50, 0 }}
className = 0x5619eccb9938 "SaAmfApplication"
configAttributes = {0x5619eccb987f "saAmfAppType", 0x5619eccb98cd 
"saAmfApplicationAdminState", 0x0}
t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
searchParam = {searchOneAttr = {attrName = 0x5619eccb998c 
"SaImmAttrClassName", attrValueType = SA_IMM_ATTR_SASTRINGT, attrValue = 
0x7ffc651893b8}}
app = 0x5619ed8abc40
__FUNCTION__ = "avd_app_config_get"
error = SA_AIS_ERR_FAILED_OPERATION
rc = 
attributes = 0x5619ed89cab0
#7  0x5619ecc332d5 in avd_imm_config_get () at src/amf/amfd/imm.cc:1631
rc = 2
t_ = {trace_leave_called = false, file_ = 0x0, function_ = 0x0}
__FUNCTION__ = "avd_imm_config_get"
#8  0x

[tickets] [opensaf:tickets] #2412 log: coredump while finalize client in parallel

2017-04-05 Thread elunlen
- **Milestone**: future --> 5.2.0



---

** [tickets:#2412] log: coredump while finalize client in parallel**

**Status:** accepted
**Milestone:** 5.2.0
**Created:** Tue Apr 04, 2017 12:08 PM UTC by Canh Truong
**Last Updated:** Tue Apr 04, 2017 01:45 PM UTC
**Owner:** Canh Truong


Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x7f5bb0159d68 in lga_hdl_rec_del 
(list_head=list_head@entry=0x7f5bb0360288 , 
rm_node=rm_node@entry=0x7f5b740029e0) at src/log/agent/lga_util.c:651

When 2 threads finalize the same client in parallel, the log agent removes the 
client from the list. One thread may access a node in the list that has already 
been deleted by the other thread, which causes the coredump.


---
