[tickets] [opensaf:tickets] #1831 fm: Activation supervision is not started properly

2016-05-16 Thread Anders Widell
- **status**: accepted --> review



---

** [tickets:#1831] fm: Activation supervision is not started properly**

**Status:** review
**Milestone:** 5.0.1
**Created:** Mon May 16, 2016 01:49 PM UTC by Anders Widell
**Last Updated:** Mon May 16, 2016 01:49 PM UTC
**Owner:** Anders Widell


Two bugs have been discovered in the handling of the activation supervision 
timer:

1) There is a cut-and-paste bug where the conditions check the status of the 
promote active timer instead of the activation supervision timer.
2) The timer is only started in the RDA callback, but not initially at startup 
after reading the RDA role.
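
The two bugs can be illustrated with a small Python model. This is only a sketch, not the fm source: the class, the method names, and the assumption that supervision should start whenever the RDA role is ACTIVE are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Timer:
    """Hypothetical stand-in for an fm timer; it only tracks a running flag."""
    running: bool = False

    def start(self) -> None:
        self.running = True


@dataclass
class FmSketch:
    role: str = "UNDEFINED"
    promote_active: Timer = field(default_factory=Timer)
    activation_supervision: Timer = field(default_factory=Timer)

    def start_supervision_buggy(self) -> None:
        # Bug 1: the guard tests the promote active timer (cut-and-paste).
        if not self.promote_active.running:
            self.activation_supervision.start()

    def start_supervision_fixed(self) -> None:
        # Fix 1: the guard tests the timer that is about to be started.
        if not self.activation_supervision.running:
            self.activation_supervision.start()

    def on_role(self, role: str) -> None:
        # Assumption: supervision is wanted whenever the RDA role is ACTIVE.
        self.role = role
        if role == "ACTIVE":
            self.start_supervision_fixed()

    def rda_callback(self, role: str) -> None:
        self.on_role(role)

    def startup_buggy(self, initial_role: str) -> None:
        # Bug 2: the role read at startup never reaches on_role().
        self.role = initial_role

    def startup_fixed(self, initial_role: str) -> None:
        # Fix 2: handle the initial role like a role reported by the callback.
        self.on_role(initial_role)
```

With the buggy guard, a running promote active timer prevents the supervision timer from ever starting; with the buggy startup path, a node that is already ACTIVE when fm starts is never supervised.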


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
Mobile security can be enabling, not merely restricting. Employees who
bring their own devices (BYOD) to work are irked by the imposition of MDM
restrictions. Mobile Device Manager Plus allows you to control only the
apps on BYO-devices by containerizing them, leaving personal data untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #1830 ncs_sel_obj_ind: write failed message observed in syslog during OpenSAF start

2016-05-16 Thread Ritu Raj



---

** [tickets:#1830] ncs_sel_obj_ind: write failed message observed in syslog 
during OpenSAF start**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Mon May 16, 2016 12:36 PM UTC by Ritu Raj
**Last Updated:** Mon May 16, 2016 12:36 PM UTC
**Owner:** nobody
**Attachments:**

- [messages](https://sourceforge.net/p/opensaf/tickets/1830/attachment/messages) (1.0 MB; application/octet-stream)
- [osafckptd](https://sourceforge.net/p/opensaf/tickets/1830/attachment/osafckptd) (133.0 kB; application/octet-stream)
- [osafckptnd](https://sourceforge.net/p/opensaf/tickets/1830/attachment/osafckptnd) (31.8 kB; application/octet-stream)


Setup:
Changeset: 7640
Version: OpenSAF 5.0
4-node cluster

Issue observed:
Started OpenSAF on the controller (SC-1).
A 'ncs_sel_obj_ind: write failed' message is observed in syslog when starting OpenSAF:

May 16 18:11:26 SLES-64BIT-SLOT1 osafckptnd[19133]: Started
May 16 18:11:26 SLES-64BIT-SLOT1 osafckptnd[19133]: **ncs_sel_obj_ind: write failed** - Bad file descriptor

* CKPT traces and syslog attached
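
"Bad file descriptor" is the standard text for errno EBADF, which a write returns when the descriptor is not open. The Python sketch below reproduces that error class on a plain pipe. It is not a diagnosis of osafckptnd: the `indicate` helper is hypothetical, and the idea that the selection object is indicated by writing a byte to a descriptor is an assumption.

```python
import errno
import os


def indicate(fd: int) -> int:
    """Write one byte to fd; return 0 on success or the errno on failure."""
    try:
        os.write(fd, b"A")
    except OSError as exc:
        return exc.errno
    return 0


read_fd, write_fd = os.pipe()
first = indicate(write_fd)    # descriptor is open, so the write succeeds
os.close(write_fd)
second = indicate(write_fd)   # descriptor was closed before the write
os.close(read_fd)

if second == errno.EBADF:
    print("write failed -", os.strerror(second))
```

If the real cause is similar, the traces would be expected to show the selection object being destroyed or left uninitialized before the indication is raised.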





[tickets] [opensaf:tickets] #1824 CLM: Opensaf restart failed due to saClmInitialize_4 returned 31

2016-05-16 Thread Chani Srivastava
I am also observing the issue with the latest changeset (#7640). Traces are 
attached; please refer to them.


---

** [tickets:#1824] CLM: Opensaf restart failed due to saClmInitialize_4 
returned 31**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Fri May 13, 2016 07:33 AM UTC by Chani Srivastava
**Last Updated:** Fri May 13, 2016 08:23 AM UTC
**Owner:** nobody
**Attachments:**

- [logs.tar](https://sourceforge.net/p/opensaf/tickets/1824/attachment/logs.tar) (34.8 MB; application/octet-stream)


Setup:
Changeset: 7613
OS: SUSE 11SP2 x86_64

Steps to reproduce:
1. Install the RPMs and bring up OpenSAF on all nodes (4 nodes)
2. Run /etc/init.d/opensafd restart on the standby controller

* Issue reproducible most of the time
* Issue also observed with 5.0 FC
* Syslog and CLM traces for the active and standby controllers attached
* Issue not observed when doing an OpenSAF stop followed by a start (only in 
case of restart)


The standby failed to rejoin the cluster, with the following log errors:

May 13 12:34:50 SLOT-2 osafimmnd[31358]: NO IMMD service is UP ... ScAbsenseAllowed?:0 introduced?:0
May 13 12:34:50 SLOT-2 osafclmna[31319]: NO safNode=SC-2,safCluster=myClmCluster Joined cluster, nodeid=2020f
May 13 12:34:50 SLOT-2 osafrded[31328]: NO Got peer info request from node 0x2010f with role ACTIVE
May 13 12:34:50 SLOT-2 osafrded[31328]: NO Got peer info response from node 0x2010f with role ACTIVE
May 13 12:34:50 SLOT-2 osafrded[31328]: NO RDE role set to QUIESCED
May 13 12:34:50 SLOT-2 osafrded[31328]: NO Giving up election against 0x2010f with role ACTIVE. My role is now QUIESCED
May 13 12:34:50 SLOT-2 osafimmnd[31358]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
May 13 12:34:50 SLOT-2 osafimmnd[31358]: NO Fevs count adjusted to 1936 preLoadPid: 0
May 13 12:34:50 SLOT-2 osafimmnd[31358]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
May 13 12:34:51 SLOT-2 osafimmnd[31358]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
May 13 12:34:51 SLOT-2 osafimmnd[31358]: NO NODE STATE-> IMM_NODE_ISOLATED
May 13 12:34:51 SLOT-2 osafimmnd[31358]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
May 13 12:34:51 SLOT-2 osafimmnd[31358]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
May 13 12:34:51 SLOT-2 osafimmnd[31358]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 2866
May 13 12:34:51 SLOT-2 osafimmnd[31358]: NO RepositoryInitModeT is SA_IMM_KEEP_REPOSITORY
May 13 12:34:51 SLOT-2 osafimmnd[31358]: WA IMM Access Control mode is DISABLED!
May 13 12:34:51 SLOT-2 osafimmnd[31358]: NO Epoch set to 14 in ImmModel
May 13 12:34:51 SLOT-2 osafimmnd[31358]: NO SERVER STATE: IMM_SERVER_SYNC_CLIENT --> IMM_SERVER_READY
May 13 12:34:51 SLOT-2 osafimmnd[31358]: NO ImmModel received scAbsenceAllowed 0
May 13 12:34:51 SLOT-2 osaflogd[31368]: Started
May 13 12:34:51 SLOT-2 osafntfd[31378]: Started
May 13 12:34:51 SLOT-2 osafclmd[31388]: Started
May 13 12:34:51 SLOT-2 osafamfd[31398]: Started
May 13 12:34:51 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:51 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:52 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:52 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:52 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:52 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:52 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:52 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:52 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:52 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:52 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:53 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
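
When triaging logs like the above, it can help to collapse the repeated warnings into counts and to decode the numeric return code. The Python helper below is a sketch only: the function name and regular expression are hypothetical. Code 5 is named SA_AIS_ERR_TIMEOUT in a log line quoted in #1829; reading 31 as SA_AIS_ERR_UNAVAILABLE is based on the SA Forum AIS error enumeration and should be confirmed against saAis.h.

```python
import re
from collections import Counter

# Assumption: 31 maps to SA_AIS_ERR_UNAVAILABLE; verify against saAis.h.
AIS_ERRORS = {5: "SA_AIS_ERR_TIMEOUT", 31: "SA_AIS_ERR_UNAVAILABLE"}

# Matches lines such as "osafamfd[31398]: WA saClmInitialize_4 returned 31".
PATTERN = re.compile(
    r"(?P<proc>\w+)\[\d+\]: WA (?P<call>sa\w+) returned (?P<code>\d+)"
)


def summarize(lines):
    """Count (process, API call, return code) triples found in syslog lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            key = (match["proc"], match["call"], int(match["code"]))
            counts[key] += 1
    return counts


sample = [
    "May 13 12:34:51 SLOT-2 osafamfd[31398]: Started",
    "May 13 12:34:51 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31",
    "May 13 12:34:52 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31",
]

for (proc, call, code), count in summarize(sample).items():
    name = AIS_ERRORS.get(code, "unknown")
    print(f"{proc}: {call} returned {code} ({name}) x{count}")
```

The helper expects each log entry on a single line, so entries that were wrapped by the mail client need to be rejoined first.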





[tickets] [opensaf:tickets] #1829 Cluster reset happened after performing shutdown operation on clm unlocked node

2016-05-16 Thread Ritu Raj
AMF traces and MDS logs of both controllers are attached.


Attachments:

- [amf_SC-1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/de5ca2fa/ffa4/attachment/amf_SC-1.tar.bz2) (492.1 kB; application/x-bzip)
- [amf_SC-2.tar.bz2](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/de5ca2fa/ffa4/attachment/amf_SC-2.tar.bz2) (1.1 MB; application/x-bzip)


---

** [tickets:#1829] Cluster reset happened after performing shutdown operation 
on clm unlocked node**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Mon May 16, 2016 10:16 AM UTC by Ritu Raj
**Last Updated:** Mon May 16, 2016 10:16 AM UTC
**Owner:** nobody
**Attachments:**

- [SLES-32BIT-SLOT2.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1829/attachment/SLES-32BIT-SLOT2.tar.bz2) (4.1 MB; application/x-bzip)
- [SLES-64BIT-SLOT1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1829/attachment/SLES-64BIT-SLOT1.tar.bz2) (2.5 MB; application/x-bzip)


Setup:
Changeset: 7613
Version: OpenSAF 5.0 FC

**Issue Observed**:
A cluster reset happened after performing a shutdown operation on a CLM-unlocked node.

**Steps Performed**:
1. The cluster has a 4-node setup, where SC-1 is active, SC-2 is standby, and 
PL-3 and PL-4 are payloads.

2. Spawned a CLM tracking agent on one of the nodes (PL-4) and performed a 
shutdown operation on PL-3.
After a 30-second delay in the track callback by the CLM agent, the shutdown 
operation succeeded and the node moved to the locked state.

The active controller then rebooted:

SLES-64BIT-SLOT1:~ # immadm -o 3 safNode=PL-3,safCluster=myClmCluster
May 16 15:04:32 SLES-64BIT-SLOT1 osafclmd[6345]: NO safNode=PL-3,safCluster=myClmCluster SHUTDOWN, view number=6
SLES-64BIT-SLOT1:~ # May 16 15:04:48 SLES-64BIT-SLOT1 osaffmd[6283]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Activation timer supervision expired: no ACTIVE assignment received within the time limit, OwnNodeId = 131343, SupervisionTime = 60

3. The standby controller also rebooted due to an AMF director heartbeat 
timeout, causing both payloads to reboot.

>>May 16 15:07:03 SLES-32BIT-SLOT2 osafamfnd[5805]: WA saClmInitialize_4 returned 5
May 16 15:07:44 SLES-32BIT-SLOT2 osafntfimcnd[5905]: WA ntfimcn_ntf_init saNtfInitialize( returned SA_AIS_ERR_TIMEOUT (5)
May 16 15:07:52 SLES-32BIT-SLOT2 osafamfnd[5805]: ER AMF director heart beat timeout, generating core for amfd
May 16 15:07:53 SLES-32BIT-SLOT2 osafamfnd[5805]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: AMF director heart beat timeout, OwnNodeId = 131599, SupervisionTime = 60
May 16 15:07:53 SLES-32BIT-SLOT2 opensaf_reboot: Rebooting local node; timeout=60


* Syslog and CLM traces of both controllers are attached.
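
The OwnNodeId values in the reboot messages are decimal, while other OpenSAF log lines quoted on this list (for example the osafrded peer info messages in #1824) print node ids in hex. The Python helpers below are a hypothetical convenience for correlating the two forms and for measuring the gap between two syslog timestamps; the year 2016 is taken from the ticket dates.

```python
from datetime import datetime


def node_id_hex(own_node_id: int) -> str:
    """Render a decimal OwnNodeId in 0x... form."""
    return f"0x{own_node_id:x}"


def seconds_between(first: str, second: str, year: int = 2016) -> int:
    """Elapsed seconds between two syslog stamps such as 'May 16 15:04:32'."""
    fmt = "%Y %b %d %H:%M:%S"
    start = datetime.strptime(f"{year} {first}", fmt)
    end = datetime.strptime(f"{year} {second}", fmt)
    return int((end - start).total_seconds())


# Values copied from the log excerpts in this ticket.
print(node_id_hex(131343), node_id_hex(131599))
print(seconds_between("May 16 15:04:32", "May 16 15:04:48"))
```

For the excerpts above this gives 0x2010f and 0x2020f for the two controllers, and 16 seconds between the SHUTDOWN message and the fm reboot message on the active controller.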




[tickets] [opensaf:tickets] Re: #1828 AMF: Both director and node director hang if immnd dies in new SC reallocation scenario

2016-05-16 Thread Minh Hon Chau
It's a bit different from #517, which resolves a deadlock between IMM and AMFND.
In this scenario, amfnd has not had a chance to restart immnd, since amfnd is 
still hanging in the main thread due to CLM reinitialization. After amfnd moves 
CLM initialization to another thread, amfnd might still need the solution from 
#517 if a deadlock with IMM happens.
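
The idea of moving the blocking initialization off the main thread can be sketched in Python as follows. This is a toy model rather than amfnd code: `blocking_clm_init`, `main_loop`, and the event names are hypothetical, and the blocking call is simulated with a `threading.Event` that is set once immnd has been restarted.

```python
import queue
import threading


def blocking_clm_init(immnd_up: threading.Event) -> str:
    """Hypothetical stand-in for an init call that blocks until IMM is back."""
    immnd_up.wait()
    return "clm-handle"


def main_loop(events):
    """Serve events on the main thread while initialization runs in a worker."""
    immnd_up = threading.Event()
    result = queue.Queue()
    worker = threading.Thread(
        target=lambda: result.put(blocking_clm_init(immnd_up)),
        daemon=True,
    )
    worker.start()

    handled = []
    for event in events:
        handled.append(event)        # the main thread is never blocked
        if event == "restart-immnd":
            immnd_up.set()           # restarting immnd unblocks the worker
    # Raises queue.Empty after 5 s if immnd was never restarted.
    return handled, result.get(timeout=5)


handled, handle = main_loop(["heartbeat", "restart-immnd", "heartbeat"])
print(handled, handle)
```

Had the init call been made directly in the loop, the "restart-immnd" event would never have been reached, which is the hang described in this ticket.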


---

** [tickets:#1828] AMF: Both director and node director hang if immnd dies in 
new SC reallocation scenario**

**Status:** assigned
**Milestone:** 5.1.FC
**Created:** Mon May 16, 2016 05:29 AM UTC by Minh Hon Chau
**Last Updated:** Mon May 16, 2016 06:29 AM UTC
**Owner:** Minh Hon Chau


Enable the cloud & roaming feature.
If both the Active and Standby SCs are stopped at the same time, new controllers 
will be allocated as Active/Standby. During this new Active role allocation, 
if immnd dies, there will be circular dependencies in the controller that is 
going to become Active:
- clmd cannot use IMM services since immnd has died
- immnd needs to be restarted by amfnd
- amfnd is hanging since it is calling CLM services
- amfd is also hanging since it is calling CLM and NTF services
- ntfd is hanging due to logd's dependency on IMM

The problem can be solved if amfd/amfnd are not blocked in the main thread, so 
that immnd can be restarted and the controller will not be rebooted due to a 
heartbeat timeout.
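
The dependency list above can be written as a small "waits on" graph and checked for a cycle. The Python sketch below encodes one reading of the bullets; in particular, the ntfd to logd to immnd edges are an interpretation of the last bullet, and the function is a generic depth-first search rather than anything taken from OpenSAF.

```python
# Each service maps to the services it is waiting on (one reading of the list).
WAITS_ON = {
    "clmd": ["immnd"],         # clmd cannot use IMM services
    "immnd": ["amfnd"],        # immnd needs to be restarted by amfnd
    "amfnd": ["clmd"],         # amfnd is blocked in a CLM call
    "amfd": ["clmd", "ntfd"],  # amfd is blocked in CLM and NTF calls
    "ntfd": ["logd"],          # ntfd waits on logd
    "logd": ["immnd"],         # logd depends on IMM
}


def find_cycle(graph, start):
    """Return the first dependency cycle reachable from start, or None."""
    path, on_path, done = [], set(), set()

    def visit(node):
        if node in on_path:
            return path[path.index(node):] + [node]
        if node in done:
            return None
        on_path.add(node)
        path.append(node)
        for dep in graph.get(node, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(node)
        done.add(node)
        return None

    return visit(start)


print(find_cycle(WAITS_ON, "amfd"))
```

Removing the amfnd to clmd edge, which corresponds to not blocking amfnd's main thread on CLM, leaves the graph without a cycle.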




[tickets] [opensaf:tickets] #1828 AMF: Both director and node director hang if immnd dies in new SC reallocation scenario

2016-05-16 Thread Nagendra Kumar
>>- immnd needs restarted by amfnd
It is the same as #517.


---

** [tickets:#1828] AMF: Both director and node director hang if immnd dies in 
new SC reallocation scenario**

**Status:** assigned
**Milestone:** 5.1.FC
**Created:** Mon May 16, 2016 05:29 AM UTC by Minh Hon Chau
**Last Updated:** Mon May 16, 2016 05:30 AM UTC
**Owner:** Minh Hon Chau


Enable the cloud & roaming feature.
If both the Active and Standby SCs are stopped at the same time, new controllers 
will be allocated as Active/Standby. During this new Active role allocation, 
if immnd dies, there will be circular dependencies in the controller that is 
going to become Active:
- clmd cannot use IMM services since immnd has died
- immnd needs to be restarted by amfnd
- amfnd is hanging since it is calling CLM services
- amfd is also hanging since it is calling CLM and NTF services
- ntfd is hanging due to logd's dependency on IMM

The problem can be solved if amfd/amfnd are not blocked in the main thread, so 
that immnd can be restarted and the controller will not be rebooted due to a 
heartbeat timeout.

