[tickets] [opensaf:tickets] #1831 fm: Activation supervision is not started properly
- **status**: accepted --> review

---

**[tickets:#1831] fm: Activation supervision is not started properly**

**Status:** review
**Milestone:** 5.0.1
**Created:** Mon May 16, 2016 01:49 PM UTC by Anders Widell
**Last Updated:** Mon May 16, 2016 01:49 PM UTC
**Owner:** Anders Widell

Two bugs have been discovered in the handling of the activation supervision timer:

1) There is a cut-and-paste bug where the conditions check the status of the promote active timer instead of the activation supervision timer.
2) The timer is only started in the RDA callback, but not initially at startup after reading the RDA role.

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

--
Mobile security can be enabling, not merely restricting. Employees who bring their own devices (BYOD) to work are irked by the imposition of MDM restrictions. Mobile Device Manager Plus allows you to control only the apps on BYO-devices by containerizing them, leaving personal data untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
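The two fixes described in ticket #1831 can be sketched roughly as follows. This is a simplified, hypothetical model and not the actual fm source: the type and function names (`fm_state`, `fm_on_startup`, `fm_on_rda_role_change`, `start_activation_supervision`) are invented for illustration, and it assumes the supervision timer should run whenever the RDA role is ACTIVE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the fm state; not the OpenSAF source. */
typedef enum { ROLE_UNDEFINED, ROLE_ACTIVE, ROLE_STANDBY, ROLE_QUIESCED } rda_role;

struct fm_timer {
    bool running;
};

struct fm_state {
    rda_role role;
    struct fm_timer promote_active_tmr;
    struct fm_timer activation_supervision_tmr;
};

/* Start the activation supervision timer if the role is ACTIVE (assumed
 * condition) and the timer is not already running. The check looks at the
 * activation supervision timer itself; bug 1 was that the equivalent check
 * looked at the promote active timer instead. */
void start_activation_supervision(struct fm_state *fm)
{
    assert(fm != NULL);
    if (fm->role == ROLE_ACTIVE && !fm->activation_supervision_tmr.running) {
        fm->activation_supervision_tmr.running = true;
    }
}

/* RDA callback path: the only place the timer was started before the fix. */
void fm_on_rda_role_change(struct fm_state *fm, rda_role new_role)
{
    assert(fm != NULL);
    fm->role = new_role;
    start_activation_supervision(fm);
}

/* Startup path: bug 2 was that the timer was not started here after the
 * initial RDA role had been read. */
void fm_on_startup(struct fm_state *fm, rda_role initial_role)
{
    assert(fm != NULL);
    fm->role = initial_role;
    start_activation_supervision(fm);
}
```

Routing both paths through one helper keeps the start condition in a single place, which makes a copy-and-paste mismatch between timers less likely.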
[tickets] [opensaf:tickets] #1830 ncs_sel_obj_ind: write failed message observed in syslog during OpenSAF start
---

**[tickets:#1830] ncs_sel_obj_ind: write failed message observed in syslog during OpenSAF start**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Mon May 16, 2016 12:36 PM UTC by Ritu Raj
**Last Updated:** Mon May 16, 2016 12:36 PM UTC
**Owner:** nobody
**Attachments:**

- [messages](https://sourceforge.net/p/opensaf/tickets/1830/attachment/messages) (1.0 MB; application/octet-stream)
- [osafckptd](https://sourceforge.net/p/opensaf/tickets/1830/attachment/osafckptd) (133.0 kB; application/octet-stream)
- [osafckptnd](https://sourceforge.net/p/opensaf/tickets/1830/attachment/osafckptnd) (31.8 kB; application/octet-stream)

Setup:
Changeset- 7640
Version - opensaf 5.0
4 nodes cluster

Issue observed:
Started OpenSAF on Controller (SC-1).
'ncs_sel_obj_ind: write failed' message observed in syslog on starting OpenSAF:

May 16 18:11:26 SLES-64BIT-SLOT1 osafckptnd[19133]: Started
May 16 18:11:26 SLES-64BIT-SLOT1 osafckptnd[19133]: **ncs_sel_obj_ind: write failed** - Bad file descriptor

* CKPT traces and syslog attached
[tickets] [opensaf:tickets] #1824 CLM: Opensaf restart failed due to saClmInitialize_4 returned 31
I am observing the issue with the latest CS #7640 also. Traces are attached; kindly refer to them.

---

**[tickets:#1824] CLM: Opensaf restart failed due to saClmInitialize_4 returned 31**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Fri May 13, 2016 07:33 AM UTC by Chani Srivastava
**Last Updated:** Fri May 13, 2016 08:23 AM UTC
**Owner:** nobody
**Attachments:**

- [logs.tar](https://sourceforge.net/p/opensaf/tickets/1824/attachment/logs.tar) (34.8 MB; application/octet-stream)

Setup:
Changeset- 7613
OS: SUSE 11SP2 x86_64

Steps to reproduce:

1. Install rpms and bring up opensaf on all nodes (4 nodes)
2. /etc/init.d/opensafd restart on standby controller

* Issue reproducible most of the time
* Issue observed with 5.0 FC also
* syslog and clm traces for active and standby attached
* Issue not observed while doing opensaf stop and start (only in case of restart)

Standby failed to rejoin the cluster, with the following log errors:

May 13 12:34:50 SLOT-2 osafimmnd[31358]: NO IMMD service is UP ... ScAbsenseAllowed?:0 introduced?:0
May 13 12:34:50 SLOT-2 osafclmna[31319]: NO safNode=SC-2,safCluster=myClmCluster Joined cluster, nodeid=2020f
May 13 12:34:50 SLOT-2 osafrded[31328]: NO Got peer info request from node 0x2010f with role ACTIVE
May 13 12:34:50 SLOT-2 osafrded[31328]: NO Got peer info response from node 0x2010f with role ACTIVE
May 13 12:34:50 SLOT-2 osafrded[31328]: NO RDE role set to QUIESCED
May 13 12:34:50 SLOT-2 osafrded[31328]: NO Giving up election against 0x2010f with role ACTIVE. My role is now QUIESCED
May 13 12:34:50 SLOT-2 osafimmnd[31358]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
May 13 12:34:50 SLOT-2 osafimmnd[31358]: NO Fevs count adjusted to 1936 preLoadPid: 0
May 13 12:34:50 SLOT-2 osafimmnd[31358]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
May 13 12:34:51 SLOT-2 osafimmnd[31358]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
May 13 12:34:51 SLOT-2 osafimmnd[31358]: NO NODE STATE-> IMM_NODE_ISOLATED
May 13 12:34:51 SLOT-2 osafimmnd[31358]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
May 13 12:34:51 SLOT-2 osafimmnd[31358]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
May 13 12:34:51 SLOT-2 osafimmnd[31358]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 2866
May 13 12:34:51 SLOT-2 osafimmnd[31358]: NO RepositoryInitModeT is SA_IMM_KEEP_REPOSITORY
May 13 12:34:51 SLOT-2 osafimmnd[31358]: WA IMM Access Control mode is DISABLED!
May 13 12:34:51 SLOT-2 osafimmnd[31358]: NO Epoch set to 14 in ImmModel
May 13 12:34:51 SLOT-2 osafimmnd[31358]: NO SERVER STATE: IMM_SERVER_SYNC_CLIENT --> IMM_SERVER_READY
May 13 12:34:51 SLOT-2 osafimmnd[31358]: NO ImmModel received scAbsenceAllowed 0
May 13 12:34:51 SLOT-2 osaflogd[31368]: Started
May 13 12:34:51 SLOT-2 osafntfd[31378]: Started
May 13 12:34:51 SLOT-2 osafclmd[31388]: Started
May 13 12:34:51 SLOT-2 osafamfd[31398]: Started
May 13 12:34:51 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:51 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:52 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:52 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:52 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:52 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:52 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:52 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:52 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:52 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:52 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
May 13 12:34:53 SLOT-2 osafamfd[31398]: WA saClmInitialize_4 returned 31
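When reading logs like the ones in ticket #1824, it can help to turn the numeric return value into its symbolic name. The sketch below is only an illustration: the helper names `parse_returned_code` and `ais_err_name` are hypothetical, and the numeric values are the `SaAisErrorT` codes as I understand them from the SA Forum `saAis.h` header (the value 5 for `SA_AIS_ERR_TIMEOUT` also appears in the osafntfimcnd log line quoted in ticket #1829). Verify them against the header installed on your system.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Map a small subset of SaAisErrorT values to their names.
 * Values assumed from saAis.h; returns NULL for codes not listed here. */
const char *ais_err_name(long rc)
{
    switch (rc) {
    case 1:  return "SA_AIS_OK";
    case 5:  return "SA_AIS_ERR_TIMEOUT";
    case 6:  return "SA_AIS_ERR_TRY_AGAIN";
    case 31: return "SA_AIS_ERR_UNAVAILABLE";
    default: return NULL;
    }
}

/* Extract N from a syslog line containing " returned N".
 * Returns -1 if the marker is missing or is not followed by a number
 * (for example "returned SA_AIS_ERR_TIMEOUT (5)"). */
long parse_returned_code(const char *line)
{
    static const char marker[] = " returned ";
    const char *p;
    char *end;
    long rc;

    assert(line != NULL);
    p = strstr(line, marker);
    if (p == NULL) {
        return -1;
    }
    p += sizeof(marker) - 1;
    rc = strtol(p, &end, 10);
    return (end == p) ? -1 : rc;
}
```

Under these assumptions, the repeated `saClmInitialize_4 returned 31` lines decode to `SA_AIS_ERR_UNAVAILABLE`.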
[tickets] [opensaf:tickets] #1829 Cluster reset happened after performing shutdown operation on clm unlocked node
AMF traces and MDS logs of both controllers are attached.

Attachments:

- [amf_SC-1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/de5ca2fa/ffa4/attachment/amf_SC-1.tar.bz2) (492.1 kB; application/x-bzip)
- [amf_SC-2.tar.bz2](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/de5ca2fa/ffa4/attachment/amf_SC-2.tar.bz2) (1.1 MB; application/x-bzip)

---

**[tickets:#1829] Cluster reset happened after performing shutdown operation on clm unlocked node**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Mon May 16, 2016 10:16 AM UTC by Ritu Raj
**Last Updated:** Mon May 16, 2016 10:16 AM UTC
**Owner:** nobody
**Attachments:**

- [SLES-32BIT-SLOT2.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1829/attachment/SLES-32BIT-SLOT2.tar.bz2) (4.1 MB; application/x-bzip)
- [SLES-64BIT-SLOT1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1829/attachment/SLES-64BIT-SLOT1.tar.bz2) (2.5 MB; application/x-bzip)

Setup:
Changeset- 7613
Version - opensaf 5.0 FC

**Issue Observed**:
Cluster reset happened after performing a shutdown operation on a CLM unlocked node.

**Steps Performed**:

1. The cluster has 4 nodes: SC-1 is active, SC-2 is standby, and PL-3 and PL-4 are payloads.
2. Spawned a CLM tracking agent on PL-4 and performed a shutdown operation on PL-3. After a 30-second delay in the track callback by the CLM agent, the shutdown operation succeeded and the node moved to the locked state. The active controller then rebooted:

SLES-64BIT-SLOT1:~ # immadm -o 3 safNode=PL-3,safCluster=myClmCluster
May 16 15:04:32 SLES-64BIT-SLOT1 osafclmd[6345]: NO safNode=PL-3,safCluster=myClmCluster SHUTDOWN, view number=6
SLES-64BIT-SLOT1:~ # May 16 15:04:48 SLES-64BIT-SLOT1 osaffmd[6283]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Activation timer supervision expired: no ACTIVE assignment received within the time limit, OwnNodeId = 131343, SupervisionTime = 60

3. The standby controller also rebooted due to an AMF director heartbeat timeout, causing both payloads to reboot:

May 16 15:07:03 SLES-32BIT-SLOT2 osafamfnd[5805]: WA saClmInitialize_4 returned 5
May 16 15:07:44 SLES-32BIT-SLOT2 osafntfimcnd[5905]: WA ntfimcn_ntf_init saNtfInitialize( returned SA_AIS_ERR_TIMEOUT (5)
May 16 15:07:52 SLES-32BIT-SLOT2 osafamfnd[5805]: ER AMF director heart beat timeout, generating core for amfd
May 16 15:07:53 SLES-32BIT-SLOT2 osafamfnd[5805]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: AMF director heart beat timeout, OwnNodeId = 131599, SupervisionTime = 60
May 16 15:07:53 SLES-32BIT-SLOT2 opensaf_reboot: Rebooting local node; timeout=60

* Syslog and CLM traces of both controllers are attached.
[tickets] [opensaf:tickets] Re: #1828 AMF: Both director and node director hang if immnd dies in new SC reallocation scenario
It's a bit different from #517, which resolves a deadlock between IMM and AMFND. In this scenario, amfnd has not had a chance to restart immnd, since amfnd is still hanging in the main thread due to the CLM reinit. After amfnd moves CLM initialization to another thread, amfnd might still need the solution from #517 if a deadlock with IMM happens.

---

**[tickets:#1828] AMF: Both director and node director hang if immnd dies in new SC reallocation scenario**

**Status:** assigned
**Milestone:** 5.1.FC
**Created:** Mon May 16, 2016 05:29 AM UTC by Minh Hon Chau
**Last Updated:** Mon May 16, 2016 06:29 AM UTC
**Owner:** Minh Hon Chau

Enable the cloud & roaming feature. If both the Active and Standby SC are stopped at the same time, new controllers will be allocated to be Active/Standby. During this new Active role allocation, if immnd dies there will be circular dependencies in the controller that is going to be Active:

- clmd cannot use IMM services since immnd has died
- immnd needs to be restarted by amfnd
- amfnd is hanging since amfnd is calling CLM services
- amfd is also hanging since amfd is calling CLM and NTF services
- ntfd is hanging due to logd's dependencies on IMM

The problem can be solved if amfd/amfnd are not blocked in the main thread, so that immnd can be restarted and the controller will not be rebooted due to a heartbeat timeout.
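The idea of not blocking the main thread on a dependent service can be illustrated with a small retry helper. Note that the reply in this thread mentions moving CLM initialization to another thread; the sketch below instead shows a single-threaded polling variant of the same idea, where each main-loop iteration makes at most one bounded attempt and then returns to other work (such as restarting immnd). All names here (`init_retry`, `init_retry_step`, `fake_clm_init`) are hypothetical, and it assumes a single attempt returns promptly rather than blocking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type: one bounded attempt at a service init. */
typedef bool (*try_init_fn)(void *ctx);

struct init_retry {
    try_init_fn try_init; /* attempt function, must not block indefinitely */
    void *ctx;            /* opaque argument passed to try_init */
    bool done;            /* set once an attempt has succeeded */
    unsigned attempts;    /* number of attempts made so far */
};

/* Call once per main-loop iteration. Makes at most one attempt and returns
 * immediately, so the caller stays free to handle other events in between.
 * Returns true once initialization has succeeded. */
bool init_retry_step(struct init_retry *r)
{
    assert(r != NULL && r->try_init != NULL);
    if (!r->done) {
        r->attempts++;
        r->done = r->try_init(r->ctx);
    }
    return r->done;
}

/* Stand-in for a real init call: fails until *remaining_failures reaches 0,
 * mimicking a service that is unavailable for a while. */
bool fake_clm_init(void *ctx)
{
    int *remaining_failures = ctx;

    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return false;
    }
    return true;
}
```

A main loop would call `init_retry_step()` alongside its normal event handling and only proceed with CLM-dependent work once it returns true. Whether polling or a dedicated thread fits better depends on whether the underlying init call can be bounded in time.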
[tickets] [opensaf:tickets] #1828 AMF: Both director and node director hang if immnd dies in new SC reallocation scenario
>>- immnd needs restarted by amfnd

It is the same as #517.