[tickets] [opensaf:tickets] #1581 pyosaf: Make log level configurable in the SafLogger utility class
--- ** [tickets:#1581] pyosaf: Make log level configurable in the SafLogger utility class** **Status:** assigned **Milestone:** 5.0.FC **Created:** Mon Nov 02, 2015 03:08 PM UTC by Johan Mårtensson **Last Updated:** Mon Nov 02, 2015 03:08 PM UTC **Owner:** Johan Mårtensson In the SafLogger::log method the log level is hard-coded to notice. This should be fixed so that it's configurable. --- Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.-- ___ Opensaf-tickets mailing list Opensaf-tickets@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
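For illustration only, here is a minimal sketch of what a configurable log level could look like. It does not use the real pyosaf API: `ConfigurableLogger`, `Severity` and the numeric values are hypothetical stand-ins, and the records are kept in a list instead of being sent to the LOG service.

```python
import enum


class Severity(enum.IntEnum):
    """Hypothetical severity scale (assumption, not taken from pyosaf)."""
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFO = 6


class ConfigurableLogger:
    """Stand-in for a logger wrapper; this is NOT the real SafLogger class."""

    def __init__(self, default_severity=Severity.NOTICE):
        # The default is a constructor argument instead of a hard-coded value.
        self.default_severity = default_severity
        self.records = []  # (severity, message) pairs, kept only for illustration

    def log(self, message, severity=None):
        # Fall back to the configured default when no level is given per call.
        if severity is None:
            severity = self.default_severity
        self.records.append((Severity(severity), message))


logger = ConfigurableLogger()
logger.log("uses the default level")
logger.log("explicit level", Severity.ERROR)   # per-call override
logger.default_severity = Severity.INFO        # reconfigure the default
logger.log("uses the new default")
```

The sketch keeps notice as the default so existing callers would see no change, while new callers can either pass a level per call or change the default once.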
[tickets] [opensaf:tickets] #1527 log: terminated due to use SaImmOiHandleT concurrently from 02 threads
- **status**: unassigned --> assigned
- **assigned_to**: Vu Minh Nguyen

---

** [tickets:#1527] log: terminated due to use SaImmOiHandleT concurrently from 02 threads**

**Status:** assigned
**Milestone:** 5.0.FC
**Created:** Wed Oct 07, 2015 10:59 AM UTC by Vu Minh Nguyen
**Last Updated:** Sun Nov 01, 2015 09:36 PM UTC
**Owner:** Vu Minh Nguyen

When the standby takes the active role, the "new" active logsv starts a thread, `imm_impl_restore_thread`, to set the OI implementer for the LOG service. In the meantime the main thread is still running and ready to receive incoming requests. As a result, two threads (`imm_impl_restore_thread` and the main thread) use one OiHandle concurrently.

This violates the IMM rule stated in the IMM PR doc: `"the developer must avoid using the same handle concurrently from several threads."`

The trace logs below show two problems caused by using the OiHandle from two different threads:

1) `ERR_BAD_OPERATION` is returned when a request is sent to IMM while no implementer has been set.

> Sep 17 18:22:04 SC-2 osaflogd[15047]: NO ACTIVE request
> Sep 17 18:22:04 SC-2 osaflogd[15047]: ER ERR_BAD_OPERATION: The SaImmOiHandleT is not associated with any implementer name
> ...
> Sep 17 18:22:04 SC-2 osafimmnd[15026]: NO Implementer connected: 211 (safLogService) <7, 2020f>

2) `ERR_LIBRARY` is returned because of a double LOCK on the IMM side, and logsv is terminated.

> Sep 17 20:07:59 SC-2 osafimmnd[14962]: NO Implementer connected: 401 (safLogService) <7, 2020f>
> ...
> Sep 17 20:07:59 SC-2 osaflogd[14975]: saImmOiClassImplementerSet FAILED, rc = 2
> ….
> Sep 17 20:08:09 SC-2 osafamfnd[15047]: NO 'safComp=LOG,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
> Sep 17 20:08:09 SC-2 osafamfnd[15047]: ER safComp=LOG,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
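One generic way to respect the "one thread at a time per handle" rule is to funnel every use of the handle through a lock. The sketch below is not the logsv fix: `GuardedHandle` and `fake_oi_call` are hypothetical names, and the "handle" is just an integer standing in for an SaImmOiHandleT.

```python
import threading


class GuardedHandle:
    """Wraps a handle so that only one thread at a time can use it."""

    def __init__(self, handle):
        self._handle = handle
        self._lock = threading.Lock()
        self._active = 0
        self.max_concurrent = 0  # highest number of simultaneous users observed

    def call(self, func, *args):
        # Every use of the handle goes through this lock-protected section.
        with self._lock:
            self._active += 1
            self.max_concurrent = max(self.max_concurrent, self._active)
            try:
                return func(self._handle, *args)
            finally:
                self._active -= 1


def fake_oi_call(handle, results, tag):
    # Placeholder for a real IMM OI call; it only records who used the handle.
    results.append((tag, handle))


guarded = GuardedHandle(handle=42)
results = []
threads = [
    threading.Thread(target=guarded.call, args=(fake_oi_call, results, name))
    for name in ("restore_thread", "main_thread")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A lock only serializes the calls. It does not by itself solve problem 1 above, where a request is issued before the implementer has been set, so ordering between the two threads still has to be handled separately.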
[tickets] [opensaf:tickets] #1582 smfd: IMMA_SYNCR_TIMEOUT extended to 5 minutes
- **status**: assigned --> review

---

** [tickets:#1582] smfd: IMMA_SYNCR_TIMEOUT extended to 5 minutes**

**Status:** review
**Milestone:** 4.6.2
**Created:** Tue Nov 03, 2015 06:39 AM UTC by Ingvar Bergström
**Last Updated:** Tue Nov 03, 2015 06:39 AM UTC
**Owner:** Ingvar Bergström

Heavily overloaded systems cause smfd to receive TIMEOUT from IMM. The IMMA_SYNCR_TIMEOUT timeout is extended from one to five minutes. Handling of TIMEOUT from IMM is corrected in the smf OI.
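As a rough illustration of tolerating TIMEOUT on an overloaded system, the sketch below retries an operation until an overall deadline (here 300 seconds, matching the five minutes mentioned above) has passed. It is an assumption about one possible approach, not the smfd change itself; `call_with_deadline`, `FakeClock` and the string result codes are hypothetical, and a fake clock is used so the example runs instantly.

```python
TIMEOUT = "TIMEOUT"
OK = "OK"


def call_with_deadline(operation, deadline_s, now, sleep, retry_interval_s=1.0):
    """Retry `operation` while it reports TIMEOUT, until `deadline_s` has elapsed."""
    start = now()
    while True:
        result = operation()
        if result != TIMEOUT:
            return result
        # Give up if another wait would take us past the overall deadline.
        if now() - start + retry_interval_s > deadline_s:
            return TIMEOUT
        sleep(retry_interval_s)


class FakeClock:
    """Simulated time source so the example does not really sleep."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


attempts = []


def flaky_operation():
    # Reports TIMEOUT twice, then succeeds on the third attempt.
    attempts.append(1)
    return OK if len(attempts) == 3 else TIMEOUT


clock = FakeClock()
outcome = call_with_deadline(flaky_operation, 300.0, clock.now, clock.sleep)

short_clock = FakeClock()
gave_up = call_with_deadline(lambda: TIMEOUT, 3.0, short_clock.now, short_clock.sleep)
```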
[tickets] [opensaf:tickets] #1566 Cluster reset happened during switchover due to AMF director heart beat timeout.
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#1566] Cluster reset happened during switchover due to AMF director heart beat timeout.**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Sat Oct 24, 2015 06:25 AM UTC by Ritu Raj
**Last Updated:** Sat Oct 24, 2015 06:29 AM UTC
**Owner:** nobody

Changeset: 6901
70 nodes configured with PBE
Application: Nway configured on all the nodes

Issues Observed:

> Cluster reset happened during switchover due to AMF director heart beat timeout.

Steps Performed:

* AMF (Nway) application brought up on the nodes.
* Some operations are performed on the Nway application hosted on PL-65 to PL-68.
* Stopped opensaf on the nodes PL-65 to PL-68.
* Two switchovers were performed on the cluster. The first switchover succeeded without any issue. During the second switchover, the old standby controller (SC-2) rebooted while it was being promoted to the ACTIVE state.

    Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: NO Assigning 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
    Oct 22 15:45:10 SLES-64BIT-SLOT2 osafimmd[2505]: WA IMMD not re-electing coord for switch-over (si-swap) coord at (2020f)
    Oct 22 15:45:10 SLES-64BIT-SLOT2 osafimmnd[2516]: NO Implementer (applier) connected: 130 (@OpenSafImmReplicatorA) <10675, 2020f>
    Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: NO Assigned 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
    Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: NO 'safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
    Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: ER safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
    Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
    Oct 22 15:45:10 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting local node; timeout=60

* After SC-2 went for reboot, SC-1 tried to become active, during which the AMF director heart beat timed out and a cluster reset happened.

    Oct 22 15:54:53 SLES-64BIT-SLOT1 osafamfd[2557]: NO 'safRankedSu=safSu=dummy_NWay_1Norm_4\,safSg=SG_dummy_n\,safApp=N_6,safSi=dummy_NWay_1Norm_6,safApp=N_6'
    Oct 22 15:54:53 SLES-64BIT-SLOT1 osafamfnd[2567]: ER AMF director heart beat timeout, generating core for amfd
    Oct 22 15:54:54 SLES-64BIT-SLOT1 osafamfnd[2567]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: AMF director heart beat timeout, OwnNodeId = 131343, SupervisionTime = 60
    Oct 22 15:54:54 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node; timeout=60
    Oct 22 15:54:55 SLES-64BIT-SLOT1 osafimmnd[2503]: WA MDS Send Failed
    Oct 22 15:54:55 SLES-64BIT-SLOT1 osafimmnd[2503]: WA Error code 2 returned for message type 16 - ignoring
    Oct 22 15:54:55 SLES-64BIT-SLOT1 osafimmnd[2503]: NO Implementer locally disconnected. Marking it as doomed 136 <4871, 2010f> (@safAmfService2010f)

* Traces are not available.
[tickets] [opensaf:tickets] #1563 AMF : SU should not be instantiated if any one of hosted NG is is in lock-in state
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#1563] AMF : SU should not be instantiated if any one of hosted NG is is in lock-in state**

**Status:** review
**Milestone:** 4.6.2
**Created:** Fri Oct 23, 2015 02:13 PM UTC by Srikanth R
**Last Updated:** Fri Oct 30, 2015 01:18 PM UTC
**Owner:** Nagendra Kumar

Changeset : 6901
Application : 2N
Issue : SU should not be instantiated if any one of the hosted NGs is in lock-in state

Initially both NGs are brought to the locked-in state.

    SYSTEST-PLD-1:/opt/goahead/tetware/opensaffire/suites/avsv/infra # amf-state ng
    safAmfNodeGroup=SCs,safAmfCluster=myAmfCluster
        saAmfNGAdminState=UNLOCKED(1)
    safAmfNodeGroup=PLs,safAmfCluster=myAmfCluster
        saAmfNGAdminState=LOCKED-INSTANTIATION(3)
    safAmfNodeGroup=AllNodes,safAmfCluster=myAmfCluster
        saAmfNGAdminState=LOCKED-INSTANTIATION(3)

If the unlock-in operation is performed on one of the locked-in NGs, the SUs hosted on the PLs should not be instantiated, because the other NG is still in lock-in state. The node should not be moved to the LOCKED state.

    SYSTEST-PLD-1:/opt/goahead/tetware/opensaffire/suites/avsv/infra # amf-adm unlock-in safAmfNodeGroup=AllNodes,safAmfCluster=myAmfCluster
    safAmfNodeGroup=AllNodes,safAmfCluster=myAmfCluster LOCKEDINSTANTIATION --> LOCKED
    safAmfNode=PL-3,safAmfCluster=myAmfCluster LOCKEDINSTANTIATION --> LOCKED
    safAmfNode=PL-4,safAmfCluster=myAmfCluster LOCKEDINSTANTIATION --> LOCKED
    safAmfNode=PL-5,safAmfCluster=myAmfCluster LOCKEDINSTANTIATION --> LOCKED
    safAmfNode=PL-6,safAmfCluster=myAmfCluster LOCKEDINSTANTIATION --> LOCKED
    safAmfNode=SC-1,safAmfCluster=myAmfCluster LOCKEDINSTANTIATION --> LOCKED
    safAmfNode=SC-2,safAmfCluster=myAmfCluster LOCKEDINSTANTIATION --> LOCKED
    safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN INSTANTIATING --> INSTANTIATED
    safSu=TestApp_SU3,safSg=TestApp_SG1,safApp=TestApp_TwoN INSTANTIATING --> INSTANTIATED
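The expected behaviour in #1563 can be read as: a node may only leave lock-in when none of the node groups it belongs to is still in LOCKED-INSTANTIATION. The sketch below expresses that reading; it is not AMF code. The admin state values 1 and 3 appear in the ticket, while the group membership and the function name `node_may_instantiate` are assumptions made for the example.

```python
UNLOCKED, LOCKED, LOCKED_INSTANTIATION = 1, 2, 3

# Membership is assumed for illustration; only the admin states come from the ticket.
node_groups = {
    "SCs": {"members": {"SC-1", "SC-2"}, "admin_state": UNLOCKED},
    "PLs": {
        "members": {"PL-3", "PL-4", "PL-5", "PL-6"},
        "admin_state": LOCKED_INSTANTIATION,
    },
    "AllNodes": {
        "members": {"SC-1", "SC-2", "PL-3", "PL-4", "PL-5", "PL-6"},
        "admin_state": LOCKED_INSTANTIATION,
    },
}


def node_may_instantiate(node, groups):
    """True only if no group containing `node` is in LOCKED-INSTANTIATION."""
    return all(
        group["admin_state"] != LOCKED_INSTANTIATION
        for group in groups.values()
        if node in group["members"]
    )


# Simulate "amf-adm unlock-in" on the AllNodes group only.
node_groups["AllNodes"]["admin_state"] = LOCKED

pl3_allowed = node_may_instantiate("PL-3", node_groups)  # PLs is still locked-in
sc1_allowed = node_may_instantiate("SC-1", node_groups)  # no locked-in group left
```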
[tickets] [opensaf:tickets] #1567 AMF : Locked-in node should be moved to ENABLED state, during CLM node unlock
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#1567] AMF : Locked-in node should be moved to ENABLED state, during CLM node unlock**

**Status:** assigned
**Milestone:** 4.6.2
**Created:** Sat Oct 24, 2015 11:01 AM UTC by Srikanth R
**Last Updated:** Fri Oct 30, 2015 11:58 AM UTC
**Owner:** Nagendra Kumar

Changeset : 6901
Application : hosted 2n, no red application on PL-3

Steps :

1) Perform CLM lock operation on PL-3. AMF DN PL-3 moved to DISABLED state.
2) Perform lock operation on the NG consisting of PL-3.
3) Perform lock-inst operation on the same NG. Now the AMF node PL-3 state shall be DISABLED & LOCKED-IN.
4) Perform CLM unlock operation on PL-3. AMF DN PL-3 should be moved back to ENABLED state, but instead the AMF DN is in DISABLED state. Further unlock-in operations on the NG are not instantiating the SUs.

AMF should update the locked-in node state to ENABLED during CLM node unlock.
[tickets] [opensaf:tickets] #1532 AMF : SI should be reverted to unlocked state, after shutdown operation of SI is rejected
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#1532] AMF : SI should be reverted to unlocked state, after shutdown operation of SI is rejected**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Thu Oct 08, 2015 11:20 AM UTC by Srikanth R
**Last Updated:** Mon Oct 19, 2015 09:15 AM UTC
**Owner:** Nagendra Kumar

Changeset : 6901
Application : 2n ( two SUs and 4 SIs with SI1 as sponsor for the remaining SIs)

Steps :

* Initially all the SIs are in assigned state.
* Invoked shutdown operation on one of the dependent SIs, i.e. SI2.
* For the quiescing callback, the component responded with FAILED_OP.

    Oct 8 16:27:20 SYSTEST-PLD-1 osafamfnd[4535]: NO Assigning 'safSi=TestApp_SI2,safApp=TestApp_TwoN' QUIESCING to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
    Oct 8 16:27:30 SYSTEST-PLD-1 osafamfnd[4535]: NO Performing failover of 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' (SU failover count: 2)
    Oct 8 16:27:30 SYSTEST-PLD-1 osafamfnd[4535]: NO 'safComp=COMP2,safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' faulted due to 'csiSetcallbackTimeout' : Recovery is 'componentFailover'

* After recovery of SU1, the SI2 assignments are also done, which is not expected.

    Oct 8 16:27:30 SYSTEST-PLD-1 osafamfnd[4535]: NO 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' Presence State TERMINATING => INSTANTIATED
    Oct 8 16:27:30 SYSTEST-PLD-1 osafamfnd[4535]: NO Assigning 'safSi=TestApp_SI1,safApp=TestApp_TwoN' STANDBY to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
    Oct 8 16:27:30 SYSTEST-PLD-1 osafamfnd[4535]: NO Assigning 'safSi=TestApp_SI2,safApp=TestApp_TwoN' STANDBY to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
    Oct 8 16:27:30 SYSTEST-PLD-1 osafamfnd[4535]: NO Assigning 'safSi=TestApp_SI3,safApp=TestApp_TwoN' STANDBY to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'

* Below is the SI state after the shutdown operation:

    safSi=TestApp_SI2,safApp=TestApp_TwoN
        saAmfSIAdminState=LOCKED(2)
        saAmfSIAssignmentState=FULLY_ASSIGNED(2)

* A further unlock operation on the SI resulted in a TIMEOUT return value.
[tickets] [opensaf:tickets] #1446 log: trouble when the number of existing app streams reachs limitation
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#1446] log: trouble when the number of existing app streams reachs limitation**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Thu Aug 13, 2015 10:10 AM UTC by Vu Minh Nguyen
**Last Updated:** Thu Aug 13, 2015 10:10 AM UTC
**Owner:** nobody

Consider creating a configurable app stream (e.g. using `immcfg -c`) with all inputs valid. In this case logsv returns `SA_AIS_OK` to IMM for its callbacks, which means IMM is allowed to create its database/resource for this object (*1*).

In the apply callback, IMM asks logsv to apply the change and does not require an acknowledgement. If the number of app streams has reached the limit defined by `logMaxApplicationStreams`, logsv fails to add this stream to stream_array. As a result, logsv deletes all allocated resources that it manages itself, but the resources created in step (*1*) still exist. This causes the behaviour below:

1. The object is created successfully from IMM's point of view, but logsv actually fails in ccbApplyCallback.

    > immcfg -c SaLogStreamConfig safLgStrCfg=test6 -a saLogStreamPathName=. -a saLogStreamFileName=test6

2. immlist fails because logsv returns a not-OK result (`no such obj`) to IMM.

    > immlist safLgStrCfg=test6
    > error - saImmOmAccessorGet_2 FAILED: SA_AIS_ERR_NO_RESOURCES (18)

3. Creating the object again fails because the resource already exists.

    > immcfg -c SaLogStreamConfig safLgStrCfg=test6 -a saLogStreamPathName=. -a saLogStreamFileName=test6
    > error - saImmOmCcbObjectCreate_2 FAILED with SA_AIS_ERR_EXIST (14)
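The inconsistency described in #1446 comes from discovering the limit only in the apply step, where a failure can no longer be reported back. One possible way to avoid it, sketched below, is to check the limit in a step that can still reject the change. This is a toy model, not logsv code: `StreamRegistry` and its methods are simplified, hypothetical stand-ins for the create/completed/apply/abort callbacks, the result strings are shortened, and `safLgStrCfg=test5` is an invented name.

```python
class StreamRegistry:
    """Toy model of an implementer that tracks app streams per CCB."""

    def __init__(self, max_streams):
        self.max_streams = max_streams
        self.streams = set()   # streams that really exist
        self.pending = {}      # ccb_id -> names created in that CCB

    def ccb_object_create(self, ccb_id, name):
        if name in self.streams:
            return "ERR_EXIST"
        self.pending.setdefault(ccb_id, set()).add(name)
        return "OK"

    def ccb_completed(self, ccb_id):
        # Last point where the change can still be rejected.
        new_names = self.pending.get(ccb_id, set())
        if len(self.streams) + len(new_names) > self.max_streams:
            return "ERR_NO_RESOURCES"
        return "OK"

    def ccb_apply(self, ccb_id):
        # Cannot fail: capacity was already verified in ccb_completed.
        self.streams |= self.pending.pop(ccb_id, set())

    def ccb_abort(self, ccb_id):
        self.pending.pop(ccb_id, None)


registry = StreamRegistry(max_streams=1)

# First stream fits within the limit.
registry.ccb_object_create(1, "safLgStrCfg=test5")
first = registry.ccb_completed(1)
registry.ccb_apply(1)

# Second stream exceeds the limit and is rejected before anything is applied.
registry.ccb_object_create(2, "safLgStrCfg=test6")
second = registry.ccb_completed(2)
registry.ccb_abort(2)
```

Because the second CCB is rejected before apply, no half-created object is left behind, so a later create of the same name would not run into `ERR_EXIST`.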
[tickets] [opensaf:tickets] #1464 Cluster reset triggered, after middleware si-swap ( one of controller in disabled )
- **Milestone**: 4.5.2 --> 4.6.2 --- ** [tickets:#1464] Cluster reset triggered, after middleware si-swap ( one of controller in disabled )** **Status:** unassigned **Milestone:** 4.6.2 **Created:** Fri Aug 28, 2015 10:00 AM UTC by Srikanth R **Last Updated:** Fri Aug 28, 2015 10:00 AM UTC **Owner:** nobody **Attachments:** - [clusterReset.tgz](https://sourceforge.net/p/opensaf/tickets/1464/attachment/clusterReset.tgz) (5.4 MB; application/x-compressed) *Setup* 4.7M0 with changeset 6770 4 nodes configured with no PBE configured and 2N application hosted. SC-1 is active controller and SC-2 is standby controller and both the controllers are hosting application SUs configured with 2N redundancy model. *Issues* Cluster went for reset, for the si-swap operation on middleware. The active controller is in disabled state, before invoking si-swap operation. *Steps Performed* -> Because of faulty application, SC-1 moved to disabled state. NodeAutorepair feature is disabled for SC-1. Aug 28 15:03:17 SYSTEST-CNTLR-1 osafamfnd[4650]: NO 'safComp=COMP3SU1TWONAPP,safSu=SU1,safSg=SGONE,safApp=TWONAPP' faulted due to 'csiSetcallbackTimeout' : Recovery is 'nodeFailover' Aug 28 15:03:17 SYSTEST-CNTLR-1 osafamfd[4640]: NO NodeAutorepair disabled for 'safAmfNode=SC-1,safAmfCluster=myAmfCluster', no reboot ordered -> Invoked si-swap operation on middleware SI. -> Standby controller ( SC-2) got rebooted, as implementer set failed with ERR_EXIST . 
Aug 28 15:03:32 SYSTEST-CNTLR-2 osafamfnd[4761]: NO Assigning 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF' Aug 28 15:03:32 SYSTEST-CNTLR-2 osafntfimcnd[4726]: NO exiting on signal 15 Aug 28 15:03:32 SYSTEST-CNTLR-2 osafimmd[4686]: WA IMMD not re-electing coord for switch-over (si-swap) coord at (2010f) Aug 28 15:03:32 SYSTEST-CNTLR-2 osafmsgd[4882]: ER mqd_imm_declare_implementer failed: err = 14 Aug 28 15:03:32 SYSTEST-CNTLR-2 osaflogd[4707]: ER saImmOiClassImplementerSet (safLogService) failed: 14 Aug 28 15:03:32 SYSTEST-CNTLR-2 osafckptd[4780]: ER cpd immOiImplmenterSet failed with err = 14 Aug 28 15:03:32 SYSTEST-CNTLR-2 osafamfnd[4761]: NO 'safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast' -> SC-1 also got rebooted, after SC-2 reboot. Aug 28 15:03:58 SYSTEST-CNTLR-1 osafamfd[4640]: NO Node 'SC-2' left the cluster Aug 28 15:03:58 SYSTEST-CNTLR-1 osafamfd[4640]: WA State change notification lost for 'safSu=SC-1,safSg=2N,safApp=OpenSAF' Aug 28 15:03:58 SYSTEST-CNTLR-1 osafamfd[4640]: ER Failed to start cluster tracking 6 Aug 28 15:03:58 SYSTEST-CNTLR-1 osafamfd[4640]: NO NodeAutorepair disabled for 'safAmfNode=SC-1,safAmfCluster=myAmfCluster', no reboot ordered Aug 28 15:03:58 SYSTEST-CNTLR-1 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF Aug 28 15:04:03 SYSTEST-CNTLR-1 osafclmd[4621]: ER saNtfNotificationSend() returned: SA_AIS_ERR_TRY_AGAIN (6) Aug 28 15:04:08 SYSTEST-CNTLR-1 osaflogd[4596]: WA saImmOiRtObjectDelete returned 5 for safLgStr=TWONLOGSTREAM Aug 28 15:04:08 SYSTEST-CNTLR-1 osafimmnd[4583]: WA ERR_BAD_HANDLE: Handle use is blocked by pending reply on syncronous call Aug 28 15:04:08 SYSTEST-CNTLR-1 osafimmnd[4583]: NO Implementer locally disconnected. 
Marking it as doomed 4 <17, 2010f> (safAmfService) Aug 28 15:04:08 SYSTEST-CNTLR-1 osafamfd[4640]: NO Re-initializing with IMM Aug 28 15:04:08 SYSTEST-CNTLR-1 osafimmnd[4583]: WA IMMND - Client Node Get Failed for cli_hdl 73014575375 Aug 28 15:04:08 SYSTEST-CNTLR-1 osafimmnd[4583]: WA Timeout on syncronous admin operation 1 Aug 28 15:04:13 SYSTEST-CNTLR-1 osafimmnd[4583]: WA ERR_BAD_HANDLE: Handle use is blocked by pending reply on syncronous call Aug 28 15:04:13 SYSTEST-CNTLR-1 osafimmnd[4583]: NO Implementer locally disconnected. Marking it as doomed 3 <12, 2010f> (safClmService) Aug 28 15:04:13 SYSTEST-CNTLR-1 osafimmnd[4583]: WA IMMND - Client Node Get Failed for cli_hdl 51539738895 Aug 28 15:04:22 SYSTEST-CNTLR-1 osafclmd[4621]: ER saImmOiImplementerSet failed rc:6, exiting Aug 28 15:04:22 SYSTEST-CNTLR-1 osafamfnd[4650]: NO 'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast' -> As both the controllers went for reboot, payloads went for reboot. --- Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.-- ___ Opensaf-tickets mailing list Opensaf-tickets@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #1463 log: output a redundant quotation mark in log fileWhe
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#1463] log: output a redundant quotation mark in log fileWhe**

**Status:** accepted
**Milestone:** 4.6.2
**Created:** Fri Aug 28, 2015 09:57 AM UTC by Vu Minh Nguyen
**Last Updated:** Wed Sep 23, 2015 09:27 AM UTC
**Owner:** Vu Minh Nguyen

When sending a log record that is longer than the `saLogStreamFixedLogRecordSize` value, a redundant double quotation mark appears in the log file. This only happens when the token `@Cb` is used without double quotation marks around it (`@Cb`, not `"@Cb"`).

Here is an example of a log file:

> $ cat saLogAlarm_20150828_073826.log
> 11 0x13fe8cf840b38008 0x13fe8cf83f451b28 0x4003 T saflogger.3881@SC-1 saflogger.3881@SC-1 11"
[tickets] [opensaf:tickets] #1353 smf: two hours is spent on step undoing state
- **Milestone**: 4.6.1 --> 4.6.2

---

** [tickets:#1353] smf: two hours is spent on step undoing state **

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Tue Apr 28, 2015 01:33 PM UTC by Neelakanta Reddy
**Last Updated:** Wed Jul 15, 2015 01:03 PM UTC
**Owner:** nobody
**Attachments:**

- [messages_step_undo](https://sourceforge.net/p/opensaf/tickets/1353/attachment/messages_step_undo) (111.1 kB; application/octet-stream)

Test description:

1. A rolling middleware upgrade (4.5->4.6) campaign is run.
2. On one of the upgraded nodes (PL-4) the new rpms (4.6) are kept empty, so the node comes up without an opensaf installation.
3. The step rollback takes approximately two hours before the campaign is reported as EXECUTION_FAILED.
4. The syslog of SC-1 is attached:

Apr 24 18:36:55 SLES1 osafamfd[2289]: NO Node 'PL-4' left the cluster
Apr 24 18:36:55 SLES1 osafimmnd[2237]: NO Implementer connected: 47 (MsgQueueService132111) <2280, 2010f>
Apr 24 18:36:55 SLES1 osafimmnd[2237]: NO Implementer locally disconnected.
Marking it as doomed 47 <2280, 2010f> (MsgQueueService132111) Apr 24 18:36:55 SLES1 osafimmnd[2237]: NO Implementer disconnected 47 <2280, 2010f> (MsgQueueService132111) Apr 24 18:36:58 SLES1 kernel: [ 172.812065] TIPC: Resetting link <1.1.1:eth0-1.1.4:eth0>, peer not responding Apr 24 18:36:58 SLES1 kernel: [ 172.812071] TIPC: Lost link <1.1.1:eth0-1.1.4:eth0> on network plane A Apr 24 18:36:58 SLES1 kernel: [ 172.812075] TIPC: Lost contact with <1.1.4> Apr 24 18:37:15 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster Apr 24 18:37:36 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster --- -- -- Apr 24 20:36:00 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster Apr 24 20:36:22 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster Apr 24 20:36:44 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster Apr 24 20:37:06 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster Apr 24 20:37:28 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster Apr 24 20:37:28 SLES1 osafsmfd[2318]: NO no node destination found whitin time limit for node safAmfNode=PL-4,safAmfCluster=myAmfCluster Apr 24 20:37:28 SLES1 osafsmfd[2318]: NO no node destination found for node safAmfNode=PL-4,safAmfCluster=myAmfCluster Apr 24 20:37:28 SLES1 osafsmfd[2318]: ER Failed to online install old bundles Apr 24 20:37:28 SLES1 osafsmfd[2318]: ER Step undoing failed Apr 24 20:37:28 SLES1 osafsmfd[2318]: NO Step safSmfStep=0004 in procedure safSmfProc=OpenSAF-upgrade failed, step result 5 Apr 24 20:37:28 SLES1 osafsmfd[2318]: NO CAMP: Procedure safSmfProc=OpenSAF-upgrade returned FAILED --- Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to 
https://sourceforge.net/p/opensaf/tickets/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.-- ___ Opensaf-tickets mailing list Opensaf-tickets@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #1362 AMF: saAmfSGNumCurrAssignedSUs is not updated for operations performed on SG.
- **Milestone**: 4.6.1 --> 4.6.2

---

** [tickets:#1362] AMF: saAmfSGNumCurrAssignedSUs is not updated for operations performed on SG.**

**Status:** accepted
**Milestone:** 4.6.2
**Created:** Thu Apr 30, 2015 09:22 AM UTC by Srikanth R
**Last Updated:** Tue May 05, 2015 05:37 AM UTC
**Owner:** Praveen

Changeset : 6490

ISSUE : saAmfSGNumCurrAssignedSUs is not updated for operations performed on SG.

For an SG in lock-in / locked state, saAmfSGNumCurrAssignedSUs is not changed to the value 0. This attribute is updated for operations performed on the SU.

    SOLO:/opt/goahead/tetware/opensaffire/suites/avsv/framework # immlist safSg=SG,safApp=test2nApp
    Name                                   Type         Value(s)
    safSg                                  SA_STRING_T  safSg=SG
    saAmfSGType                            SA_NAME_T    safVersion=4.0.0,safSgType=test2nSgType (39)
    saAmfSGSuRestartProb                   SA_TIME_T
    saAmfSGSuRestartMax                    SA_UINT32_T
    saAmfSGSuHostNodeGroup                 SA_NAME_T    safAmfNodeGroup=AllNodes,safAmfCluster=myAmfCluster (51)
    saAmfSGNumPrefStandbySUs               SA_UINT32_T  1 (0x1)
    saAmfSGNumPrefInserviceSUs             SA_UINT32_T  3 (0x3)
    saAmfSGNumPrefAssignedSUs              SA_UINT32_T  3 (0x3)
    saAmfSGNumPrefActiveSUs                SA_UINT32_T  1 (0x1)
    saAmfSGNumCurrNonInstantiatedSpareSUs  SA_UINT32_T  0 (0x0)
    saAmfSGNumCurrInstantiatedSpareSUs     SA_UINT32_T  0 (0x0)
    saAmfSGNumCurrAssignedSUs              SA_UINT32_T  2 (0x2)
    saAmfSGMaxStandbySIsperSU              SA_UINT32_T  1 (0x1)
    saAmfSGMaxActiveSIsperSU               SA_UINT32_T  1 (0x1)
    saAmfSGCompRestartProb                 SA_TIME_T
    saAmfSGCompRestartMax                  SA_UINT32_T
    saAmfSGAutoRepair                      SA_UINT32_T  1 (0x1)
    saAmfSGAutoAdjustProb                  SA_TIME_T
    saAmfSGAutoAdjust                      SA_UINT32_T  0 (0x0)
    saAmfSGAdminState                      SA_UINT32_T  3 (0x3)
    SaImmAttrImplementerName               SA_STRING_T  safAmfService
    SaImmAttrClassName                     SA_STRING_T  SaAmfSG
    SaImmAttrAdminOwnerName                SA_STRING_T
[tickets] [opensaf:tickets] #1291 IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync
- **Milestone**: 4.5.2 --> never --- ** [tickets:#1291] IMM: IMMD healthcheck callback timeout when standby controller rebooted in middle of IMMND sync** **Status:** invalid **Milestone:** never **Created:** Mon Mar 30, 2015 07:21 AM UTC by Sirisha Alla **Last Updated:** Mon Sep 21, 2015 06:35 AM UTC **Owner:** nobody **Attachments:** - [immlogs.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1291/attachment/immlogs.tar.bz2) (6.8 MB; application/x-bzip) The issue is observed with 4.6 FC changeset 6377. The system is up and running with single pbe and 50k objects. This issue is seen after http://sourceforge.net/p/opensaf/tickets/1290 is observed. IMM application is running on standby controller and immcfg command is run from payload to set CompRestartMax value to 1000. IMMND is killed twice on standby controller leading to #1290. As a result, standby controller left the cluster in middle of sync, IMMD reported healthcheck callback timeout and the active controller too went for reboot. Following is the syslog of SC-1: Mar 26 14:58:17 SLES-64BIT-SLOT1 osafimmloadd: NO Sync starting Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: NO Node Down event for node id 2020f: Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: NO Current role: ACTIVE Mar 26 14:58:28 SLES-64BIT-SLOT1 osaffmd[9529]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 131343, SupervisionTime = 60 Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.412080] TIPC: Resetting link <1.1.1:eth0-1.1.2:eth0>, peer not responding Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.412089] TIPC: Lost link <1.1.1:eth0-1.1.2:eth0> on network plane A Mar 26 14:58:28 SLES-64BIT-SLOT1 kernel: [15200.413191] TIPC: Lost contact with <1.1.2> Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. 
Not sending track callback for agents on that node Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node Mar 26 14:58:28 SLES-64BIT-SLOT1 osafclmd[9609]: NO Node 131599 went down. Not sending track callback for agents on that node Mar 26 14:58:30 SLES-64BIT-SLOT1 osafamfd[9628]: NO Node 'SC-2' left the cluster Mar 26 14:58:30 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF Mar 26 14:58:54 SLES-64BIT-SLOT1 kernel: [15226.674333] TIPC: Established link <1.1.1:eth0-1.1.2:eth0> on network plane A Mar 26 15:00:02 SLES-64BIT-SLOT1 syslog-ng[3261]: Log statistics; dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0', processed='center(queued)=2197', processed='center(received)=1172', processed='destination(messages)=1172', processed='destination(mailinfo)=0', processed='destination(mailwarn)=0', processed='destination(localmessages)=955', processed='destination(newserr)=0', processed='destination(mailerr)=0', processed='destination(netmgm)=0', processed='destination(warn)=44', processed='destination(console)=13', processed='destination(null)=0', processed='destination(mail)=0', processed='destination(xconsole)=13', processed='destination(firewall)=0', processed='destination(acpid)=0', processed='destination(newscrit)=0', processed='destination(newsnotice)=0', processed='source(src)=1172' Mar 26 15:00:07 SLES-64BIT-SLOT1 osafimmloadd: ER Too many TRY_AGAIN on saImmOmSearchNext - aborting Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: ER SYNC APPARENTLY FAILED status:1 Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO -SERVER STATE: IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY Mar 
26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE (2484) Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO Epoch set to 12 in ImmModel Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmnd[9549]: NO Coord broadcasting ABORT_SYNC, epoch:12 Mar 26 15:00:08 SLES-64BIT-SLOT1 osafimmpbed: NO Update epoch 12 committing with ccbId:10054/4294967380 Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO SU failover probation timer started (timeout: 12000 ns) Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO Performing failover of 'safSu=SC-1,safSg=2N,safApp=OpenSAF' (SU failover count: 1) Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' recovery action escalated from 'componentFailover' to 'suFailover' Mar 26 15:01:34 SLES-64BIT-SLOT1 osafamfnd[9638]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'healthCheckcallbackTimeout' : Recovery is 'suFailover' Mar
[tickets] [opensaf:tickets] #1285 MDS TCP: zero bytes recvd results in application exit
- **Milestone**: 4.5.2 --> 4.6.2 --- ** [tickets:#1285] MDS TCP: zero bytes recvd results in application exit** **Status:** assigned **Milestone:** 4.6.2 **Created:** Thu Mar 26, 2015 09:49 AM UTC by Girish **Last Updated:** Tue Aug 11, 2015 06:26 AM UTC **Owner:** A V Mahesh (AVM)

Sometimes an application using OpenSAF exits with the message below:

Feb 20 15:24:59 fedvm1 RIB[28549]: MDTM:socket_recv() = 0, conn lost with dh server, exiting library err :Success
Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO 'safSu=SU1,safSg=app-simplex,safApp=appos' component restart probation timer started (timeout: 40 ns)
Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO Restarting a component of 'safSu=SU1,safSg=app-simplex,safApp=appos' (comp restart count: 1)
Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO 'safComp=App,safSu=SU1,safSg=app-simplex,safApp=appos' faulted due to 'avaDown' : Recovery is 'componentRestart'

The application exits at this location in osaf/libs/core/mds/mds_dt_trans.c::mdtm_process_poll_recv_data_tcp:

    recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->buffer, local_len_buf, 0);
    if (recd_bytes < 0) {
        return;
    } else if (0 == recd_bytes) {
        syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err :%d len:%d", recd_bytes, errno, local_len_buf);
        close(tcp_cb->DBSRsock);
        exit(0);
    } else if (local_len_buf > recd_bytes) {

local_len_buf turns out to be 0, so recv() is asked for zero bytes and its 0 return value is treated as a lost connection.
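The ambiguity reported in ticket #1285 can be illustrated outside MDS. The following is a minimal Python sketch, not the MDS fix: `classify_recv` and `demo` are hypothetical names introduced only for illustration. It shows that a zero-length receive request yields zero bytes even while data is pending, so a 0 result only means "peer closed" when a non-zero length was requested.

```python
import socket


def classify_recv(requested_len, data):
    """Interpret the result of a stream-socket recv() call.

    Hypothetical helper: a zero-byte result signals a closed peer only
    if at least one byte was actually requested.
    """
    if requested_len == 0:
        return "nothing-requested"
    if len(data) == 0:
        return "peer-closed"
    if len(data) < requested_len:
        return "partial"
    return "complete"


def demo():
    """Exercise both zero-byte cases on a local socket pair."""
    left, right = socket.socketpair()
    try:
        left.sendall(b"ping")
        # Zero-length request: returns b"" although 4 bytes are pending.
        empty = right.recv(0)
        pending = right.recv(4)
        # Now really close the peer: the next recv() returns b"".
        left.close()
        eof = right.recv(4)
        return classify_recv(0, empty), pending, classify_recv(4, eof)
    finally:
        left.close()
        right.close()


if __name__ == "__main__":
    print(demo())
```

In terms of the excerpt above, this corresponds to checking `local_len_buf` before treating `recd_bytes == 0` as a lost connection; whether that is the right fix for MDS is for the ticket owner to decide.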
[tickets] [opensaf:tickets] #1275 AMF: SG is in unstable state ( standby csi removal timeout during sponsor si lock )
- **Milestone**: 4.6.1 --> 4.6.2 --- ** [tickets:#1275] AMF: SG is in unstable state ( standby csi removal timeout during sponsor si lock )** **Status:** unassigned **Milestone:** 4.6.2 **Created:** Thu Mar 19, 2015 01:48 PM UTC by Srikanth R **Last Updated:** Wed Jul 15, 2015 01:08 PM UTC **Owner:** nobody

*Setup*
Version: 4.6 FC
Model: 2n
Configuration: 1 App, 1 SG, 2 SUs with 4 comps each, 4 SIs with 1 CSI each
SI-SI dependencies are configured with SI1 as sponsor of SI2, SI3 and SI4.
SU1 is mapped to PL-3 and SU2 to PL-4.
saAmfSGAutoRepair=1(True)
SuFailover=0(False)
Component recovery policy: 3 (comp failover)

*Initial state*
All the AMF entities of the application are in unlocked state. The SIs are fully assigned.

*Issue*
SG is in unstable state ( standby csi removal timeout during sponsor si lock )

*Steps Performed*
-> Before locking the sponsor SI, ensured that component 1 in SU2 (the standby SU) does not respond to the CSI removal callback.
-> After the lock operation on the sponsor SI, the SG went to unstable state.

Below are the logs on PL-4 (where the standby SU is hosted):

Mar 19 19:05:11 SYSTEST-PLD-2 osafamfnd[24560]: NO Removed 'safSi=SI1,safApp=test2nApp' from 'safSu=SU2,safSg=SG,safApp=test2nApp'
Mar 19 19:05:21 SYSTEST-PLD-2 osafamfnd[24560]: NO Removed 'safSi=SI2,safApp=test2nApp' from 'safSu=SU2,safSg=SG,safApp=test2nApp'
Mar 19 19:05:21 SYSTEST-PLD-2 osafamfnd[24560]: CR SU-SI record addition failed, SU= safSu=SU2,safSg=SG,safApp=test2nApp : SI=safSi=SI3,safApp=test2nApp
Mar 19 19:05:21 SYSTEST-PLD-2 osafamfnd[24560]: CR SU-SI record addition failed, SU= safSu=SU2,safSg=SG,safApp=test2nApp : SI=safSi=SI4,safApp=test2nApp

Below is the final state of the SIs after the lock operation.

safSi=SI1,safApp=test2nApp
saAmfSIAdminState=LOCKED(2)
saAmfSIAssignmentState=UNASSIGNED(1)

safSi=SI2,safApp=test2nApp
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=UNASSIGNED(1)

safSi=SI3,safApp=test2nApp
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)

safSi=SI4,safApp=test2nApp
saAmfSIAdminState=UNLOCKED(1)
saAmfSIAssignmentState=PARTIALLY_ASSIGNED(3)
[tickets] [opensaf:tickets] #538 AMF: fail-over assignments despite comps in TERM-FAILED state
- **Milestone**: 4.5.2 --> 4.6.2 --- ** [tickets:#538] AMF: fail-over assignments despite comps in TERM-FAILED state** **Status:** unassigned **Milestone:** 4.6.2 **Created:** Fri Aug 09, 2013 06:43 AM UTC by Hans Feldt **Last Updated:** Wed Jul 15, 2015 01:53 PM UTC **Owner:** nobody

AMF currently performs the fail-over recovery action even though a component is in the termination-failed presence state. This can lead to severe inconsistencies for the application. The specification also clearly states how this should work in 4.8:

"If the component and any of its contained components (for a container component) were assigned the active HA state for some component service instances when the CLEANUP command was executed, and semantics of the redundancy model of its enclosing service group guarantee that at a point in time only one component can be in the active HA state for a given component service instance, the failure to terminate that component prevents the Availability Management Framework from assigning to another component the active HA state for these component service instances (and by the same token prevents the assignment of the active HA state to other service units for the service instances that contain the involved CSIs). In this case, the service instances will stay unassigned until an administrative action is performed to terminate the failed component."

This can be tested by running the AMF 2N sa-aware sample app and modifying the cleanup script to do "exit 1", which gives this effect when the active component is killed:

Aug 9 08:40:01 Vostro osafamfnd[11307]: NO 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' faulted due to 'avaDown' : Recovery is 'componentRestart'
Aug 9 08:40:01 Vostro osafamfnd[11307]: NO Cleanup of 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' failed
Aug 9 08:40:01 Vostro osafamfnd[11307]: NO Reason:'Exec of script success, but script exits with non-zero status'
Aug 9 08:40:01 Vostro osafamfnd[11307]: NO Exit code: 1
Aug 9 08:40:01 Vostro osafamfnd[11307]: NO Component Failover trigerred for 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1': Failed component: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
Aug 9 08:40:01 Vostro osafamfnd[11307]: NO 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' Presence State INSTANTIATED => TERMINATION_FAILED
Aug 9 08:40:01 Vostro osafamfnd[11307]: NO Assigning 'safSi=AmfDemo,safApp=AmfDemo1' QUIESCED to 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
Aug 9 08:40:01 Vostro osafamfnd[11307]: NO Assigned 'safSi=AmfDemo,safApp=AmfDemo1' QUIESCED to 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
Aug 9 08:40:01 Vostro osafamfnd[11307]: NO Assigning 'safSi=AmfDemo,safApp=AmfDemo1' ACTIVE to 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1'
Aug 9 08:40:01 Vostro amf_demo[11620]: CSI Set - HAState Active for all assigned CSIs
Aug 9 08:40:01 Vostro osafamfnd[11307]: NO Assigned 'safSi=AmfDemo,safApp=AmfDemo1' ACTIVE to 'safSu=SU2,safSg=AmfDemo,safApp=AmfDemo1'
Aug 9 08:40:01 Vostro osafamfnd[11307]: NO Removing 'safSi=AmfDemo,safApp=AmfDemo1' from 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
Aug 9 08:40:01 Vostro osafamfnd[11307]: NO Removed 'safSi=AmfDemo,safApp=AmfDemo1' from 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
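The rule quoted from the specification in ticket #538 can be written as a small guard. This is only a sketch in Python under stated assumptions: the enum, the function name and its parameters are hypothetical and do not correspond to anything in the AMF code base. It returns False exactly in the case the spec describes: the component failed to terminate, it held the active HA state, and the redundancy model allows only one active assignment per CSI.

```python
from enum import Enum


class PresenceState(Enum):
    """Subset of presence states needed for the illustration."""
    INSTANTIATED = "INSTANTIATED"
    TERMINATION_FAILED = "TERMINATION_FAILED"


def may_assign_active_elsewhere(presence_state, was_active, single_active_model):
    """Decide whether the active HA state may move to another component.

    Hypothetical helper: mirrors the quoted spec text, nothing more.
    """
    blocked = (
        presence_state is PresenceState.TERMINATION_FAILED
        and was_active
        and single_active_model
    )
    return not blocked


if __name__ == "__main__":
    # The situation in the log above: active component of a 2N SG whose
    # cleanup failed. Per the quoted spec text, fail-over must not proceed.
    print(may_assign_active_elsewhere(PresenceState.TERMINATION_FAILED, True, True))
```

In the log from the ticket, the equivalent of this check is missing: SU2 is assigned ACTIVE right after SU1 enters TERMINATION_FAILED.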
[tickets] [opensaf:tickets] #531 osaf: Some files have MS-DOS line endings
- **Milestone**: 4.5.2 --> 4.6.2 --- ** [tickets:#531] osaf: Some files have MS-DOS line endings** **Status:** accepted **Milestone:** 4.6.2 **Created:** Mon Aug 05, 2013 11:03 AM UTC by Anders Widell **Last Updated:** Wed Jul 15, 2015 01:55 PM UTC **Owner:** Anders Widell I did a quick check, and the following files appear to have MS-DOS line endings: osaf/services/saf/smfsv/schema/SAI-AIS-SMF-ETF-A.01.02_OpenSAF.xsd osaf/services/saf/smfsv/schema/SAI-AIS-SMF-UCS-A.01.02_OpenSAF.xsd samples/amf/non_sa_aware/net-snmp.xml samples/amf/sa_aware/AppConfig-2N.xml samples/amf/sa_aware/AppConfig-nwayactive.xml samples/amf/wrapper/net-snmp.xml samples/smfsv/campaigns/campaign_rolling_comp_agent.xml samples/smfsv/campaigns/campaign_rolling_comp.xml samples/smfsv/campaigns/campaign_rolling_nodes_os_installremove.xml samples/smfsv/campaigns/campaign_rolling_nodes.xml samples/smfsv/campaigns/campaign_rolling_su.xml tests/avsv/suites/AppConfig.xml tests/common/inc/tet_startup.h tests/cpsv/inc/tet_cpsv_conf.h tests/cpsv/inc/tet_cpsv.h tests/cpsv/src/tet_cpa.c tests/cpsv/src/tet_cpsv_util.c tests/cpsv/suites/reg_cpsv.cfg tests/edsv/inc/tet_eda.h tests/edsv/src/tet_edsv_func.c tests/edsv/src/tet_edsv_util.c tests/edsv/suites/reg_edsv.cfg tests/glsv/inc/tet_gla_conf.h tests/glsv/inc/tet_glsv.h tests/glsv/src/tet_gla.c tests/glsv/src/tet_gla_conf.c tests/glsv/src/tet_gld.c tests/glsv/src/tet_glsv_util.c tests/glsv/suites/reg_glsv.cfg tests/mbcsv/inc/mbcsv_purpose.h tests/mbcsv/src/mbcsv_cb_purpose.c tests/mbcsv/src/mbcsv_ckpt_purpose.c tests/mbcsv/src/mbcsv_inv.c tests/mbcsv/src/mbcsv_purpose.c tests/mbcsv/src/mbcsv_tmr_purpose.c tests/mbcsv/src/tet_mbcsv_util.c tests/mbcsv/suites/reg_mbcsv.cfg tests/mds/inc/tet_mdstipc.h tests/mds/suites/reg_mds.cfg tests/mqsv/inc/tet_mqa_conf.h tests/mqsv/inc/tet_mqsv.h tests/mqsv/src/tet_mqa.c tests/mqsv/src/tet_mqa_conf.c tests/mqsv/src/tet_mqd.c tests/mqsv/src/tet_mqnd.c tests/mqsv/src/tet_mqsv_util.c tests/mqsv/suites/reg_mqsv.cfg 
tests/OpenSAF_TET_Changs.txt
[tickets] [opensaf:tickets] #638 node cannot join AMF cluster after restart
- **Milestone**: 4.5.2 --> 4.6.2 --- ** [tickets:#638] node cannot join AMF cluster after restart** **Status:** accepted **Milestone:** 4.6.2 **Created:** Fri Nov 22, 2013 02:54 PM UTC by Hans Feldt **Last Updated:** Wed Jul 15, 2015 01:47 PM UTC **Owner:** A V Mahesh (AVM) OpenSAF 4.2.2 changeset 3796, 79 extra patches System: RHEL based, 2 node cluster, MDS/TIPC After node reboot of the standby controller it cannot join the cluster again. This can be seen in the syslog on the active controller: Nov 17 17:15:20 notice atrcxb3166 osafamfd[6038]: Cold sync complete! Nov 19 17:40:07 notice atrcxb3166 osafamfd[6712]: Node 'SC-2' joined the cluster Nov 19 17:42:08 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 19 17:42:28 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:24:21 notice atrcxb3166 osafamfd[6712]: Node 'SC-2' left the cluster Nov 21 16:29:04 notice atrcxb3166 osafamfd[6712]: Node 'SC-2' joined the cluster Nov 21 16:29:24 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:29:44 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:30:04 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:30:24 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:30:54 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:31:14 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:31:34 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:31:54 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:32:14 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:32:34 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:32:54 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:33:14 warning atrcxb3166 osafamfd[6712]: 
invalid node state 1 for node 2020f Nov 21 16:33:34 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:33:54 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:34:14 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:34:34 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:34:54 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:35:14 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:35:34 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:35:54 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:36:14 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:36:34 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 16:36:54 warning atrcxb3166 osafamfd[6712]: invalid node state 1 for node 2020f Nov 21 17:41:58 err atrcxb3166 osafamfd[6712]: avd_d2n_msg_dequeue: ncsmds_api failed 2 Nov 21 17:42:08 notice atrcxb3166 osafamfd[6712]: Node 'SC-2' left the cluster Nov 21 17:42:18 warning atrcxb3166 osafamfd[6712]: avd_msg_sanity_chk: invalid node ID (2020f) Nov 21 17:42:38 warning atrcxb3166 osafamfd[6712]: avd_msg_sanity_chk: invalid node ID (2020f) Nov 21 17:42:58 warning atrcxb3166 osafamfd[6712]: avd_msg_sanity_chk: invalid node ID (2020f) Nov 21 17:43:18 warning atrcxb3166 osafamfd[6712]: avd_msg_sanity_chk: invalid node ID (2020f) Nov 21 17:43:39 warning atrcxb3166 osafamfd[6712]: avd_msg_sanity_chk: invalid node ID (2020f) Nov 21 17:43:59 warning atrcxb3166 osafamfd[6712]: avd_msg_sanity_chk: invalid node ID (2020f) Nov 21 17:44:19 warning atrcxb3166 osafamfd[6712]: avd_msg_sanity_chk: invalid node ID (2020f) Nov 21 17:44:39 warning atrcxb3166 osafamfd[6712]: avd_msg_sanity_chk: invalid node ID (2020f) Nov 21 17:44:59 warning atrcxb3166 osafamfd[6712]: avd_msg_sanity_chk: invalid 
node ID (2020f) Nov 21 17:45:19 warning atrcxb3166 osafamfd[6712]: avd_msg_sanity_chk: invalid node ID (2020f) Nov 21 17:45:39 warning atrcxb3166 osafamfd[6712]: avd_msg_sanity_chk: invalid node ID (2020f) Nov 21 17:45:59 warning atrcxb3166 osafamfd[6712]: avd_msg_sanity_chk: invalid node ID (2020f) Nov 21 17:46:19 warning atrcxb3166 osafamfd[6712]: avd_msg_sanity_chk: invalid node ID (2020f) Nov 21 17:46:39 warning atrcxb3166 osafamfd[6712]: avd_msg_sanity_chk: invalid msg id 210, from 2020f should be 1 Nov 21 17:46:59 warning atrcxb3166 osafamfd[6712]: avd_msg_sanity_chk: invalid msg id 211, from 2020f should be 1 Nov 21 18:00:40 warning atrcxb3166 osafamfd[6712]: avd_msg_sanity_chk: invalid msg id 252, from 2020f should be 1 Nov 21 18:01:00 notice atrcxb3166 osafamfd[6712]: Node 'SC-2' left the cluster Nov 22 11:44:37 notice atrcxb3166 osafamfd[6712]: Re-initializing with IMM Nov 22 11:44:39 notice atrcxb3166 osafamfd[6712]:
[tickets] [opensaf:tickets] #178 escalation policy is not happening till the restart count exceeds, instead of reaching saAmfSGCompRestartMax for NPI components
- **Milestone**: 4.5.2 --> 4.6.2 --- ** [tickets:#178] escalation policy is not happening till the restart count exceeds, instead of reaching saAmfSGCompRestartMax for NPI components** **Status:** assigned **Milestone:** 4.6.2 **Created:** Tue May 14, 2013 06:24 AM UTC by Nagendra Kumar **Last Updated:** Wed Aug 12, 2015 09:11 AM UTC **Owner:** Nagendra Kumar

Migrated from http://devel.opensaf.org/ticket/2144

For components brought up as NPI, error escalation does not happen until the restart count exceeds saAmfSGCompRestartMax. According to the spec, however, the first level of escalation should happen when the restart count reaches saAmfSGCompRestartMax.

From the spec, section 3.11.2.2, page 203:

> If this count reaches the saAmfSGCompRestartMax value before the end of the "component restart" probation period, the Availability Management Framework performs the first level of recovery escalation for that service unit: the Availability Management Framework restarts the entire service unit
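The difference between "reaches" and "exceeds" in ticket #178 is a one-character comparison. The sketch below is in Python with hypothetical function names; it does not reproduce the AMF node director code and assumes the restart count already includes the restart being evaluated.

```python
def escalates_per_spec(restart_count, comp_restart_max):
    """Escalate as soon as the count reaches the configured maximum."""
    return restart_count >= comp_restart_max


def escalates_as_reported(restart_count, comp_restart_max):
    """Behaviour described in the ticket: escalate only once the count
    exceeds the configured maximum, i.e. one restart too late."""
    return restart_count > comp_restart_max


if __name__ == "__main__":
    comp_restart_max = 3
    for count in range(1, 5):
        print(count,
              escalates_per_spec(count, comp_restart_max),
              escalates_as_reported(count, comp_restart_max))
```

With a maximum of 3, the two functions disagree only at a count of exactly 3, which is the extra restart the ticket complains about.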
[tickets] [opensaf:tickets] #1308 ccb object create fails with invalid param with old SaNameT Apis
- **Milestone**: 4.6.0 --> never --- ** [tickets:#1308] ccb object create fails with invalid param with old SaNameT Apis** **Status:** duplicate **Milestone:** never **Created:** Wed Apr 08, 2015 07:29 AM UTC by Sirisha Alla **Last Updated:** Thu Apr 09, 2015 01:45 AM UTC **Owner:** nobody **Attachments:** - [extralength.tar](https://sourceforge.net/p/opensaf/tickets/1308/attachment/extralength.tar) (532.5 kB; application/x-tar) This issue is seen on changeset 6377 along with patch for #1267(969 backport changes). The setup is single pbe enabled with 50k objects. The IMM Application tree is being created in the following manner. obj1 is the parent of obj2 and obj3. Obj2 is the parent of obj4 and obj4 is the parent of obj5. All the Apis used are old APIs using SaNameT. Creation for obj1, obj2 and obj3 are successfully added into the CCB. When object creations for obj4 and obj5 are added to the CCB, CCB Create failed with INVALID_PARAM. syslog on SC-1: Apr 8 12:38:21 SLES-64BIT-SLOT1 osafimmpbed: IN Create of class noDanglingPreconfigurationClass committing with ccbId:10007 Apr 8 12:38:21 SLES-64BIT-SLOT1 osafimmnd[7221]: NO Create of class noDanglingPreconfigurationClass is PERSISTENT. Apr 8 12:38:21 SLES-64BIT-SLOT1 osafimmnd[7221]: NO ERR_INVALID_PARAM: Not a proper parent name:configRdnObj2,configRdnObj1^? size:28 Apr 8 12:38:21 SLES-64BIT-SLOT1 osafimmnd[7221]: NO ERR_INVALID_PARAM: Not a proper parent name:configRdnObj4,configRdnObj2,configRdnObj1Â size:42 Apr 8 12:38:21 SLES-64BIT-SLOT1 osafimmnd[7221]: NO Ccb 2 COMMITTED (noDanglingPreconfigurationClass) The length passed in the SaNameT is 27 and 41 respectively for obj4 and obj5. But the length is being considered as 28 and 42 internally. syslog and immnd traces are attached. This is an old test which worked fine before changes for #643. 
[tickets] [opensaf:tickets] #1350 configuration error should not lead to reboot of the node
- **Milestone**: 4.4.2 --> never --- ** [tickets:#1350] configuration error should not lead to reboot of the node** **Status:** invalid **Milestone:** never **Created:** Tue Apr 28, 2015 05:46 AM UTC by Sirisha Alla **Last Updated:** Tue Apr 28, 2015 07:09 AM UTC **Owner:** nobody

2PBE was not configured on one of the nodes. As a result, the standby controller reboots continuously. Rebooting the node does not recover it from such errors.

Apr 28 16:39:52 SLES-SLOT2 osafimmd[967]: NO SBY: New Epoch for IMMND process at node 2020f old epoch: 4 new epoch:5
Apr 28 16:39:52 SLES-SLOT2 osafimmd[967]: ER Active IMMD has 2PBE enabled, yet this standby is not enabled for 2PBE - exiting
Apr 28 16:39:52 SLES-SLOT2 osafimmnd[831]: NO Epoch set to 5 in ImmModel
Apr 28 16:39:52 SLES-SLOT2 osafamfnd[901]: NO 'safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Apr 28 16:39:52 SLES-SLOT2 osafamfnd[901]: ER safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Apr 28 16:39:52 SLES-SLOT2 osafamfnd[901]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Apr 28 16:39:52 SLES-SLOT2 opensaf_reboot: Rebooting local node; timeout=60
[tickets] [opensaf:tickets] #1109 standby failed to come up during failover
- **Milestone**: 4.3.3 --> never --- ** [tickets:#1109] standby failed to come up during failover** **Status:** duplicate **Milestone:** never **Created:** Thu Sep 18, 2014 07:33 AM UTC by Sirisha Alla **Last Updated:** Thu Sep 18, 2014 11:24 AM UTC **Owner:** nobody **Attachments:** - [logs.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1109/attachment/logs.tar.bz2) (221.7 kB; application/x-bzip) The issue is seen on SLES X86 VMs running with single pbe and opensaf changeset 5697+#946 and #1067 patches. During failover it is observed that standby failed to come up. Syslog of SC-1: Sep 18 12:28:36 SLES-64BIT-SLOT1 osafclmd[2436]: Started Sep 18 12:28:37 SLES-64BIT-SLOT1 osafimmnd[2399]: NO PBE-OI established on other SC. Dumping incrementally to file imm.db Sep 18 12:28:39 SLES-64BIT-SLOT1 kernel: [ 26.576106] eth0: no IPv6 routers present Sep 18 12:28:46 SLES-64BIT-SLOT1 osafclmd[2436]: ER saNtfInitialize Failed (5) Sep 18 12:28:46 SLES-64BIT-SLOT1 osafclmd[2436]: ER clms_ntf_init FAILED Sep 18 12:28:46 SLES-64BIT-SLOT1 opensafd[2338]: ER Failed DESC:CLMD Sep 18 12:28:46 SLES-64BIT-SLOT1 opensafd[2338]: ER Going for recovery Sep 18 12:28:46 SLES-64BIT-SLOT1 opensafd[2338]: ER Trying To RESPAWN /usr/lib64/opensaf/clc-cli/osaf-clmd attempt #1 Sep 18 12:28:46 SLES-64BIT-SLOT1 opensafd[2338]: ER Sending SIGKILL to CLMD, pid=2428 Sep 18 12:28:46 SLES-64BIT-SLOT1 osafclmd[2436]: ER clms_init failed Sep 18 12:28:46 SLES-64BIT-SLOT1 osafclmd[2436]: ER Failed, exiting... 

Sep 18 12:29:01 SLES-64BIT-SLOT1 osafclmd[2457]: Started
Sep 18 12:29:11 SLES-64BIT-SLOT1 osafclmd[2457]: ER saNtfInitialize Failed (5)
Sep 18 12:29:11 SLES-64BIT-SLOT1 osafclmd[2457]: ER clms_ntf_init FAILED
Sep 18 12:29:11 SLES-64BIT-SLOT1 opensafd[2338]: ER Could Not RESPAWN CLMD
Sep 18 12:29:11 SLES-64BIT-SLOT1 opensafd[2338]: ER Failed DESC:CLMD
Sep 18 12:29:11 SLES-64BIT-SLOT1 opensafd[2338]: ER Trying To RESPAWN /usr/lib64/opensaf/clc-cli/osaf-clmd attempt #2
Sep 18 12:29:11 SLES-64BIT-SLOT1 opensafd[2338]: ER Sending SIGKILL to CLMD, pid=2452
Sep 18 12:29:11 SLES-64BIT-SLOT1 osafclmd[2457]: ER clms_init failed
Sep 18 12:29:11 SLES-64BIT-SLOT1 osafclmd[2457]: ER Failed, exiting...
Sep 18 12:29:26 SLES-64BIT-SLOT1 osafclmd[2482]: Started
Sep 18 12:29:36 SLES-64BIT-SLOT1 osafclmd[2482]: ER saNtfInitialize Failed (5)
Sep 18 12:29:36 SLES-64BIT-SLOT1 osafclmd[2482]: ER clms_ntf_init FAILED
Sep 18 12:29:36 SLES-64BIT-SLOT1 opensafd[2338]: ER Could Not RESPAWN CLMD
Sep 18 12:29:36 SLES-64BIT-SLOT1 opensafd[2338]: ER Failed DESC:CLMD
Sep 18 12:29:36 SLES-64BIT-SLOT1 opensafd[2338]: ER FAILED TO RESPAWN
Sep 18 12:29:36 SLES-64BIT-SLOT1 osafclmd[2482]: ER clms_init failed
Sep 18 12:29:36 SLES-64BIT-SLOT1 osafclmd[2482]: ER Failed, exiting...
Sep 18 12:29:37 SLES-64BIT-SLOT1 osaffmd[2379]: exiting for shutdown
Sep 18 12:29:37 SLES-64BIT-SLOT1 osafimmd[2389]: exiting for shutdown
Sep 18 12:29:37 SLES-64BIT-SLOT1 osafimmnd[2399]: exiting for shutdown
Sep 18 12:29:37 SLES-64BIT-SLOT1 osafntfimcnd[2429]: ER saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE (9)

The syslog, MDS log and CLMD traces are attached. NTFD traces are not available; I will try to get them if the issue turns out to be reproducible.
[tickets] [opensaf:tickets] #1122 attribute authorizedGroup of access control feature is modifiable by any user
- **Milestone**: 4.3.3 --> never --- ** [tickets:#1122] attribute authorizedGroup of access control feature is modifiable by any user** **Status:** duplicate **Milestone:** never **Created:** Mon Sep 22, 2014 12:11 PM UTC by surender khetavath **Last Updated:** Mon Sep 22, 2014 02:29 PM UTC **Owner:** nobody

changeset : 5679

According to README.ACCESS_CONTROL:

> "authorizedGroup" is an optional attribute of type string holding the name of an existing linux group. Members of this group will have access to IMM. Only the root user can change these attributes.

But any user other than root is able to modify this attribute. Trace shown below:

immcfg -a authorizedGroup="GROUP" opensafImm=opensafImm,safApp=safImmService

tet@SC-1:/etc/opensaf> immlist opensafImm=opensafImm,safApp=safImmService

Name                        Type          Value(s)
authorizedGroup             SA_STRING_T   GROUP
accessControlMode           SA_UINT32_T   0 (0x0)
SaImmAttrImplementerName    SA_STRING_T   OpenSafImmPBE
SaImmAttrClassName          SA_STRING_T   OpensafImm
SaImmAttrAdminOwnerName     SA_STRING_T
[tickets] [opensaf:tickets] #1568 CLMD segfaulted for pending lock op during middleware si-swap
- **Milestone**: 4.5.2 --> 4.6.2 --- ** [tickets:#1568] CLMD segfaulted for pending lock op during middleware si-swap** **Status:** unassigned **Milestone:** 4.6.2 **Created:** Sat Oct 24, 2015 11:17 AM UTC by Srikanth R **Last Updated:** Sat Oct 24, 2015 11:17 AM UTC **Owner:** nobody **Attachments:** - [SC-1.tgz](https://sourceforge.net/p/opensaf/tickets/1568/attachment/SC-1.tgz) (27.3 kB; application/x-compressed-tar)

Changeset : 6901

Steps :
1) Invoked a lock operation on one of the payloads, PL-5.
2) The CLM agent on PL-3 did not respond to the lock operation.
3) With this operation pending, invoked a controller switchover.
4) CLMD on the active controller segfaulted during quiesced processing.

Oct 24 15:53:13 SYSTEST-CNTLR-1 osafamfd[5863]: NO Pending Response sent for CLM track callback::OK '1'
Oct 24 15:53:15 SYSTEST-CNTLR-1 osafamfd[5863]: NO safSi=SC-2N,safApp=OpenSAF Swap initiated
Oct 24 15:53:15 SYSTEST-CNTLR-1 osafamfnd[5873]: NO Assigning 'safSi=SC-2N,safApp=OpenSAF' QUIESCED to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Oct 24 15:53:15 SYSTEST-CNTLR-1 osafimmnd[5809]: NO Implementer locally disconnected. Marking it as doomed 173 <457, 2010f> (safSmfService)
Oct 24 15:53:15 SYSTEST-CNTLR-1 osafamfnd[5873]: NO 'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'

signal: 11 pid: 0 uid: 0
/usr/lib64/libopensaf_core.so.0(+0x1a27b)[0x7ff095ffb27b]
/lib64/libpthread.so.0(+0xf7c0)[0x7ff0951207c0]
/lib64/libc.so.6(cfree+0x39)[0x7ff094a0a2c9]
/lib64/librt.so.1(timer_delete+0x42)[0x7ff094d08b52]
/usr/lib64/opensaf/osafclmd[0x405298]
/usr/lib64/libSaAmf.so.0(+0x9213)[0x7ff095dd0213]
/usr/lib64/libSaAmf.so.0(+0xa307)[0x7ff095dd1307]
/usr/lib64/libSaAmf.so.0(saAmfDispatch+0x1d4)[0x7ff095dcaf94]
/usr/lib64/opensaf/osafclmd[0x4047df]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x7ff0949aec36]
/usr/lib64/opensaf/osafclmd[0x404ea5]

The CLMD trace is attached.
[tickets] [opensaf:tickets] #1573 pyosaf: Add missing IMM flags
- **Milestone**: 4.5.2 --> 4.6.2 --- ** [tickets:#1573] pyosaf: Add missing IMM flags** **Status:** review **Milestone:** 4.6.2 **Created:** Wed Oct 28, 2015 08:12 AM UTC by Hung Nguyen **Last Updated:** Wed Oct 28, 2015 08:43 AM UTC **Owner:** Hung Nguyen

Missing flags

    #define SA_IMM_ATTR_NO_DUPLICATES             0x0100 /* OpenSaf 4.3 */
    #define SA_IMM_ATTR_NOTIFY                    0x0200 /* OpenSaf 4.3 */
    #define SA_IMM_ATTR_NO_DANGLING               0x0400 /* OpenSaf 4.4 */
    #define SA_IMM_ATTR_DN                        0x0800 /* OpenSaf 4.6 */
    #define SA_IMM_ATTR_DEFAULT_REMOVED           0x1000 /* OpenSaf 4.7 */
    #define SA_IMM_SEARCH_GET_CONFIG_ATTR         0x0001 /* OpenSaf 4.3 */
    #define SA_IMM_SEARCH_NO_DANGLING_DEPENDENTS  0x0001 /* OpenSaf 4.4 */
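To show how such attribute flags are combined and tested on the Python side, here is a minimal sketch. Everything in it is an assumption made for illustration: `ImmAttrFlag` and `has_flag` are hypothetical names, not the pyosaf API, and the numeric values are copied as quoted in ticket #1573, where they appear to be abbreviated; the real constants in the OpenSAF headers should be checked before use.

```python
from enum import IntFlag


class ImmAttrFlag(IntFlag):
    """Attribute flags with the values as quoted in the ticket.

    Assumption: values are taken verbatim from the ticket text and may
    differ from the full-width constants in the OpenSAF IMM headers.
    """
    NO_DUPLICATES = 0x0100
    NOTIFY = 0x0200
    NO_DANGLING = 0x0400
    DN = 0x0800
    DEFAULT_REMOVED = 0x1000


def has_flag(flags, flag):
    """Return True if every bit of `flag` is set in `flags`."""
    return (flags & flag) == flag


if __name__ == "__main__":
    flags = ImmAttrFlag.NOTIFY | ImmAttrFlag.NO_DANGLING
    print(has_flag(flags, ImmAttrFlag.NOTIFY))
    print(has_flag(flags, ImmAttrFlag.DN))
```

Only the attribute flags are modelled here, since each of them occupies a distinct bit in the quoted list.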
[tickets] [opensaf:tickets] #1510 CKPT: cpnd crashes during checkpoint open timeout with large sections
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#1510] CKPT: cpnd crashes during checkpoint open timeout with large sections**

**Status:** review
**Milestone:** 4.6.2
**Created:** Thu Oct 01, 2015 04:14 PM UTC by Alex Jones
**Last Updated:** Thu Oct 01, 2015 07:54 PM UTC
**Owner:** Alex Jones

When opening a collocated checkpoint replica where the active has a large number of sections (~200k), the sync from the active can time out with error code SA_AIS_ERR_TRY_AGAIN. In this case the code frees the memory for the node, but does not delete the node from the db. When the checkpoint access is tried again, the freed node is still referenced by the db, and cpnd crashes.

Valgrind analysis shows the following:

==53610== Thread 1:
==53610== Invalid read of size 4
==53610==    at 0x4E4D7C4: ncs_patricia_tree_get (patricia.c:93)
==53610==    by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610==    by 0x40D1A2: cpnd_process_evt (cpnd_evt.c:1957)
==53610==    by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610==    by 0x403882: main (cpnd_main.c:72)
==53610==  Address 0x687de60 is 0 bytes inside a block of size 1,072 free'd
==53610==    at 0x4C29D4E: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610==    by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610==    by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610==    by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610==    by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 8
==53610==    at 0x4E4D7C0: ncs_patricia_tree_get (patricia.c:90)
==53610==    by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610==    by 0x40D1A2: cpnd_process_evt (cpnd_evt.c:1957)
==53610==    by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610==    by 0x403882: main (cpnd_main.c:72)
==53610==  Address 0x687de70 is 16 bytes inside a block of size 1,072 free'd
==53610==    at 0x4C29D4E: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610==    by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610==    by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610==    by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610==    by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 8
==53610==    at 0x4E4D7FB: ncs_patricia_tree_get (patricia.c:435)
==53610==    by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610==    by 0x40D1A2: cpnd_process_evt (cpnd_evt.c:1957)
==53610==    by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610==    by 0x403882: main (cpnd_main.c:72)
==53610==  Address 0x687de78 is 24 bytes inside a block of size 1,072 free'd
==53610==    at 0x4C29D4E: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610==    by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610==    by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610==    by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610==    by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 1
==53610==    at 0x4C2D0B9: bcmp (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610==    by 0x4E4D803: ncs_patricia_tree_get (patricia.c:435)
==53610==    by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610==    by 0x40D1A2: cpnd_process_evt (cpnd_evt.c:1957)
==53610==    by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610==    by 0x403882: main (cpnd_main.c:72)
==53610==  Address 0x687de80 is 32 bytes inside a block of size 1,072 free'd
==53610==    at 0x4C29D4E: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610==    by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610==    by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610==    by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610==    by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 1
==53610==    at 0x4C2D0D0: bcmp (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610==    by 0x4E4D803: ncs_patricia_tree_get (patricia.c:435)
==53610==    by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610==    by 0x40D1A2: cpnd_process_evt (cpnd_evt.c:1957)
==53610==    by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610==    by 0x403882: main (cpnd_main.c:72)
==53610==  Address 0x687de81 is 33 bytes inside a block of size 1,072 free'd
==53610==    at 0x4C29D4E: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610==    by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610==    by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610==    by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610==    by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 4
==53610==    at 0x4E4D7C4: ncs_patricia_tree_get (patricia.c:93)
==53610==    by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610==    by 0x405872: cpnd_evt_proc_nd2nd_ckpt_sect_create (cpnd_evt.c:2602)
==53610==    by 0x40D2B8: cpnd_process_evt (cpnd_evt.c:335)
==53610==    by 0x40E9D6:
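The failure mode described in the ticket text (the node's memory is released while the database still indexes it) can be modeled in a few lines. This is a Python sketch of the object lifecycle only, not the cpnd C code: `CkptNode`, `CkptDb`, the `freed` flag, and both `abort_open_*` functions are hypothetical names standing in for the real node structure, the patricia tree, and the error path in the open handler.

```python
class CkptNode:
    """Stand-in for a checkpoint node; `freed` models released memory."""

    def __init__(self, ckpt_id):
        self.ckpt_id = ckpt_id
        self.freed = False


class CkptDb:
    """Stand-in for the node database, keyed by checkpoint id."""

    def __init__(self):
        self._nodes = {}

    def add(self, node):
        self._nodes[node.ckpt_id] = node

    def remove(self, ckpt_id):
        self._nodes.pop(ckpt_id, None)

    def get(self, ckpt_id):
        node = self._nodes.get(ckpt_id)
        if node is not None and node.freed:
            # In C this lookup is the invalid read that Valgrind reports.
            raise RuntimeError("lookup returned a freed node")
        return node


def abort_open_buggy(db, node):
    """Models the reported behaviour: free the node, leave it in the db."""
    node.freed = True


def abort_open_fixed(db, node):
    """Unlink the node from the db first, then free it."""
    db.remove(node.ckpt_id)
    node.freed = True
```

With the fixed ordering, a retried open finds no stale entry and can create a fresh node; with the buggy ordering, the retry trips over the dangling reference.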
[tickets] [opensaf:tickets] #1503 IMM: Augumented CCb client went down the OM client should get err
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#1503] IMM: Augumented CCb client went down the OM client should get err**

**Status:** assigned
**Milestone:** 4.6.2
**Created:** Fri Sep 25, 2015 09:18 AM UTC by Neelakanta Reddy
**Last Updated:** Mon Oct 05, 2015 10:15 AM UTC
**Owner:** Neelakanta Reddy

The OM is on node1 and the OI is on node2. The OM creates an object. The OI augments the CCB by creating an object, and then the OI client goes down. The CCB gets aborted in the IMM database, but the OM create API does not get a return value; after SYNC_TIMEOUT the OM API receives TIME_OUT.
[tickets] [opensaf:tickets] #1515 AMF : SU struck in terminating for failure during csi assignment in si-swap (Nway)
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#1515] AMF : SU struck in terminating for failure during csi assignment in si-swap (Nway)**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Mon Oct 05, 2015 12:45 PM UTC by Srikanth R
**Last Updated:** Tue Oct 06, 2015 06:38 AM UTC
**Owner:** nobody

Changeset: 6901

AMF application: 3 SUs with 5 SIs (SU1 and SU3 hosted on PL-3, SU2 hosted on PL-4). Nway redundancy model.

Issue: SU stuck in terminating state after a failure during CSI active assignment in si-swap (Nway).

Steps:

-> Initially brought up the application by unlocking the SG. Below are the assignments:

    TestApp_SI1 TestApp_SI2 TestApp_SI3 TestApp_SI4 TestApp_SI5
    TestApp_SU1 ACTIVE ACTIVE ACTIVE STANDBY
    TestApp_SU2 STANDBY STANDBY STANDBY ACTIVE
    TestApp_SU3 ACTIVE STANDBY

-> Before performing the si-swap operation on SU1, ensured that the component with the SI1 standby assignment would reject the active callback.

-> Invoked the si-swap operation. As the component responded with ERR_FAILED_OP in the active callback, a recovery action was triggered for the SU.

Oct 5 15:15:02 PAYLOAD-2 osafamfnd[2659]: NO Assigning 'safSi=TestApp_SI1,safApp=TestApp_Nway' ACTIVE to 'safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_Nwa
Oct 5 15:15:02 PAYLOAD-2 osafamfnd[2659]: NO 'safComp=COMP1,safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_Nway' faulted due to 'csiSetcallbackFailed' : Recovery is 'componentFailover'
Oct 5 15:15:02 PAYLOAD-2 osafamfnd[2659]: NO 'safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_Nway' Presence State INSTANTIATED => TERMINATING

But the SU is stuck in terminating state. Below are the final assignments:

    TestApp_SI1 TestApp_SI2 TestApp_SI3 TestApp_SI4 TestApp_SI5
    TestApp_SU1 QUIESCED ACTIVE ACTIVE STANDBY STANDBY
    TestApp_SU2 ACTIVE STANDBY STANDBY QUIESCED
    TestApp_SU3 STANDBY ACTIVE ACTIVE
[tickets] [opensaf:tickets] #1512 AMF : SU struck in Quiesced state after Lock operation of SU in Nway
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#1512] AMF : SU struck in Quiesced state after Lock operation of SU in Nway**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Mon Oct 05, 2015 08:56 AM UTC by Srikanth R
**Last Updated:** Mon Oct 05, 2015 09:26 AM UTC
**Owner:** nobody
**Attachments:**

- [QuiescedNway.sh](https://sourceforge.net/p/opensaf/tickets/1512/attachment/QuiescedNway.sh) (11.3 kB; application/x-shellscript)

Changeset: 6901

AMF application: 3 SUs hosted on PL-3 and PL-4, 4 SIs (redundancy model: Nway).

Issue: SU stuck in quiesced state after a lock operation was issued on one of the SUs.

Steps:

-> Initially brought up the AMF application configured in the Nway redundancy model with 3 SUs and 4 SIs. Below are the configuration attributes for the SG:

    saAmfSGNumPrefStandbySUs SA_UINT32_T 1 (0x1)
    saAmfSGNumPrefInserviceSUs SA_UINT32_T 4 (0x4)
    saAmfSGNumPrefAssignedSUs SA_UINT32_T 4 (0x4)
    saAmfSGNumPrefActiveSUs SA_UINT32_T 3 (0x3)
    saAmfSGNumCurrNonInstantiatedSpareSUs SA_UINT32_T 0 (0x0)
    saAmfSGNumCurrInstantiatedSpareSUs SA_UINT32_T 0 (0x0)
    saAmfSGNumCurrAssignedSUs SA_UINT32_T 3 (0x3)
    saAmfSGMaxStandbySIsperSU SA_UINT32_T 1 (0x1)
    saAmfSGMaxActiveSIsperSU SA_UINT32_T 3 (0x3)

-> Brought up the application by unlocking the SG. Below are the assignments:

    TestApp_SI1 TestApp_SI2 TestApp_SI3 TestApp_SI4
    TestApp_SU1 ACTIVE ACTIVE ACTIVE STANDBY
    TestApp_SU2 STANDBY ACTIVE
    TestApp_SU3 STANDBY

-> Performed a lock operation on SU1. SU1 is stuck in quiesced state after the operation:

    TestApp_SI1 TestApp_SI2 TestApp_SI3 TestApp_SI4
    TestApp_SU1 QUIESCED QUIESCED STANDBY
    TestApp_SU2 ACTIVE STANDBY ACTIVE ACTIVE
    TestApp_SU3 STANDBY ACTIVE

-> When opensafd on the payload PL-3 was stopped, amfd on the active controller crashed.

Oct 5 13:04:13 CONTROLLER-1 osafamfd[8492]: su.cc:1885: dec_curr_stdby_si: Assertion 'saAmfSUNumCurrStandbySIs > 0' failed.
Oct 5 13:04:13 CONTROLLER-1 osafamfnd[8502]: ER AMF director unexpectedly crashed

The script to bring up the application is attached.
[tickets] [opensaf:tickets] #1329 ntf Mismatch when ntfread notificationClassId and ntfsend and notificationClassId
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#1329] ntf Mismatch when ntfread notificationClassId and ntfsend and notificationClassId**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Wed Apr 22, 2015 06:29 AM UTC by Per Rodenvall
**Last Updated:** Wed Jul 15, 2015 01:04 PM UTC
**Owner:** nobody

ntf Mismatch when ntfread notificationClassId and ntfsend and notificationClassId

When notificationClassId is read with ntfread, the printout uses a dot-separated format. To use that notificationClassId in ntfsend, the dots have to be replaced with commas.

The ntfread command should follow the syntax specified in "ntfread --help", i.e. with commas between vendorid, majorid, and minorid:

    OPTIONS
    -c or --notificationClassId=VE,MA,MI vendorid, majorid, minorid
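Until the output format is aligned, the manual conversion the ticket describes can be scripted. The following is a minimal sketch, assuming the ntfread printout consists of exactly three decimal fields separated by dots; the function name `class_id_dots_to_commas` is hypothetical and not part of any OpenSAF tool.

```python
def class_id_dots_to_commas(class_id):
    """Convert 'VE.MA.MI' (assumed ntfread printout) to 'VE,MA,MI' (ntfsend syntax)."""
    parts = class_id.strip().split(".")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        raise ValueError("expected vendorid.majorid.minorid, got %r" % class_id)
    return ",".join(parts)
```

The strict validation is deliberate: a value that does not look like three numeric fields is rejected rather than passed on to ntfsend in a half-converted form.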
[tickets] [opensaf:tickets] #1348 IMM: Document that OI_CALLBACK_TIMEOUT is not applicable to admin-operations.
- **Milestone**: 4.5.2 --> never

---

** [tickets:#1348] IMM: Document that OI_CALLBACK_TIMEOUT is not applicable to admin-operations.**

**Status:** duplicate
**Milestone:** never
**Created:** Mon Apr 27, 2015 01:01 PM UTC by Sirisha Alla
**Last Updated:** Fri Jun 05, 2015 10:59 AM UTC
**Owner:** Anders Bjornerstedt
**Attachments:**

- [logs.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1348/attachment/logs.tar.bz2) (96.4 kB; application/x-bzip)

When the AdminOperation_2() API is invoked with 30 seconds as timeout and OI_CALLBACK_TMOUT is configured as 8 seconds, the API returned TIMEOUT only after 30 seconds. IMMA_OI_CALLBACK_TMOUT is applicable for all the OI callbacks including the OI Admin Operation Callback.

Following is the IMMA trace:

Apr 27 18:24:17.295219 imma [3392:imma_oi_api.c:0164] T2 OI client version A.2.15
Apr 27 18:24:17.295226 imma [3392:imma_oi_api.c:0196] T2 IMMA library OI timeout set to:8
Apr 27 18:24:17.295347 imma [3392:imma_oi_api.c:0290] T1 Trying to add OI client id:51 node:2030f handle:330002030f
Apr 27 18:24:17.295358 imma [3392:imma_oi_api.c:0383] << initialize_common
Apr 27 18:24:17.303478 imma [3392:imma_om_api.c:3661] >> admin_op_invoke_common
Apr 27 18:24:17.303492 imma [3392:imma_om_api.c:3801] TR immInvocations:0
Apr 27 18:24:17.303499 imma [3392:imma_om_api.c:3815] TR PARAM:testOiTmout_verifyAdminOpCallback_101
Apr 27 18:24:17.305642 imma [3392:imma_proc.c:1346] TR ** Event type:6
Apr 27 18:24:17.305674 imma [3392:imma_proc.c:1239] >> imma_proc_free_pointers
Apr 27 18:24:17.305687 imma [3392:imma_proc.c:1332] << imma_proc_free_pointers
Apr 27 18:24:17.305754 imma [3392:imma_db.c:0187] >> imma_oi_ccb_record_find
Apr 27 18:24:17.305762 imma [3392:imma_db.c:0198] << imma_oi_ccb_record_find
Apr 27 18:24:17.305766 imma [3392:imma_proc.c:1914] >> imma_process_callback_info
Apr 27 18:24:47.334148 imma [3392:imma_om_api.c:3864] TR Fevs send RETURNED:5
Apr 27 18:24:47.334241 imma [3392:imma_om_api.c:4009] << admin_op_invoke_common

IMMA and IMMND traces are attached.
[tickets] [opensaf:tickets] #1335 AMF : health check is started even if safHealthCheckKey attribute is not set as osafHealthCheck
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#1335] AMF : health check is started even if safHealthCheckKey attribute is not set as osafHealthCheck**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Wed Apr 22, 2015 05:31 PM UTC by Srikanth R
**Last Updated:** Mon Aug 10, 2015 11:42 AM UTC
**Owner:** nobody

Changeset: 6377

1) For an NPI component, if AMF needs to perform a health check, the following two commands need to be run as part of the configuration bring-up:

    immcfg -c SaAmfCompType $SaAmfCompType_npi -a saAmfCtCompCategory=8 -a saAmfCtDefClcCliTimeout=100 -a saAmfCtDefCallbackTimeout=100 -a saAmfCtRelPathInstantiateCmd="amf_comp_script instantiate_npi" -a saAmfCtRelPathCleanupCmd="amf_comp_script cleanup" -a saAmfCtDefRecoveryOnError=2 -a saAmfCtDefDisableRestart=0 -a saAmfCtSwBundle=safSmfBundle=$SaSmfSwBundle -a osafAmfCtRelPathHcCmd="health_check_script" -a osafAmfCtDefHcCmdArgv="state" -a saAmfCtRelPathTerminateCmd="amf_comp_script cleanup"

    immcfg -c SaAmfHealthcheckType safHealthcheckKey=osafHealthCheck,$SaAmfCompType_npi -a saAmfHctDefPeriod=100 -a saAmfHctDefMaxDuration=60

2) If the user does not run the second command before instantiating the component, the health check is not started as of now, which is fine.

3) But if the user deletes the health check key with the following command once the application configuration is done (SU in lock-in state), the health check is still started when the SU is unlocked:

    immcfg -d safHealthcheckKey=osafHealthCheck,safVersion=4.0.0,safCompType=TWONCOMPBASETYPE_NPI

*Deviation*

The health check should not be started, as the key is deleted before performing the unlock-in and unlock operations on the SU.
[tickets] [opensaf:tickets] #722 payloads did not go for reboot when both the controllers rebooted
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#722] payloads did not go for reboot when both the controllers rebooted**

**Status:** assigned
**Milestone:** 4.6.2
**Created:** Thu Jan 16, 2014 07:36 AM UTC by Sirisha Alla
**Last Updated:** Tue Aug 11, 2015 06:32 AM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- [payloadnoreboot.tar.bz2](https://sourceforge.net/p/opensaf/tickets/722/attachment/payloadnoreboot.tar.bz2) (765.1 kB; application/x-bzip)

The issue is seen on changeset 4733 + patches of CLM corresponding to changesets of #220. Continuous failovers are happening when some API invocations of an IMM application are ongoing. The IMMD has asserted on the new active, which is reported in ticket #721.

When both controllers got rebooted, the payloads did not get rebooted. Instead the opensaf services are up and running. CLM shows that both the payloads are not part of the cluster. When the payloads are restarted manually, they join the cluster.

PL-3 syslog:

Jan 15 18:23:09 SLES-64BIT-SLOT3 osafimmnd[3550]: NO implementer for class 'testMA_verifyObjApplNoResponseModCallback_101' is released => class extent is UNSAFE
Jan 15 18:23:59 SLES-64BIT-SLOT3 logger: Invoking failover from invoke_failover.sh
Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[3550]: WA DISCARD DUPLICATE FEVS message:92993
Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[3550]: WA Error code 2 returned for message type 57 - ignoring
Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[3550]: WA DISCARD DUPLICATE FEVS message:92994
Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[3550]: WA Error code 2 returned for message type 57 - ignoring
Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[3550]: WA Director Service in NOACTIVE state - fevs replies pending:1 fevs highest processed:92994
Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[3550]: NO No IMMD service => cluster restart
Jan 15 18:24:01 SLES-64BIT-SLOT3 osafamfnd[3572]: NO 'safComp=IMMND,safSu=PL-3,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[6827]: Started
Jan 15 18:24:01 SLES-64BIT-SLOT3 osafimmnd[6827]: NO Persistent Back-End capability configured, Pbe file:imm.db (suffix may get added)
Jan 15 18:24:07 SLES-64BIT-SLOT3 kernel: [ 6343.176901] TIPC: Resetting link <1.1.3:eth0-1.1.2:eth0>, peer not responding
Jan 15 18:24:07 SLES-64BIT-SLOT3 kernel: [ 6343.176911] TIPC: Lost link <1.1.3:eth0-1.1.2:eth0> on network plane A
Jan 15 18:24:07 SLES-64BIT-SLOT3 kernel: [ 6343.176918] TIPC: Lost contact with <1.1.2>
Jan 15 18:24:07 SLES-64BIT-SLOT3 kernel: [ 6343.256091] TIPC: Resetting link <1.1.3:eth0-1.1.1:eth0>, peer not responding
Jan 15 18:24:07 SLES-64BIT-SLOT3 kernel: [ 6343.256100] TIPC: Lost link <1.1.3:eth0-1.1.1:eth0> on network plane A
Jan 15 18:24:07 SLES-64BIT-SLOT3 kernel: [ 6343.256106] TIPC: Lost contact with <1.1.1>
Jan 15 18:24:25 SLES-64BIT-SLOT3 kernel: [ 6361.425537] TIPC: Established link <1.1.3:eth0-1.1.2:eth0> on network plane A
Jan 15 18:24:27 SLES-64BIT-SLOT3 osafimmnd[6827]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Jan 15 18:24:27 SLES-64BIT-SLOT3 osafimmnd[6827]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Jan 15 18:24:27 SLES-64BIT-SLOT3 osafimmnd[6827]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_LOADING_CLIENT
Jan 15 18:24:29 SLES-64BIT-SLOT3 osafimmnd[6827]: NO ERR_BAD_HANDLE: Admin owner 1 does not exist
Jan 15 18:24:36 SLES-64BIT-SLOT3 kernel: [ 6372.473240] TIPC: Established link <1.1.3:eth0-1.1.1:eth0> on network plane A
Jan 15 18:24:39 SLES-64BIT-SLOT3 osafimmnd[6827]: NO ERR_BAD_HANDLE: Admin owner 2 does not exist
Jan 15 18:24:39 SLES-64BIT-SLOT3 osafimmnd[6827]: NO NODE STATE-> IMM_NODE_LOADING
Jan 15 18:24:45 SLES-64BIT-SLOT3 osafimmnd[6827]: WA Number of objects in IMM is:5000
Jan 15 18:24:46 SLES-64BIT-SLOT3 osafimmnd[6827]: WA Number of objects in IMM is:6000
Jan 15 18:24:47 SLES-64BIT-SLOT3 osafimmnd[6827]: WA Number of objects in IMM is:7000
Jan 15 18:24:48 SLES-64BIT-SLOT3 osafimmnd[6827]: WA Number of objects in IMM is:8000
Jan 15 18:24:49 SLES-64BIT-SLOT3 osafimmnd[6827]: WA Number of objects in IMM is:9000

After both the controllers came up, the status is as follows:

    SLES-64BIT-SLOT1:~ # immlist safNode=PL-3,safCluster=myClmCluster
    Name Type Value(s)
    safNode SA_STRING_T safNode=PL-3
    saClmNodeLockCallbackTimeout SA_TIME_T 500 (0xba43b7400, Thu Jan 1 05:30:50 1970)
    saClmNodeIsMember SA_UINT32_T
    saClmNodeInitialViewNumber SA_UINT64_T
    saClmNodeID SA_UINT32_T
    saClmNodeEE SA_NAME_T
    saClmNodeDisableReboot
[tickets] [opensaf:tickets] #865 LOG: standby controller went for reboot after s/w followed by immnd kill
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#865] LOG: standby controller went for reboot after s/w followed by immnd kill**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Mon Apr 21, 2014 10:51 AM UTC by surender khetavath
**Last Updated:** Mon Aug 03, 2015 11:30 AM UTC
**Owner:** nobody
**Attachments:**

- [logs.tgz](https://sourceforge.net/p/opensaf/tickets/865/attachment/logs.tgz) (14.1 MB; application/x-compressed-tar)

Changeset: 5143

Case:

1) SC-1 is active and SC-2 is standby.
2) Invoke a switchover from SC-1 with 'amf-adm si-swap safSi=SC-2N,safApp=OpenSAF'.
3) Kill immnd on SC-2 as soon as SC-1 receives the quiesced callback.

The si-swap operation will time out. Console output:

    amf-adm si-swap safSi=SC-2N,safApp=OpenSAF
    error - command timed out (alarm)

Wait for some time, say 1-2 minutes; SC-2 will reboot with the messages below in syslog:

Apr 21 16:10:01 SC-2 osafamfnd[15380]: NO 'safComp=LOG,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'csiSetcallbackTimeout' : Recovery is 'nodeFailfast'
Apr 21 16:10:01 SC-2 osafamfnd[15380]: ER safComp=LOG,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:csiSetcallbackTimeout Recovery is:nodeFailfast
Apr 21 16:10:01 SC-2 osafamfnd[15380]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Apr 21 16:10:01 SC-2 opensaf_reboot: Rebooting local node; timeout=60
Apr 21 16:10:04 SC-2 kernel: [13082.449771] md: stopping all md devices.
Apr 21 16:10:04 SC-2 kernel: [13083.455393] sd 0:0:0:0: [sda] Synchronizing SCSI cache

Also, the SC-1 syslog contains:

Apr 21 16:10:33 SC-1 osafamfd[15353]: ER Alarm lost for safSi=NoRed1,safApp=OpenSAF

Logs of the controllers are attached.
[tickets] [opensaf:tickets] #786 Boot time stamp changes when a clm node is unconfigured and reconfigured
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#786] Boot time stamp changes when a clm node is unconfigured and reconfigured**

**Status:** assigned
**Milestone:** 4.6.2
**Created:** Fri Feb 14, 2014 06:26 AM UTC by manu
**Last Updated:** Wed Jul 15, 2015 01:32 PM UTC
**Owner:** Mathi Naickan

As per the CLM spec, the boot timestamp is the time at which the node last booted. It is supposed to change only when the board comes up after a reboot, but unconfiguring and reconfiguring the CLM object also causes this parameter to change.
[tickets] [opensaf:tickets] #789 CLM: CallBacks are getting delivered for operations that have been performed before registering for track
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#789] CLM: CallBacks are getting delivered for operations that have been performed before registering for track **

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Fri Feb 14, 2014 08:59 AM UTC by manu
**Last Updated:** Wed Jul 15, 2015 01:32 PM UTC
**Owner:** nobody
**Attachments:**

- [clm_traces.tar](https://sourceforge.net/p/opensaf/tickets/789/attachment/clm_traces.tar) (1.7 MB; application/x-tar)

Sometimes, even though registration for tracking is done after an operation has been performed, a callback is delivered for that operation, which is a violation of the spec.

The following is one such case where this behaviour is seen:

1. Perform a failover.
2. Register for the track callback.
3. Change the configuration of any node.
4. Dispatch the callback.

After step 4 I am supposed to receive a callback for the admin operation done in step 3, but the callback for the operation done in step 1 is also delivered.
[tickets] [opensaf:tickets] #466 Length of the objectnames is more by one for configuration object notifications
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#466] Length of the objectnames is more by one for configuration object notifications**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Thu Jun 20, 2013 09:08 AM UTC by Sirisha Alla
**Last Updated:** Wed Jul 15, 2015 02:06 PM UTC
**Owner:** nobody

When ntfimcnd sends notifications for configuration object creation/modification/deletion, the lengths of the notifying object and the notification object are reported incorrectly. The IMM callback gives the length of the notification object correctly.

Notification object length in the IMM callback:

    objectName->length: 37
    objectName->value: 'attrName_testSA_registerSA_Node_37_69'

The object create/modify/delete notifications indicate that the length of the notification object is 38 and that the length of the notifying object is 15 for "safApp=OpenSaf" (a 14-character name).

This issue is reproducible.
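Both reported lengths are exactly one more than the character count of the name. The sketch below reproduces the numbers under one possible explanation, namely that a terminating NUL byte is being counted; the ticket does not state the cause, so this is an assumption, and `reported_length` is a hypothetical helper, not ntfimcnd code.

```python
def reported_length(name, count_terminator):
    """Length of `name` in bytes, optionally counting a trailing NUL byte."""
    encoded = name.encode("ascii")
    return len(encoded) + (1 if count_terminator else 0)


notification_object = "attrName_testSA_registerSA_Node_37_69"
notifying_object = "safApp=OpenSaf"

# Expected values (what the IMM callback reports for the first name).
expected = (reported_length(notification_object, False),
            reported_length(notifying_object, False))

# Values matching the notifications described in the ticket.
observed = (reported_length(notification_object, True),
            reported_length(notifying_object, True))
```

If the assumption holds, the fix would be to set the length field from the string length alone, without the terminator.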
[tickets] [opensaf:tickets] #416 saAmfResponse() returns invalid parameter when value of 'error' is unrecognized
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#416] saAmfResponse() returns invalid parameter when value of 'error' is unrecognized**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Fri May 31, 2013 06:04 AM UTC by Nagendra Kumar
**Last Updated:** Thu Aug 06, 2015 10:11 AM UTC
**Owner:** nobody

Migrated from http://devel.opensaf.org/ticket/2788

This is a spec deviation; see e.g. 7.9.3:

"Any other error code set in the error parameter in the response will be treated by the Availability Management Framework as if the caller had set the error parameter to SA_AIS_ERR_FAILED_OPERATION."
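The behaviour the quoted spec text asks for can be sketched as a small normalization step applied before the response is processed. This is an illustration in Python, not the AMF agent code: `normalize_response_error` and `RECOGNIZED_ERRORS` are hypothetical names, and the set of recognized codes is an assumption (the exact set depends on the callback being answered). The numeric values are the `SaAisErrorT` values from `saAis.h`.

```python
SA_AIS_OK = 1
SA_AIS_ERR_FAILED_OPERATION = 21

# Assumption: the codes the callback in question explicitly recognizes.
RECOGNIZED_ERRORS = frozenset({SA_AIS_OK, SA_AIS_ERR_FAILED_OPERATION})


def normalize_response_error(error):
    """Map any unrecognized error code to SA_AIS_ERR_FAILED_OPERATION."""
    if error in RECOGNIZED_ERRORS:
        return error
    return SA_AIS_ERR_FAILED_OPERATION
```

With a step like this in place, an unexpected value in `error` changes how the response is interpreted instead of causing the call itself to be rejected as an invalid parameter.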
[tickets] [opensaf:tickets] #412 amf: After runtime delete of sponser CSI, SU stuck in Quiesed state after admin lock of SUs.
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#412] amf: After runtime delete of sponser CSI, SU stuck in Quiesed state after admin lock of SUs.**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Fri May 31, 2013 05:57 AM UTC by Praveen
**Last Updated:** Thu Aug 06, 2015 10:15 AM UTC
**Owner:** nobody

Migrated to http://devel.opensaf.org/ticket/2861.

Changeset: 3796, 4.2.2
Model: 2n

Configuration:

2 SUs: SU1 on PL-3 and SU2 on SC-1.
2 SIs.
3 CSIs (CSI1, CSI2, and CSI3) per SI.
CSI-CSI dependency configured as CSI1 dependent on CSI2 and CSI2 dependent on CSI3 for both the SIs.

Problem description:

After a successful runtime delete of the sponsor CSI3 of SI1, admin lock on SU1 and SU2 returns timeout and SU1 is stuck in QUIESCED state. Finally the SG becomes unstable.

While doing the admin lock on SU2, /var/log/messages keeps printing the messages below:

Oct 11 17:25:51 SLES-SLOT-1 osafamfd[3567]: SG state is not stable
Oct 11 17:25:52 SLES-SLOT-1 osafamfd[3567]: SG state is not stable
Oct 11 17:28:21 SLES-SLOT-1 osafamfd[3567]: Admin operation is already going
Oct 11 17:28:22 SLES-SLOT-1 osafamfd[3567]: Admin operation is already going

States after lock of SU1 and SU2:

    safSu=csidep_2n_1,safSg=SG_csidep_2n,safApp=2nApp
        saAmfSUAdminState=LOCKED(2)
        saAmfSUOperState=ENABLED(1)
        saAmfSUPresenceState=INSTANTIATED(3)
        saAmfSUReadinessState=OUT-OF-SERVICE(1)
    safSu=csidep_2n_2,safSg=SG_csidep_2n,safApp=2nApp
        saAmfSUAdminState=UNLOCKED(1)
        saAmfSUOperState=ENABLED(1)
        saAmfSUPresenceState=INSTANTIATED(3)
        saAmfSUReadinessState=IN-SERVICE(2)

    safSISU=safSu=csidep_2n_1\,safSg=SG_csidep_2n\,safApp=2nApp,safSi=csidep_2n,safApp=2nApp
        saAmfSISUHAState=QUIESCED(3)
    safSISU=safSu=csidep_2n_1\,safSg=SG_csidep_2n\,safApp=2nApp,safSi=csidep_2n_1,safApp=2nApp
        saAmfSISUHAState=QUIESCED(3)
    safSISU=safSu=csidep_2n_2\,safSg=SG_csidep_2n\,safApp=2nApp,safSi=csidep_2n,safApp=2nApp
        saAmfSISUHAState=ACTIVE(1)
    safSISU=safSu=csidep_2n_2\,safSg=SG_csidep_2n\,safApp=2nApp,safSi=csidep_2n_1,safApp=2nApp
        saAmfSISUHAState=ACTIVE(1)

Changed 7 months ago by nagendra

Looks like a duplicate of http://devel.opensaf.org/ticket/2842
[tickets] [opensaf:tickets] #449 LOG: OI Completed Callback function has undefined return values
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#449] LOG: OI Completed Callback function has undefined return values**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Mon Jun 10, 2013 01:39 PM UTC by elunlen
**Last Updated:** Wed Jul 15, 2015 02:21 PM UTC
**Owner:** elunlen

The OI Completed Callback shall return only SA_AIS_OK or SA_AIS_ERR_BAD_OPERATION as the result of the parameter check. However, the LOG service may return other SA_AIS return codes. See file lgs_imm.c.
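The rule described in this ticket can be sketched as a small wrapper that clamps whatever an internal check returns to the two permitted codes. This is a minimal Python model, not the code in lgs_imm.c: `clamp_completed_rc` is a hypothetical name, and the numeric values are the `SaAisErrorT` values as recalled from saAis.h, so they are worth checking against the header.

```python
from enum import IntEnum


class SaAisErrorT(IntEnum):
    # Subset of SaAisErrorT; numeric values are assumed from saAis.h.
    SA_AIS_OK = 1
    SA_AIS_ERR_INVALID_PARAM = 7
    SA_AIS_ERR_NO_RESOURCES = 18
    SA_AIS_ERR_BAD_OPERATION = 20


def clamp_completed_rc(rc: SaAisErrorT) -> SaAisErrorT:
    """Hypothetical helper: let only OK or BAD_OPERATION leave the callback."""
    if rc == SaAisErrorT.SA_AIS_OK:
        return SaAisErrorT.SA_AIS_OK
    # Any other internal result is reported as a rejected operation.
    return SaAisErrorT.SA_AIS_ERR_BAD_OPERATION
```

Applying such a clamp at the single exit point of the callback would keep internal helper functions free to use richer error codes.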
[tickets] [opensaf:tickets] #1344 CLM : clmd should not send the callbacks for tracking on non-member node
- **Milestone**: 4.4.2 --> 4.6.2

---

** [tickets:#1344] CLM : clmd should not send the callbacks for tracking on non-member node**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Sun Apr 26, 2015 06:00 PM UTC by Srikanth R
**Last Updated:** Sun Apr 26, 2015 06:00 PM UTC
**Owner:** nobody

Changeset : 6377

As per the spec 3.5.1 page #44,

"""
If saClmClusterTrack_4() is invoked on non-member nodes, the following applies:

• if SA_TRACK_CURRENT is specified, only information about the local node is returned in the structure pointed to by notificationBuffer or in the subsequent callback;
• if SA_TRACK_CHANGES or SA_TRACK_CHANGES_ONLY is specified, callbacks will only be invoked when the node joins the cluster membership.
"""

As of now, the CLM service delivers the callbacks to agents on non-member nodes and waits for the agent to respond before completing the operation. This should be changed according to the spec.
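The quoted rule for change tracking can be sketched as a single predicate. This is a minimal Python model under stated assumptions: `deliver_change_callback` and its parameters are hypothetical names, not clmd code, and the flag values are the ones recalled from saAis.h.

```python
# Track flag values assumed from saAis.h.
SA_TRACK_CURRENT = 0x01
SA_TRACK_CHANGES = 0x02
SA_TRACK_CHANGES_ONLY = 0x04


def deliver_change_callback(is_member: bool, track_flags: int,
                            local_node_is_joining: bool) -> bool:
    """Hypothetical check: should a change callback go to this agent?"""
    if not track_flags & (SA_TRACK_CHANGES | SA_TRACK_CHANGES_ONLY):
        # The agent never asked for change tracking.
        return False
    if is_member:
        return True
    # Non-member node: the only change reported is its own join.
    return local_node_is_joining
```

If the service used the same predicate when deciding which agents to wait for, an admin operation would not block on a non-member agent that was never called.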
[tickets] [opensaf:tickets] #1343 CLM : clmd asserted when controller switchover is invoked with CLM shutdown operation of node
- **Milestone**: 4.4.2 --> 4.6.2

---

** [tickets:#1343] CLM : clmd asserted when controller switchover is invoked with CLM shutdown operation of node**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Sun Apr 26, 2015 05:03 PM UTC by Srikanth R
**Last Updated:** Mon Apr 27, 2015 04:08 AM UTC
**Owner:** nobody

Changeset : 6377

Steps performed :

-> Issued admin shutdown operation on a member node PL-5 and ensured the CLM agent did not respond in the start callback

426 16:51:12 04/26/2015 NO safApp=safClmService "safNode=PL-5,safCluster=myClmCluster Admin State Changed, new state=SHUTTING_DOWN"

-> Invoked controller switchover by issuing admin si-swap operation.

427 16:51:20 04/26/2015 NO safApp=safAmfService "Admin op "SI_SWAP" initiated for 'safSi=SC-2N,safApp=OpenSAF', invocation: 502511173633"
428 16:51:20 04/26/2015 NO safApp=safAmfService "safSi=SC-2N,safApp=OpenSAF Swap initiated"

-> clmd asserted on the quiesced controller.

Apr 26 16:51:20 CONTROLLER-1 osafamfd[2119]: NO safSi=SC-2N,safApp=OpenSAF Swap initiated
Apr 26 16:51:20 CONTROLLER-1 osafamfnd[2129]: NO Assigning 'safSi=SC-2N,safApp=OpenSAF' QUIESCED to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Apr 26 16:51:20 CONTROLLER-1 osafimmnd[2063]: NO Implementer locally disconnected.
Marking it as doomed 80 <604, 2010f> (safSmfService) Apr 26 16:51:20 CONTROLLER-1 osafimmnd[2063]: NO Implementer disconnected 75 <332, 2010f> (safMsgGrpService) Apr 26 16:51:20 CONTROLLER-1 osafimmnd[2063]: NO Implementer disconnected 80 <604, 2010f> (safSmfService) Apr 26 16:51:20 CONTROLLER-1 osafimmnd[2063]: NO Implementer disconnected 72 <3, 2010f> (safLogService) Apr 26 16:51:20 CONTROLLER-1 osafimmnd[2063]: NO Implementer disconnected 78 <334, 2010f> (safEvtService) Apr 26 16:51:20 CONTROLLER-1 osafamfnd[2129]: NO 'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast' Apr 26 16:51:20 CONTROLLER-1 osafamfnd[2129]: ER safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast Apr 26 16:51:20 CONTROLLER-1 osafamfnd[2129]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60 -> Below is the backtrace : (gdb) thread apply all bt Thread 4 (Thread 0x7f4caf033700 (LWP 2100)): #0 0x7f4cadcd7415 in __lll_unlock_wake () from /lib64/libpthread.so.0 #1 0x7f4cadcd3ac4 in _L_unlock_553 () from /lib64/libpthread.so.0 #2 0x7f4cadcd39f7 in __pthread_mutex_unlock_usercnt () from /lib64/libpthread.so.0 #3 0x7f4caec06870 in ncsmds_adm_api () from /usr/lib64/libopensaf_core.so.0 #4 0x7f4caec1f813 in vda_chg_role_vdest () from /usr/lib64/libopensaf_core.so.0 #5 0x7f4caec1ed79 in ncsvda_api () from /usr/lib64/libopensaf_core.so.0 #6 0x0041e6ec in clms_mds_change_role () #7 0x00404617 in amf_quiesced_state_handler () #8 0x00404778 in clms_amf_csi_set_callback () #9 0x7f4cae9a5ba0 in ava_hdl_cbk_rec_prc () from /usr/lib64/libSaAmf.so.0 #10 0x7f4cae9a530d in ava_hdl_cbk_dispatch_all () from /usr/lib64/libSaAmf.so.0 #11 0x7f4cae9a4e34 in ava_hdl_cbk_dispatch () from /usr/lib64/libSaAmf.so.0 #12 0x7f4cae99df14 in saAmfDispatch () at ava_api.c:261 #13 0x00411032 in main () Thread 3 (Thread 0x7f4caf010b00 (LWP 2104)): 
#0 0x7f4cad6164f6 in poll () from /lib64/libc.so.6 #1 0x7f4caebd0df1 in osaf_ppoll () from /usr/lib64/libopensaf_core.so.0 #2 0x7f4caebd0d27 in osaf_poll () from /usr/lib64/libopensaf_core.so.0 #3 0x7f4caebd0ef0 in osaf_poll_one_fd () from /usr/lib64/libopensaf_core.so.0 #4 0x7f4cadee7a04 in rda_read_msg () from /usr/lib64/opensaf/librda.so.0 #5 0x7f4cadee71e7 in rda_callback_task () from /usr/lib64/opensaf/librda.so.0 #6 0x7f4cadcd07b6 in start_thread () from /lib64/libpthread.so.0 #7 0x7f4cad61f9cd in clone () from /lib64/libc.so.6 #8 0x in ?? () Thread 2 (Thread 0x7f4caf062b00 (LWP 2102)): #0 0x7f4cad6164f6 in poll () from /lib64/libc.so.6 #1 0x7f4caebd0df1 in osaf_ppoll () from /usr/lib64/libopensaf_core.so.0 #2 0x7f4caebda7b5 in ncs_tmr_wait () from /usr/lib64/libopensaf_core.so.0 #3 0x7f4cadcd07b6 in start_thread () from /lib64/libpthread.so.0 #4 0x7f4cad61f9cd in clone () from /lib64/libc.so.6 #5 0x in ?? () Thread 1 (Thread 0x7f4caf030b00 (LWP 2103)): #0 0x7f4cad57ab55 in raise () from /lib64/libc.so.6 #1 0x7f4cad57c131 in abort () from /lib64/libc.so.6 #2 0x7f4cad5b7c2f in __libc_message () from /lib64/libc.so.6 #3 0x7f4cad5bd358 in malloc_printerr () from /lib64/libc.so.6 #4 0x7f4cad5c099d in _int_malloc () from /lib64/libc.so.6 #5 0x7f4cad5c23e7 in malloc () from /lib64/libc.so.6 #6 0x7f4caec0362a in mds_subtn_res_tbl_remove_active () from
[tickets] [opensaf:tickets] #1349 LOG : lgs_own_log_files is not called when logDataGroupname is reset to ""
- **Milestone**: 4.4.2 --> 4.6.2

---

** [tickets:#1349] LOG : lgs_own_log_files is not called when logDataGroupname is reset to ""**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Tue Apr 28, 2015 04:46 AM UTC by Srikanth R
**Last Updated:** Tue Apr 28, 2015 04:46 AM UTC
**Owner:** nobody

Changeset : 6490

If the group name is set to tet using the following command, all the existing log files become owned by the new group.

immcfg -a logDataGroupname=tet logConfig=1,safApp=safLogService

Apr 28 10:01:14.035761 osaflogd [5213:lgs_imm.c:1988] >> config_ccb_apply_modify: CCB ID 9, 'logConfig=1,safApp=safLogService'
Apr 28 10:01:14.035766 osaflogd [5213:lgs_imm.c:1997] TR attribute logDataGroupname
Apr 28 10:01:14.035770 osaflogd [5213:lgs_imm.c:1948] >> logDataGroupname_fileown
Apr 28 10:01:14.035784 osaflogd [5213:lgs_imm.c:3123] NO LOG service data group is changed to tet
Apr 28 10:01:14.035791 osaflogd [5213:lgs_util.c:0606] >> lgs_own_log_files: stream safLgStrCfg=appstream1,safApp=safLogService
.
Apr 28 10:01:14.036787 osaflogd [5213:lgs_filehdl.c:0757] T3 /var/log/opensaf/saflog/./saLogSystem_20150428_092511.log
Apr 28 10:01:14.036804 osaflogd [5213:lgs_filehdl.c:0771] << own_log_files_by_group_hdl

If the group name is reset to the default, the lgs_own_log_files function is not called, so the ownership of the existing log files is not changed.

immcfg -a logDataGroupname="" logConfig=1,safApp=safLogService

Apr 28 10:01:40.624851 osaflogd [5213:lgs_imm.c:1948] >> logDataGroupname_fileown
Apr 28 10:01:40.624875 osaflogd [5213:lgs_imm.c:3123] NO LOG service data group is changed to
Apr 28 10:01:40.624884 osaflogd [5213:lgs_imm.c:1971] << logDataGroupname_fileown
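One way to picture the behavior the reporter expects is a handler that treats the empty string as "back to the default group" and still re-owns every stream's files. The sketch below is a Python model only: `on_group_changed`, `default_group`, and the `own_log_files` callable are hypothetical stand-ins (the latter for the lgs_own_log_files call seen in the first trace), and how the real service determines its default group is not stated in the ticket.

```python
from typing import Callable, List


def on_group_changed(new_group: str, default_group: str,
                     streams: List[str],
                     own_log_files: Callable[[str, str], None]) -> str:
    """Hypothetical handler: re-own files even when the group is reset to ""."""
    # An empty name means "use the default group", not "do nothing".
    effective = new_group if new_group else default_group
    for stream in streams:
        own_log_files(stream, effective)
    return effective
```

With this shape, both `logDataGroupname=tet` and `logDataGroupname=""` go through the same ownership loop, which is the symmetry the two traces above lack.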
[tickets] [opensaf:tickets] #1342 CLM : Deviations from spec in populating track callback parameters.
- **Milestone**: 4.4.2 --> 4.6.2

---

** [tickets:#1342] CLM : Deviations from spec in populating track callback parameters.**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Sun Apr 26, 2015 04:16 PM UTC by Srikanth R
**Last Updated:** Sun Apr 26, 2015 04:16 PM UTC
**Owner:** nobody

Changeset : 6377

The following issues were observed in how CLM populates the track callback.

1) rootCauseEntity is not set to NULL (but to a random value) when saClmClusterTrack_4 is called with trackFlags set to SA_TRACK_CURRENT.

2) In the start step callback for a lock operation, the timeSupervision parameter is not filled in with the configured attribute saClmNodeLockCallbackTimeout of the node undergoing the lock / shutdown operation.

Breakpoint 1, pycbk_SaClmClusterTrackCallbackT_4 (notificationBuff=0x853b08, numberOfMembers=4, invocation=137439085583, rootCauseEntity=0x8534b0, correlationIds=0x84c130, step=SA_CLM_CHANGE_START, timeSupervision=-6612564084514619392, error1=SA_AIS_OK) at saClm_wrap.c:2901

For a shutdown operation, the timeSupervision parameter should be filled in with zero, as the admin operation is not time-bound.

3) clusterChange in the notificationBuffer is not filled in with SA_CLM_NODE_UNLOCK if an unlock operation is performed on the node while a shutdown operation is in progress.

Initial callback when shutdown operation is issued : Breakpoint 1, pycbk_SaClmClusterTrackCallbackT_4 (notificationBuff=0x853b08, numberOfMembers=4, invocation=163208889359, rootCauseEntity=0x84d9d0, correlationIds=0x7f5df0, step=SA_CLM_CHANGE_START, timeSupervision=-6612564084514619392, error1=SA_AIS_OK) at saClm_wrap.c:2901 2901 printf("root cse entity in c-clbk %s",rootCauseEntity->value); (gdb) p (*notificationBuff)->notification[0] $49 = {clusterNode = {nodeId = 132111, nodeAddress = {family = SA_CLM_AF_INET, length = 0, value = "S\367\377\177\000\000\000\000\000\000\000\000\000\246\000\000\000\000\000\000\000<\000\000\000\001\000\000\000\003\000\002\004\017\000\000\000\001\000\000\000$safNode=PL-4,safCluste"}, nodeName = {length = 36, value = "safNode=PL-4,safCluster=myClmCluster\000\000\000\000\000\000\000\000\000$safNode=PL-4,safCluster=myClmCluster", '\000' , "f\360\240\000\000\000\000\000\000\000 \000\000\000\004", '\000' , "\001", '\000' }, executionEnvironment = {length = 0, value = '\000' }, member = SA_TRUE, bootTimestamp = 14300225290, initialViewNumber = 64}, clusterChange = SA_CLM_NODE_SHUTDOWN} Second callback, where clusterChange is improperly filled : Breakpoint 1, pycbk_SaClmClusterTrackCallbackT_4 (notificationBuff=0x859108, numberOfMembers=4, invocation=0, rootCauseEntity=0x859610, correlationIds=0x84c130, step=SA_CLM_CHANGE_COMPLETED, timeSupervision=0, error1=SA_AIS_OK) at saClm_wrap.c:2901 2901 printf("root cse entity in c-clbk %s",rootCauseEntity->value); (gdb) p (*notificationBuff)->notification[0] $50 = {clusterNode = {nodeId = 132111, nodeAddress = {family = SA_CLM_AF_INET, length = 0, value = '\000' }, nodeName = {length = 36, value = "safNode=PL-4,safCluster=myClmCluster", '\000' }, executionEnvironment = {length = 0, value = '\000' }, member = SA_TRUE, bootTimestamp = 14300225290, initialViewNumber = 65}, clusterChange = SA_CLM_NODE_JOINED} In this case, notification is sent about the node joining the cluster, which is improper. 
The node never left the cluster and there is no notification for that, which is fine. === Apr 26 12:18:46 - State Change === eventType = SA_NTF_OBJECT_STATE_CHANGE notificationObject = "safNode=PL-4,safCluster=myClmCluster" notifyingObject = "safApp=safClmService" notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_CLM.101 (0x65) additionalText = "CLM node safNode=PL-4,safCluster=myClmCluster Joined" sourceIndicator = SA_NTF_OBJECT_OPERATION State ID = SA_CLM_CLUSTER_CHANGE_STATUS New State: SA_CLM_NODE_JOINED 4) When the lock operation is in progress, hold the response in the start step callback and stop the opensaf / reboot the node ( on which operation is in progress). In this case notificationBuff is filled up with number of items set to zero. Breakpoint 1, pycbk_SaClmClusterTrackCallbackT_4 (notificationBuff=0x85e3e8, numberOfMembers=4, invocation=0, rootCauseEntity=0x85e8f0, correlationIds=0x7f5df0, step=SA_CLM_CHANGE_COMPLETED, timeSupervision=1, error1=SA_AIS_OK) at saClm_wrap.c:2901 (gdb) p (*notificationBuff) $64 = {viewNumber = 74, numberOfItems = 1, notification = 0x85e670} (gdb) p (*notificationBuff)->notification[0] $65 = {clusterNode = {nodeId = 132111, nodeAddress = {family = SA_CLM_AF_INET, length = 0, value = '\000' }, nodeName = {length = 36, value = "safNode=PL-4,safCluster=myClmCluster", '\000' }, executionEnvironment = {length = 0, value = '\000' }, member = SA_FALSE, bootTimestamp = 14300315740, initialViewNumber = 73}, clusterChange = SA_CLM_NODE_LEFT} Callback when node is rebooted in the
[tickets] [opensaf:tickets] #1345 CLM: Tracking for changes should not be started incase of improper track flags
- **Milestone**: 4.4.2 --> 4.6.2

---

** [tickets:#1345] CLM: Tracking for changes should not be started incase of improper track flags**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Sun Apr 26, 2015 06:06 PM UTC by Srikanth R
**Last Updated:** Sun Apr 26, 2015 06:09 PM UTC
**Owner:** nobody

Changeset : 6377

Tracking should not be started unless the track flags include either SA_TRACK_CHANGES or SA_TRACK_CHANGES_ONLY. As of now, callbacks are not sent to the agent, but the CLM service still waits for the agent's response to the callback when an admin operation is issued.

For the track flags combination TRACK_CURRENT | TRACK_START | TRACK_VALIDATE, the agent does not get a callback, but CLM waits for the response for an admin operation.
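The check the ticket asks for, together with its effect on which agents an admin operation waits for, can be sketched as follows. This is a Python model under assumptions: `starts_tracking` and `agents_to_wait_for` are hypothetical names, and the flag values are recalled from saAis.h (where the step flags are named SA_TRACK_START_STEP and SA_TRACK_VALIDATE_STEP).

```python
from typing import Dict, List

# Track flag values assumed from saAis.h.
SA_TRACK_CURRENT = 0x01
SA_TRACK_CHANGES = 0x02
SA_TRACK_CHANGES_ONLY = 0x04
SA_TRACK_START_STEP = 0x10
SA_TRACK_VALIDATE_STEP = 0x20


def starts_tracking(track_flags: int) -> bool:
    """Tracking starts only with CHANGES or CHANGES_ONLY set."""
    return bool(track_flags & (SA_TRACK_CHANGES | SA_TRACK_CHANGES_ONLY))


def agents_to_wait_for(agent_flags: Dict[str, int]) -> List[str]:
    """Hypothetical: only tracked agents can owe a callback response."""
    return [name for name, flags in agent_flags.items()
            if starts_tracking(flags)]
```

For the combination reported above, `starts_tracking` is false, so the agent would be excluded from the wait list instead of stalling the admin operation.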
[tickets] [opensaf:tickets] #1323 Java : API does not return values as expected when version parameter is passed incorrectly
- **Milestone**: 4.4.2 --> 4.6.2

---

** [tickets:#1323] Java : API does not return values as expected when version parameter is passed incorrectly**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Tue Apr 21, 2015 11:11 AM UTC by Sirisha Alla
**Last Updated:** Tue Apr 21, 2015 11:11 AM UTC
**Owner:** nobody

This is a clone of devel ticket 2272.

When a wrong version is passed to the initializeHandle() API, the major version returned does not match the supported version.

Example: when C.1.1 is passed as the input version, the version returned is B.1.1, whereas B.4.1 is expected.

When minorVersion is lower than the supported minor version, ERR_VERSION is returned. The specification says that the minor version must be ignored and SA_AIS_OK returned.
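The expected behavior can be modeled with a small negotiation function. This is a simplified Python sketch, not the Java binding's code: `negotiate` and `Version` are hypothetical names, the rule that a lower-or-equal major version within the same release code is accepted is an assumption about the AIS version handling, and returning SA_AIS_ERR_VERSION for a release code mismatch is likewise assumed rather than stated in the ticket.

```python
from typing import NamedTuple, Tuple

SA_AIS_OK = 1           # value assumed from saAis.h
SA_AIS_ERR_VERSION = 3  # value assumed from saAis.h


class Version(NamedTuple):
    release_code: str
    major: int
    minor: int


def negotiate(requested: Version, supported: Version) -> Tuple[int, Version]:
    """Hypothetical model: the minor version never affects the result."""
    compatible = (requested.release_code == supported.release_code
                  and requested.major <= supported.major)
    rc = SA_AIS_OK if compatible else SA_AIS_ERR_VERSION
    # In both cases the caller is told the supported version in full.
    return rc, supported
```

With `supported = Version("B", 4, 1)`, a request for C.1.1 reports B.4.1 back, and a request for B.4.0 succeeds even though its minor version is lower.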
[tickets] [opensaf:tickets] #1278 IMM: admin owner clear/release on an object is allowed when admin operation is in progress for the object
- **Milestone**: 4.4.2 --> never

---

** [tickets:#1278] IMM: admin owner clear/release on an object is allowed when admin operation is in progress for the object**

**Status:** wontfix
**Milestone:** never
**Created:** Tue Mar 24, 2015 05:44 AM UTC by Sirisha Alla
**Last Updated:** Fri Mar 27, 2015 11:17 AM UTC
**Owner:** Anders Bjornerstedt

This issue is seen on the 46FC tag changeset; it may also be relevant to all older versions of OpenSAF (not verified).

The spec says on page 67:

> The operation fails if an administrative operation is currently in progress on one of the targeted objects. An administrative operation is considered to be in progress on an object if the SaImmOiAdminOperationCallbackT_2 Object Implementer's callback has been invoked for that operation and the Object Implementer is still registered but has not yet called saImmOiAdminOperationResult() to provide the operation results.

To simulate this case, the test application invoked AdminOperationAsync on an object. After AdminOperationCallback was invoked, and without the object's OI responding with AdminOperationResult, adminOwnerRelease was invoked from the OM and the API succeeded. According to the spec, ERR_BUSY should be returned for the AdminOwnerRelease operation. The same applies to the AdminOwnerClear() API.

IMMND trace on that node:

Mar 24 11:02:13.611054 osafimmnd [4131:ImmModel.cc:10998] >> adminOperationInvoke
Mar 24 11:02:13.611072 osafimmnd [4131:ImmModel.cc:11005] T5 Admin op on objectName:xattrName_testAdminOwnerRelease_Failures_1012
Mar 24 11:02:13.61 osafimmnd [4131:ImmModel.cc:4] T5 IMPLEMENTER FOR ADMIN OPERATION INVOKE 19 conn:55 node:2030f name:implementer_testAdminOwnerRelease_Failures_101
Mar 24 11:02:13.611139 osafimmnd [4131:ImmModel.cc:11122] T5 Updating req invocation inv:34359738367 conn:54 timeout:0
Mar 24 11:02:13.611163 osafimmnd [4131:ImmModel.cc:11129] TR Located pre request continuation 34359738367 adjusting timeout to 0
Mar 24 11:02:13.611182 osafimmnd [4131:ImmModel.cc:11157] T5 Storing impl invocation 55 for inv: 34359738367
Mar 24 11:02:13.611215 osafimmnd [4131:ImmModel.cc:11226] << adminOperationInvoke
Mar 24 11:02:13.611252 osafimmnd [4131:immnd_evt.c:4984] T2 IMMND sending Agent upcall
Mar 24 11:02:13.613901 osafimmnd [4131:immnd_evt.c:4990] T2 IMMND UPCALL TO AGENT SEND SUCCEEDED
Mar 24 11:02:13.614270 osafimmnd [4131:immnd_evt.c:5128] T2 Delayed reply, wait for reply from implementer
Mar 24 11:02:13.614547 osafimmnd [4131:immnd_evt.c:5132] << immnd_evt_proc_admop
Mar 24 11:02:13.614873 osafimmnd [4131:immnd_evt.c:8658] >> dequeue_outgoing
Mar 24 11:02:13.615112 osafimmnd [4131:immnd_evt.c:8664] TR Pending replies:0 space:16 out list?:(nil)
Mar 24 11:02:13.615396 osafimmnd [4131:immnd_evt.c:8693] << dequeue_outgoing
Mar 24 11:02:13.615829 osafimmnd [4131:immnd_evt.c:8777] << immnd_evt_proc_fevs_rcv
Mar 24 11:02:14.496009 osafimmnd [4131:ImmModel.cc:12450] T5 Did not timeout now - start < 0(1)
Mar 24 11:02:14.609660 osafimmnd [4131:immsv_evt.c:5500] T8 Received: IMMND_EVT_A2ND_IMM_FEVS (14) from 2030f
Mar 24 11:02:14.609724 osafimmnd [4131:immnd_evt.c:2837] T2 sender_count: 1 size: 268
Mar 24 11:02:14.609761 osafimmnd [4131:immnd_evt.c:3118] >> immnd_fevs_local_checks
Mar 24 11:02:14.609808 osafimmnd [4131:immnd_evt.c:3575] << immnd_fevs_local_checks
Mar 24 11:02:14.609838 osafimmnd [4131:immnd_evt.c:3036] T2 SENDING FEVS TO IMMD
Mar 24 11:02:14.609863 osafimmnd [4131:immsv_evt.c:5481] T8 Sending: IMMD_EVT_ND2D_FEVS_REQ to 0
Mar 24 11:02:14.616600 osafimmnd [4131:immnd_evt.c:8716] >> immnd_evt_proc_fevs_rcv
Mar 24 11:02:14.616745 osafimmnd [4131:immnd_evt.c:8732] T2 FEVS from myself, still pending:0
Mar 24 11:02:14.616815 osafimmnd [4131:immsv_evt.c:5500] T8 Received: IMMND_EVT_A2ND_ADMO_RELEASE (10) from 0
Mar 24 11:02:14.616860 osafimmnd [4131:ImmModel.cc:4549] >> adminOwnerChange
Mar 24 11:02:14.616893 osafimmnd [4131:ImmModel.cc:4576] T5 Release admin owner 'exowner'
Mar 24 11:02:14.634875 osafimmnd [4131:ImmModel.cc:4681] TR Cutoff in admo-change-loop by childCount
Mar 24 11:02:14.635431 osafimmnd [4131:ImmModel.cc:4589] T5 Release Admin Owner for object xattrName_testAdminOwnerRelease_Failures_1012
Mar 24 11:02:14.641743 osafimmnd [4131:ImmModel.cc:4681] TR Cutoff in admo-change-loop by childCount
Mar 24 11:02:14.642150 osafimmnd [4131:ImmModel.cc:4694] << adminOwnerChange
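The spec text quoted in this ticket amounts to a guard on the release path. The following Python sketch models that guard only; it is not ImmModel code (the ticket itself was closed as wontfix), `ObjectState` and `admin_owner_release` are hypothetical names, and the error values are recalled from saAis.h.

```python
from typing import Iterable, Optional, Set

SA_AIS_OK = 1         # value assumed from saAis.h
SA_AIS_ERR_BUSY = 10  # value assumed from saAis.h


class ObjectState:
    """Hypothetical per-object bookkeeping."""

    def __init__(self, admin_owner: Optional[str] = None) -> None:
        self.admin_owner = admin_owner
        # Invocations whose OI callback ran but whose result is still pending.
        self.pending_admin_ops: Set[int] = set()


def admin_owner_release(objects: Iterable[ObjectState]) -> int:
    """Refuse the release while any targeted object has an admin op pending."""
    targets = list(objects)
    if any(obj.pending_admin_ops for obj in targets):
        return SA_AIS_ERR_BUSY
    for obj in targets:
        obj.admin_owner = None
    return SA_AIS_OK
```

The check runs over all targets before any owner is cleared, so a busy object leaves the whole set untouched.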
[tickets] [opensaf:tickets] #1184 daemonize does not support changing primary group
- **Milestone**: 4.4.2 --> never

---

** [tickets:#1184] daemonize does not support changing primary group**

**Status:** invalid
**Milestone:** never
**Created:** Tue Oct 21, 2014 02:29 PM UTC by Hans Feldt
**Last Updated:** Tue Oct 21, 2014 04:02 PM UTC
**Owner:** nobody

The environment variable OPENSAF_GROUP exported in nid.conf is not respected in daemon.c.

For consistency with how the user name is specified, it should also be possible to specify the primary group name on the command line.
[tickets] [opensaf:tickets] #1076 2PBE: pbed aborts at pbeClosePrepareTrans
- **Milestone**: 4.4.2 --> never --- ** [tickets:#1076] 2PBE: pbed aborts at pbeClosePrepareTrans** **Status:** duplicate **Milestone:** never **Created:** Mon Sep 15, 2014 06:52 AM UTC by Sirisha Alla **Last Updated:** Thu Feb 19, 2015 11:42 AM UTC **Owner:** Anders Bjornerstedt **Attachments:** - [SLOT2.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1076/attachment/SLOT2.tar.bz2) (11.8 MB; application/x-bzip) The issue is seen on SLES X86 with 2PBE and 50k objects. Opensaf is running on changeset 5697 + #946 patches Syslog on SC-2: Sep 12 19:15:00 SLES-64BIT-SLOT2 osafamfnd[2409]: NO Assigned 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF' Sep 12 19:15:00 SLES-64BIT-SLOT2 osafimmnd[2332]: NO Implementer disconnected 618 <0, 2010f> (@OpenSafImmReplicatorA) Sep 12 19:15:00 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare from primary on PRTA update ccb:100f0 Sep 12 19:15:00 SLES-64BIT-SLOT2 osafimmnd[2332]: NO Implementer (applier) connected: 640 (@OpenSafImmReplicatorA) <0, 2010f> Sep 12 19:15:01 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare from primary on PRTA update ccb:100f0 Sep 12 19:15:01 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare from primary on PRTA update ccb:100f0 Sep 12 19:15:02 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare from primary on PRTA update ccb:100f0 Sep 12 19:15:02 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare from primary on PRTA update ccb:100f0 Sep 12 19:15:03 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare from primary on PRTA update ccb:100f0 Sep 12 19:15:03 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare from primary on PRTA update ccb:100f0 Sep 12 19:15:04 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare from primary on PRTA update ccb:100f0 Sep 12 19:15:04 SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare from primary on PRTA update ccb:100f0 Sep 12 19:15:05 
SLES-64BIT-SLOT2 osafimmpbed: IN PBE slave waiting for prepare from primary on PRTA update ccb:100f0 Sep 12 19:15:05 SLES-64BIT-SLOT2 osafimmpbed: NO Slave PBE time-out in waiting on porepare for PRTA update ccb:100f0 dn:safNode=PL-3,safCluster=myClmCluster Sep 12 19:15:06 SLES-64BIT-SLOT2 osafimmnd[2332]: WA Timeout on Persistent runtime Object Mutation, waiting on PBE Sep 12 19:15:06 SLES-64BIT-SLOT2 osafimmnd[2332]: WA Got error on non local rt object update err: 6 Sep 12 19:15:07 SLES-64BIT-SLOT2 osafimmnd[2332]: NO Implementer disconnected 610 <0, 2010f> (safAmfService) Sep 12 19:15:07 SLES-64BIT-SLOT2 osafimmnd[2332]: NO Implementer (applier) connected: 641 (@safAmfService2010f) <0, 2010f> Sep 12 19:15:07 SLES-64BIT-SLOT2 osafamfd[2396]: NO Switching StandBy --> Active State Sep 12 19:15:07 SLES-64BIT-SLOT2 osafimmnd[2332]: NO Implementer disconnected 623 <14, 2020f> (@safAmfService2020f) Sep 12 19:15:07 SLES-64BIT-SLOT2 osafimmnd[2332]: NO Implementer connected: 642 (safAmfService) <14, 2020f> Sep 12 19:15:07 SLES-64BIT-SLOT2 osafrded[2303]: NO RDE role set to ACTIVE Sep 12 19:15:07 SLES-64BIT-SLOT2 osafclmd[2377]: NO ACTIVE request Sep 12 19:15:07 SLES-64BIT-SLOT2 osafamfd[2396]: NO Controller switch over done Sep 12 19:15:12 SLES-64BIT-SLOT2 osafimmnd[2332]: WA Timeout on Persistent runtime Object Mutation, waiting on PBE Sep 12 19:15:12 SLES-64BIT-SLOT2 osafimmnd[2332]: WA >>s_info->to_svc == 0<< reply context destroyed before this reply could be made Sep 12 19:15:12 SLES-64BIT-SLOT2 osafimmnd[2332]: ER Failed to send response to agent/client over MDS rc:2 Sep 12 19:15:14 SLES-64BIT-SLOT2 osafimmpbed: NO 2PBE Error (21) in PRTA update (ccbId:100f0) Sep 12 19:15:14 SLES-64BIT-SLOT2 osafimmnd[2332]: WA update of PERSISTENT runtime attributes in object 'safNode=PL-3,safCluster=myClmCluster' REVERTED. PBE rc:21 Sep 12 19:15:15 SLES-64BIT-SLOT2 osafimmnd[2332]: NO Create of class testMA_verifyPrimNoResponseDelCallback_101 is PERSISTENT. 
Sep 12 19:15:16 SLES-64BIT-SLOT2 osafimmpbed: IN Create of class testMA_verifyPrimNoResponseDelCallback_101 committing with ccbId:100ee Sep 12 19:15:16 SLES-64BIT-SLOT2 osafimmpbed: ER pbePrepareTrans was called when sqliteTransLock(0)!=1 Sep 12 19:15:16 SLES-64BIT-SLOT2 osafimmnd[2332]: NO Implementer locally disconnected. Marking it as doomed 625 <315, 2020f> (@OpenSafImmPBE) Sep 12 19:15:16 SLES-64BIT-SLOT2 osafimmnd[2332]: NO Implementer locally disconnected. Marking it as doomed 626 <316, 2020f> (OsafImmPbeRt_B) Sep 12 19:15:16 SLES-64BIT-SLOT2 osafimmnd[2332]: NO Implementer disconnected 625 <315, 2020f> (@OpenSafImmPBE) Sep 12 19:15:16 SLES-64BIT-SLOT2 osafimmnd[2332]: NO Implementer disconnected 626 <316, 2020f> (OsafImmPbeRt_B) Sep 12 19:15:17 SLES-64BIT-SLOT2 osafimmnd[2332]: WA SLAVE PBE process has apparently died at non coord Program terminated with signal 6, Aborted. #0 0x7fd4af31fb55 in
[tickets] [opensaf:tickets] #999 NTF: memory leak due to missing removal of std:tr1:shared_ptr in last container
- **Milestone**: 4.5.0 --> never

---

** [tickets:#999] NTF: memory leak due to missing removal of std:tr1:shared_ptr in last container**

**Status:** wontfix
**Milestone:** never
**Created:** Wed Aug 20, 2014 02:09 PM UTC by Minh Hon Chau
**Last Updated:** Fri Aug 22, 2014 12:37 AM UTC
**Owner:** Minh Hon Chau

In the method NtfAdmin::deleteConfirmedNotification(...), the NtfNotification object should be destroyed after NtfAdmin::notificationMap erases the NtfSmartPtr. However, another container (NtfLogger::coll_) still owns this shared_ptr, so the destructor of NtfNotification is not invoked. This causes a memory leak because NtfLogger::coll_ never removes its element.
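The ownership problem can be illustrated with a Python analogue, where ordinary references play the role of shared_ptr copies and a weak reference serves as a liveness probe. This is only a model of the mechanism, assuming CPython's reference counting; `notification_map` and `logger_coll` are illustrative stand-ins for NtfAdmin::notificationMap and NtfLogger::coll_, not the NTF code.

```python
import gc
import weakref
from typing import Tuple


class Notification:
    """Stand-in for NtfNotification."""


def demo() -> Tuple[bool, bool]:
    notification_map = {}   # first owner
    logger_coll = []        # second owner that is never cleaned up

    n = Notification()
    probe = weakref.ref(n)  # observes the object without owning it
    notification_map[1] = n
    logger_coll.append(n)
    del n                   # only the two containers own the object now

    del notification_map[1]
    gc.collect()
    alive_after_map_erase = probe() is not None

    logger_coll.clear()     # the removal the ticket says is missing
    gc.collect()
    alive_after_coll_clear = probe() is not None
    return alive_after_map_erase, alive_after_coll_clear
```

Erasing from the map alone leaves the object alive; it is released only once the second container drops its reference as well.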
[tickets] [opensaf:tickets] #1000 IMMND asserts at immsv_evt_enc_inline_text after SetErrorString operation
- **Milestone**: 4.3.3 --> never --- ** [tickets:#1000] IMMND asserts at immsv_evt_enc_inline_text after SetErrorString operation ** **Status:** duplicate **Milestone:** never **Created:** Thu Aug 21, 2014 05:22 AM UTC by Sirisha Alla **Last Updated:** Mon Aug 25, 2014 04:45 AM UTC **Owner:** nobody **Attachments:** - [logs.tar](https://sourceforge.net/p/opensaf/tickets/1000/attachment/logs.tar) (18.9 MB; application/x-tar) This issue is seen on SLES 64bit 4 node testbed running with 4.5 changeset 5608 plus patches for #938,#994 and #997 The test is to do OiCcbSetErrorString inside createCallback() Twice and check that the second invocation of SetErrorString() returns BAD_OPERATION. The CreateCallback() returned with BAD_OPERATION when IMMND crashed. syslog on the payload where the test is in progress: Aug 21 10:12:49 SLES-64BIT-SLOT4 osafimmnd[3019]: NO implementer for class 'testCcbExt_verifySetErrStrSingleStrPerCbk_133' is implementertestCcbExt_verifySetErrStrSingleStrPerCbk_133 => class extent is safe. Aug 21 10:12:49 SLES-64BIT-SLOT4 osafimmnd[3019]: NO ImmModel::ccbObjCreateContinuation: implementer returned error, Ccb aborted with error: 20 Aug 21 10:12:49 SLES-64BIT-SLOT4 osafimmnd[3019]: WA immsv_evt_enc_inline_text: Length missmatch from source line:1098 (1 342010752 '') Aug 21 10:12:49 SLES-64BIT-SLOT4 osafimmnd[3019]: immsv_evt.c:1098: immsv_evt_enc_attrName: Assertion 'immsv_evt_enc_inline_text(__LINE__, o_ub, os)' failed. 
Aug 21 10:12:49 SLES-64BIT-SLOT4 osafamfnd[3038]: NO 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' component restart probation timer started (timeout: 600 ns) Aug 21 10:12:49 SLES-64BIT-SLOT4 osafamfnd[3038]: NO Restarting a component of 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1) Aug 21 10:12:49 SLES-64BIT-SLOT4 osafamfnd[3038]: NO 'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart' Following is the back trace of the core: Program terminated with signal 6, Aborted. #0 0x7f5decac5b55 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x7f5decac5b55 in raise () from /lib64/libc.so.6 #1 0x7f5decac7131 in abort () from /lib64/libc.so.6 #2 0x7f5deddffc0e in __osafassert_fail () from /usr/lib64/libopensaf_core.so.0 #3 0x0047ddef in immsv_evt_enc_attrName.part.5 () at immsv_evt.c:1098 #4 0x0047e49a in immsv_evt_enc_sublevels () at immsv_evt.c:1506 #5 0x00418ccf in immnd_mds_callback () #6 0x7f5dede25f9f in mds_mcm_send_msg_enc () from /usr/lib64/libopensaf_core.so.0 #7 0x7f5dede2670d in mcm_pvt_red_snd_process_common () from /usr/lib64/libopensaf_core.so.0 #8 0x7f5dede2b299 in mds_send () from /usr/lib64/libopensaf_core.so.0 #9 0x7f5dede23d78 in ncsmds_api () from /usr/lib64/libopensaf_core.so.0 #10 0x00419727 in immnd_mds_send_rsp () #11 0x0040a329 in immnd_evt_proc_ccb_obj_create_rsp.isra.43 () at immnd_evt.c:3538 #12 0x00415f00 in immnd_evt_proc_fevs_dispatch () at immnd_evt.c:7782 #13 0x0041820d in immnd_process_evt () at immnd_evt.c:8506 #14 0x0040b6ab in main () at immnd_main.c:336 (gdb) thread apply all bt Thread 4 (Thread 0x7f5dec25a700 (LWP 3023)): #0 0x7f5decb614f6 in poll () from /lib64/libc.so.6 #1 0x7f5deddfc5f0 in osaf_poll_no_timeout () from /usr/lib64/libopensaf_core.so.0 #2 0x7f5deddfc875 in osaf_poll () from /usr/lib64/libopensaf_core.so.0 #3 0x7f5deddfea02 in auth_server_main () from /usr/lib64/libopensaf_core.so.0 #4 0x7f5ded5a77b6 in start_thread () from /lib64/libpthread.so.0 #5 
0x7f5decb6a9cd in clone () from /lib64/libc.so.6 #6 0x in ?? () Thread 3 (Thread 0x7f5dee244b00 (LWP 3022)): #0 0x7f5decb614f6 in poll () from /lib64/libc.so.6 #1 0x7f5dede34b35 in mdtm_process_recv_events () from /usr/lib64/libopensaf_core.so.0 #2 0x7f5ded5a77b6 in start_thread () from /lib64/libpthread.so.0 #3 0x7f5decb6a9cd in clone () from /lib64/libc.so.6 #4 0x in ?? () Thread 2 (Thread 0x7f5dee275b00 (LWP 3021)): #0 0x7f5decb614f6 in poll () from /lib64/libc.so.6 #1 0x7f5deddfc5f0 in osaf_poll_no_timeout () from /usr/lib64/libopensaf_core.so.0 #2 0x7f5deddfc7f5 in osaf_ppoll () from /usr/lib64/libopensaf_core.so.0 #3 0x7f5dede033df in ncs_tmr_wait () from /usr/lib64/libopensaf_core.so.0 #4 0x7f5ded5a77b6 in start_thread () from /lib64/libpthread.so.0 #5 0x7f5decb6a9cd in clone () from /lib64/libc.so.6 #6 0x in ?? () Thread 1 (Thread 0x7f5dee247720 (LWP 3019)): #0 0x7f5decac5b55 in raise () from /lib64/libc.so.6 #1 0x7f5decac7131 in abort () from /lib64/libc.so.6 #2 0x7f5deddffc0e in __osafassert_fail () from /usr/lib64/libopensaf_core.so.0 #3 0x0047ddef in immsv_evt_enc_attrName.part.5 () at immsv_evt.c:1098 #4 0x0047e49a in immsv_evt_enc_sublevels ()
[tickets] [opensaf:tickets] #980 IMM: immcfg coredump when creating long dn object
- **Milestone**: 4.5.0 --> never --- ** [tickets:#980] IMM: immcfg coredump when creating long dn object** **Status:** invalid **Milestone:** never **Created:** Fri Aug 08, 2014 07:38 PM UTC by Minh Hon Chau **Last Updated:** Mon Sep 08, 2014 10:51 AM UTC **Owner:** Zoran Milinkovic Coredump on immcfg by following test: root@uvb:~/ grep EXTENDED /etc/opensaf/immnd.conf export SA_ENABLE_EXTENDED_NAMES=1 root@uvb:~/ immcfg -m -a longDnsAllowed=1 opensafImm=opensafImm,safApp=safImmService root@uvb:~/ immcfg -f longdn_class.xml root@uvb:~/ immcfg -c OsafNtfCmTestCFG stringRdnCfg=abcd root@uvb:~/ immcfg -a testNameCfg=123 stringRdnCfg=abcd root@uvb:~/ immlist stringRdnCfg=abcd Name Type Value(s) testNameCfgSA_NAME_T123 (3) stringRdnCfg SA_STRING_T stringRdnCfg=abcd SaImmAttrImplementerName SA_STRING_T SaImmAttrClassName SA_STRING_T OsafNtfCmTestCFG SaImmAttrAdminOwnerNameSA_STRING_T root@uvb:~/ immcfg -a testNameCfg= stringRdnCfg=abcd Aborted (core dumped) --- The longdn_class.xml as below: " SA_CONFIG stringRdnCfg SA_STRING_T SA_CONFIG SA_INITIALIZED SA_NOTIFY testNameCfg SA_NAME_T SA_CONFIG SA_MULTI_VALUE SA_NOTIFY SA_WRITABLE " the backtrace as below: Core was generated by `immcfg -a testNameCfg=1'. Program terminated with signal SIGABRT, Aborted. 
#0 0x415d5f79 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x415d5f79 in raise () from /lib64/libc.so.6
#1 0x415d9388 in abort () from /lib64/libc.so.6
#2 0x409c5cbe in __osafassert_fail (__file=__file@entry=0x409f9cd1 "osaf_extended_name.c", __line=__line@entry=130, __func=__func@entry=0x409f9d60 <__FUNCTION__.3257> "osaf_extended_name_length", __assertion=__assertion@entry=0x409f9d10 "osaf_extended_names_enabled && length >= SA_MAX_UNEXTENDED_NAME_LENGTH") at sysf_def.c:281
#3 0x409c3936 in osaf_extended_name_length (name=name@entry=0x62dc60) at osaf_extended_name.c:129
#4 0x40c322fd in imma_copyAttrValue (p=0x62d870, attrValueType=SA_IMM_ATTR_SANAMET, attrValue=0x62dc60) at imma_init.c:421
#5 0x40c24e77 in saImmOmCcbObjectModify_2 (ccbHandle=ccbHandle@entry=1406852104797087000, objectName=objectName@entry=0x61b240, attrMods=attrMods@entry=0x62dc40) at imma_om_api.c:2349
#6 0x00412ddc in immutil_saImmOmCcbObjectModify_2 (immCcbHandle=1406852104797087000, objectName=0x61b240, attrMods=attrMods@entry=0x62dc40) at immutil.c:1540
#7 0x0040d2ee in object_modify (objectNames=objectNames@entry=0x61b220, optargs=optargs@entry=0x61b010, optargs_len=optargs_len@entry=1) at imm_cfg.c:589
#8 0x0040e894 in imm_operation (argc=4, argv=) at imm_cfg.c:1439
#9 0x415c0ec5 in __libc_start_main () from /lib64/libc.so.6
#10 0x0040378e in _start ()
(gdb) f 2
#2 0x409c5cbe in __osafassert_fail (__file=__file@entry=0x409f9cd1 "osaf_extended_name.c", __line=__line@entry=130, __func=__func@entry=0x409f9d60 <__FUNCTION__.3257> "osaf_extended_name_length", __assertion=__assertion@entry=0x409f9d10 "osaf_extended_names_enabled && length >= SA_MAX_UNEXTENDED_NAME_LENGTH") at sysf_def.c:281
281 abort();
(gdb) p osaf_extended_names_enabled
$1 = false
(gdb)
[tickets] [opensaf:tickets] #972 OI rejection through callback results in immnd crash
- **Milestone**: 4.5.FC --> never --- ** [tickets:#972] OI rejection through callback results in immnd crash** **Status:** duplicate **Milestone:** never **Created:** Wed Jul 30, 2014 11:49 AM UTC by surender khetavath **Last Updated:** Thu Aug 07, 2014 07:51 AM UTC **Owner:** nobody **Attachments:** - [sc1_logs.tgz](https://sourceforge.net/p/opensaf/tickets/972/attachment/sc1_logs.tgz) (461.2 kB; application/x-compressed-tar)
gcc 4.9
setup : 1 controller
changeset : 5491 and patch from #643
Running any CCB operation and letting the OI reject it by replying with ERR_BAD_OP inside the callback results in an immnd crash.
(gdb) bt
#0 0x000a0042db53 in ?? ()
#1 0x000c in ?? ()
#2 0x006d19c0 in ?? ()
#3 0x7fff030f89a0 in ?? ()
#4 0x0041817c in immnd_evt_proc_ccb_finalize ()
Backtrace stopped: frame did not save the PC
(gdb) bt full
#0 0x000a0042db53 in ?? ()
No symbol table info available.
#1 0x000c in ?? ()
No symbol table info available.
#2 0x006d19c0 in ?? ()
No symbol table info available.
#3 0x7fff030f89a0 in ?? ()
No symbol table info available.
#4 0x0041817c in immnd_evt_proc_ccb_finalize ()
No symbol table info available.
Backtrace stopped: frame did not save the PC
(gdb)
Logs attached.
[tickets] [opensaf:tickets] #1576 AMF : SU struck in terminating ( health check timeout - proxy proxied )
- **Milestone**: 4.5.2 --> 4.6.2 --- ** [tickets:#1576] AMF : SU struck in terminating ( health check timeout - proxy proxied )** **Status:** unassigned **Milestone:** 4.6.2 **Created:** Thu Oct 29, 2015 05:49 AM UTC by Srikanth R **Last Updated:** Thu Oct 29, 2015 05:49 AM UTC **Owner:** nobody **Attachments:** - [1570.tgz](https://sourceforge.net/p/opensaf/tickets/1576/attachment/1570.tgz) (1.5 MB; application/x-compressed-tar)
Changeset : 6901
Application : SU1 mapped to SC-2 & SU2 mapped to SC-1. Each SU consists of 3 pre-instantiable components (one component is LOCAL & PROXIED and the other two are SA_AWARE).
Steps :
* Brought up two controllers in the cluster.
* Performed unlock-in operation on SU1.
* Health check is started by both SA-AWARE components.
* One of the SA-AWARE components faulted in health check and, as part of repair, the SU is stuck in terminating state.
Oct 29 10:30:35 SYSTEST-CNTLR-2 osafamfnd[3617]: NO 'safSu=2nAdminRepair_SU_1,safSg=2nAdminRepair_SG,safApp=2nAdminRepair' Presence State INSTANTIATING => INSTANTIATED
Oct 29 10:31:21 SYSTEST-CNTLR-2 osafamfnd[3617]: NO saAmfSUFailover is true for 'safSu=2nAdminRepair_SU_1,safSg=2nAdminRepair_SG,safApp=2nAdminRepair'
Oct 29 10:31:21 SYSTEST-CNTLR-2 osafamfnd[3617]: NO SU failover probation timer started (timeout: 12000 ns)
Oct 29 10:31:21 SYSTEST-CNTLR-2 osafamfnd[3617]: NO Performing failover of 'safSu=2nAdminRepair_SU_1,safSg=2nAdminRepair_SG,safApp=2nAdminRepair' (SU failover count: 1)
Oct 29 10:31:21 SYSTEST-CNTLR-2 osafamfnd[3617]: NO 'safComp=2nAdminRepair,safSu=2nAdminRepair_SU_1,safSg=2nAdminRepair_SG,safApp=2nAdminRepair' recovery action escalated from 'noRecommendation' to 'suFailover'
Oct 29 10:31:21 SYSTEST-CNTLR-2 osafamfnd[3617]: NO 'safComp=2nAdminRepair,safSu=2nAdminRepair_SU_1,safSg=2nAdminRepair_SG,safApp=2nAdminRepair' faulted due to 'healthCheckcallbackTimeout' : Recovery is 'suFailover'
Oct 29 10:31:21 SYSTEST-CNTLR-2 osafamfnd[3617]: NO Terminating components of 'safSu=2nAdminRepair_SU_1,safSg=2nAdminRepair_SG,safApp=2nAdminRepair'(abruptly & unordered)
Oct 29 10:31:21 SYSTEST-CNTLR-2 osafamfnd[3617]: NO 'safSu=2nAdminRepair_SU_1,safSg=2nAdminRepair_SG,safApp=2nAdminRepair' Presence State INSTANTIATED => TERMINATING
Oct 29 10:31:21 SYSTEST-CNTLR-2 osafamfnd[3617]: NO 'safSu=2nAdminRepair_SU_1,safSg=2nAdminRepair_SG,safApp=2nAdminRepair' Presence State TERMINATING => TERMINATING
Oct 29 10:31:21 SYSTEST-CNTLR-2 osafamfnd[3617]: NO 'safSu=2nAdminRepair_SU_1,safSg=2nAdminRepair_SG,safApp=2nAdminRepair' Presence State TERMINATING => TERMINATING
* Amfd crashes during opensafd stop on SC-2:
Oct 29 11:27:46 SYSTEST-CNTLR-2 opensafd: Stopping OpenSAF Services
Oct 29 11:27:46 SYSTEST-CNTLR-2 osafamfnd[3617]: NO Shutdown initiated
Oct 29 11:27:46 SYSTEST-CNTLR-2 osafamfnd[3617]: NO Terminating all AMF components
...
Oct 29 11:27:46 SYSTEST-CNTLR-2 osafamfd[3607]: NO Re-initializing with IMM
...
Oct 29 11:28:46 SYSTEST-CNTLR-2 osafamfd[3607]: exiting for shutdown
Oct 29 11:28:46 SYSTEST-CNTLR-2 osafamfnd[3617]: ER AMF director unexpectedly crashed
Oct 29 11:28:46 SYSTEST-CNTLR-2 osafamfnd[3617]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131599, SupervisionTime = 60
[tickets] [opensaf:tickets] #1542 AMF : Quiesced callbacks should be generated, during recovery (su failover flag disabled)
- **Milestone**: 4.5.2 --> 4.6.2 --- ** [tickets:#1542] AMF : Quiesced callbacks should be generated, during recovery (su failover flag disabled)** **Status:** unassigned **Milestone:** 4.6.2 **Created:** Tue Oct 13, 2015 10:33 AM UTC by Srikanth R **Last Updated:** Tue Oct 20, 2015 11:17 AM UTC **Owner:** nobody
Changeset : 6901
Application : 2n, 4 SIs configured with SI1 as sponsor for SI2, SI3, SI4. Component recovery policy - 3, sufailoverflag - 0
Steps :
* Initially all the SIs are in unassigned state. SU1 is hosted active and SU2 is hosted standby.
* Performed lock on SI4.
* Later performed unlock on SI4, for which the component in SU1 rejected the active callback.
* As part of recovery, all the assignments to SU1 should be removed and active assignments given to the standby SU, i.e. SU2.
* In the current implementation, quiesced callbacks are not generated during removal of assignments.
* According to the spec, page no. 195: If the service unit is configured to fail over as a single entity (saAmfSUFailover set to SA_TRUE), all other components of the service unit are abruptly terminated, and all service instances assigned to that service unit are failed over; otherwise, only the erroneous component is abruptly terminated, and all component service instances that were assigned to it are failed over. Other components are not terminated, but all service instances that contained one of the failed over component service instances have their remaining component service instances switched over.
* Below is the syslog on the node where SU1 is hosted.
Oct 13 03:15:24 SYSTEST-PLD-1 osafamfnd[2725]: NO Assigning 'safSi=TestApp_SI2,safApp=TestApp_TwoN' QUIESCED to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:24 SYSTEST-PLD-1 osafamfnd[2725]: NO Assigned 'safSi=TestApp_SI2,safApp=TestApp_TwoN' QUIESCED to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:24 SYSTEST-PLD-1 osafamfnd[2725]: NO Removed 'safSi=TestApp_SI2,safApp=TestApp_TwoN' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Assigning 'safSi=TestApp_SI2,safApp=TestApp_TwoN' ACTIVE to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO SU failover probation timer started (timeout: 12000 ns)
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Performing failover of 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' (SU failover count: 1)
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO 'safComp=COMP2,safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' faulted due to 'csiSetcallbackFailed' : Recovery is 'componentFailover'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' Presence State INSTANTIATED => TERMINATING
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Assigned 'safSi=TestApp_SI2,safApp=TestApp_TwoN' ACTIVE to 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Removing 'all (4) SIs' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Removing 'safSi=TestApp_SI1,safApp=TestApp_TwoN' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Removing 'safSi=TestApp_SI2,safApp=TestApp_TwoN' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Removing 'safSi=TestApp_SI3,safApp=TestApp_TwoN' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Removing 'safSi=TestApp_SI4,safApp=TestApp_TwoN' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Removed 'safSi=TestApp_SI1,safApp=TestApp_TwoN' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Removed 'safSi=TestApp_SI2,safApp=TestApp_TwoN' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Removed 'safSi=TestApp_SI3,safApp=TestApp_TwoN' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Removed 'safSi=TestApp_SI4,safApp=TestApp_TwoN' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO Removed 'all SIs' from 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN'
Oct 13 03:15:27 SYSTEST-PLD-1 osafamfnd[2725]: NO 'safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN' Presence State TERMINATING => INSTANTIATED
[tickets] [opensaf:tickets] #1548 logd on standby crashed, for nonexistent logsv data group
- **Milestone**: 4.5.2 --> 4.6.2 --- ** [tickets:#1548] logd on standby crashed, for nonexistent logsv data group** **Status:** assigned **Milestone:** 4.6.2 **Created:** Thu Oct 15, 2015 12:29 PM UTC by Srikanth R **Last Updated:** Thu Oct 15, 2015 02:53 PM UTC **Owner:** Mathi Naickan
Changeset : 6901
Steps : Logsv crashes on the standby controller if the group does not exist on the standby controller. For the following command, logd crashed on the standby controller with the syslog below.
immcfg -a logDataGroupname=testGroup logConfig=1,safApp=safLogService
Oct 15 17:09:49 CONTROLLER-1 osaflogd[2227]: ER osaf_user_is_member_of_group: group 'testGroup' does not exist
Oct 15 17:09:49 CONTROLLER-1 osaflogd[2227]: WA lgs_cfg_verify_log_data_groupname: osaf_user_is_member_of_group() Fail
Oct 15 17:09:49 CONTROLLER-1 osaflogd[2227]: WA lgs_cfg_update: Verify fail for lgs configuration
Oct 15 17:09:49 CONTROLLER-1 osaflogd[2227]: ER ckpt_proc_lgs_cfg_v5 lgs_cfg_update Fail
Oct 15 17:09:49 CONTROLLER-1 osaflogd[2227]: lgs_mbcsv_v5.c:127: ckpt_proc_lgs_cfg_v5: Assertion '0' failed.
Oct 15 17:09:49 CONTROLLER-1 osafamfnd[2281]: NO 'safComp=LOG,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Logd also crashes if the user is not part of the newly created group on the standby. Logsv should reject the CCB operation if the standby is not properly updated.
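The rejection asked for in this ticket amounts to checking the group first and reporting a failure to the caller rather than asserting. The following is only a minimal sketch in Python of that idea, not logsv code: `validate_data_groupname` is a hypothetical helper, and the lookup uses the standard `grp` module against the local group database of whichever node runs it.

```python
import grp


def validate_data_groupname(group_name):
    """Return (ok, reason) for a proposed logDataGroupname value.

    Hypothetical helper for illustration only; it is not an OpenSAF API.
    A missing group is reported to the caller instead of triggering an assert.
    """
    try:
        grp.getgrnam(group_name)  # raises KeyError when the group is unknown
    except KeyError:
        return False, "group '%s' does not exist" % group_name
    return True, ""


if __name__ == "__main__":
    ok, reason = validate_data_groupname("testGroup")
    print("accept" if ok else "reject: " + reason)
```

Note that such a check only covers the node it runs on. The crash reported here happens because the group is missing on the standby, so a real fix would also need the standby's result to feed back into the decision.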
[tickets] [opensaf:tickets] #1541 AMF : Both the 2N SUs are assigned Standby SI Assignment run time objects.
- **Milestone**: 4.5.2 --> 4.6.2 --- ** [tickets:#1541] AMF : Both the 2N SUs are assigned Standby SI Assignment run time objects.** **Status:** unassigned **Milestone:** 4.6.2 **Created:** Tue Oct 13, 2015 07:17 AM UTC by Srikanth R **Last Updated:** Mon Oct 19, 2015 11:49 AM UTC **Owner:** nobody **Attachments:** - [1541.sh](https://sourceforge.net/p/opensaf/tickets/1541/attachment/1541.sh) (16.7 kB; application/x-shellscript)
Changeset : 6901
Configuration : 2N, 2 SUs and 4 SIs without si-si deps. Component recovery = 3 (suFailoverflag disabled)
Steps :
* Initially all the SIs are in assigned state.
* Performed shutdown operation on the SU hosting the active assignment.
* In the quiescing callback, ensure that the component does not respond.
* As part of recovery, the other SU got active callbacks, but the SI assignment objects for active are not created. Only the standby SI assignments are present in IMM.
* After unlocking the locked SU, both SUs show standby assignments in the siass runtime objects.
safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI2,safApp=TestApp_TwoN saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI4,safApp=TestApp_TwoN saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI3,safApp=TestApp_TwoN saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU1\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI1,safApp=TestApp_TwoN saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI2,safApp=TestApp_TwoN saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI3,safApp=TestApp_TwoN saAmfSISUHAState=STANDBY(2)
safSISU=safSu=TestApp_SU2\,safSg=TestApp_SG1\,safApp=TestApp_TwoN,safSi=TestApp_SI4,safApp=TestApp_TwoN saAmfSISUHAState=STANDBY(2)
Below is the error logged in the active controller syslog.
Oct 13 00:19:51 CONTROLLER-2 osafamfd[11712]: EM sg_2n_fsm.cc:2359: safSu=TestApp_SU1,safSg=TestApp_SG1,safApp=TestApp_TwoN (55)
Configuration to create the application is attached. The same issue is observed for a similar scenario during the Node shutdown operation.
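The inconsistency in the listing (no SU holds the active assignment for any SI) can be spotted mechanically. Below is a rough Python sketch, assuming the `safSISU=... saAmfSISUHAState=...` text layout shown in this ticket; the function names are made up for illustration and are not part of any OpenSAF tool.

```python
import re
from collections import defaultdict

# One SI assignment entry: the SU part of the DN uses escaped commas ("\,"),
# the SI part follows after the first unescaped comma.
SISU_RE = re.compile(
    r"safSISU=(?P<su>(?:\\,|[^,\s])+),(?P<si>safSi=\S+)\s+"
    r"saAmfSISUHAState=(?P<state>[A-Z]+)\(\d+\)"
)


def ha_states_by_si(text):
    """Map each SI DN to a dict of {SU DN: HA state name}."""
    states = defaultdict(dict)
    for match in SISU_RE.finditer(text):
        su_dn = match.group("su").replace("\\,", ",")  # unescape the SU DN
        states[match.group("si")][su_dn] = match.group("state")
    return dict(states)


def sis_without_active(text):
    """Return the sorted SI DNs for which no SU reports the ACTIVE HA state."""
    return sorted(
        si
        for si, per_su in ha_states_by_si(text).items()
        if "ACTIVE" not in per_su.values()
    )
```

Fed with the eight entries from this ticket, `sis_without_active` would report all four SIs, since every entry is in the STANDBY state.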
[tickets] [opensaf:tickets] #1485 amf:Nway, Unstable SG during SI lock when standby faulted with comp failover recovery.
- **Milestone**: 4.5.2 --> 4.6.2 --- ** [tickets:#1485] amf:Nway, Unstable SG during SI lock when standby faulted with comp failover recovery.** **Status:** unassigned **Milestone:** 4.6.2 **Labels:** NWAY COMP_FAILOVER **Created:** Wed Sep 16, 2015 09:24 AM UTC by Praveen **Last Updated:** Wed Sep 16, 2015 09:24 AM UTC **Owner:** nobody **Attachments:** - [AppConfig-N-Way.xml](https://sourceforge.net/p/opensaf/tickets/1485/attachment/AppConfig-N-Way.xml) (16.1 kB; text/xml) - [osafamfd](https://sourceforge.net/p/opensaf/tickets/1485/attachment/osafamfd) (280.5 kB; application/octet-stream)
Attached are the configuration and AMF traces to reproduce the problem.
Steps to reproduce:
1) Lock the SI which has assignments on all the SUs.
2) When the active component is processing the quiesced callback, kill the standby comp for this SI on the other SU with component failover recovery.
3) AMF will revert the SI back to unlocked state.
4) SG becomes unstable.
5) For the faulted SU, removal of assignments is not performed and it is stuck in Terminating state.
Assignments before SI lock:
safSISU=safSu=SU3\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1 saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SU3\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1 saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1 saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SU1\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1 saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU1\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1 saAmfSISUHAState=ACTIVE(1)
Assignment status and SU state after SI lock and fault:
safSISU=safSu=SU3\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1 saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SU3\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1 saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SU2\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1 saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SU1\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1 saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU1\,safSg=AmfDemo\,safApp=AmfDemo1,safSi=AmfDemo1,safApp=AmfDemo1 saAmfSISUHAState=QUIESCED(3)
safSu=SU3,safSg=AmfDemo,safApp=AmfDemo1
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=DISABLED(2)
saAmfSUPresenceState=TERMINATING(4)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
[tickets] [opensaf:tickets] #1488 LCK: if master GLND reboots deadlock can occur for currently held locks
- **Milestone**: 4.5.2 --> 4.6.2 --- ** [tickets:#1488] LCK: if master GLND reboots deadlock can occur for currently held locks** **Status:** review **Milestone:** 4.6.2 **Created:** Wed Sep 16, 2015 02:54 PM UTC by Alex Jones **Last Updated:** Thu Sep 17, 2015 04:41 PM UTC **Owner:** Alex Jones If the master GLND is rebooted while an exclusive lock (or locks) is held, when the new master is elected and the other GLNDs send over the current lock information held by them to the new master, they do not send all information needed by the new master to lock/unlock currently held locks. When this happens the lock(s) can never be unlocked or granted.
[tickets] [opensaf:tickets] #1317 ckpt : stale replicas observed in a 70 node cluster
- **Milestone**: 4.5.2 --> 4.6.2 --- ** [tickets:#1317] ckpt : stale replicas observed in a 70 node cluster** **Status:** assigned **Milestone:** 4.6.2 **Created:** Wed Apr 15, 2015 10:16 AM UTC by Sirisha Alla **Last Updated:** Tue Aug 11, 2015 06:12 AM UTC **Owner:** A V Mahesh (AVM) **Attachments:** - [logs.tar.bz2](https://sourceforge.net/p/opensaf/tickets/1317/attachment/logs.tar.bz2) (6.5 MB; application/x-bzip)
This issue is observed on cs6377 (46FC Tag). The cluster has 70 nodes and 2 checkpoint applications run on each node. The application running on the active controller creates the checkpoint, while the applications running on other nodes open the same checkpoint and use it. After sections are created, written and read, all the applications finalize the handles used. The retention duration of the checkpoint is set to a minimal value of 1000 nanoseconds.
/dev/shm on the active controller after the applications exited:
SLES-64BIT-SLOT1:~ # date;ls -lrt /dev/shm/
Wed Apr 15 14:25:09 IST 2015
total 1772
-rw-r--r-- 1 opensaf opensaf 1076040 Apr 15 13:38 opensaf_NCS_MQND_QUEUE_CKPT_INFO
-rw-r--r-- 1 opensaf opensaf 328000 Apr 15 13:38 opensaf_NCS_GLND_RES_CKPT_INFO
-rw-r--r-- 1 opensaf opensaf 16 Apr 15 13:38 opensaf_NCS_GLND_LCK_CKPT_INFO
-rw-r--r-- 1 opensaf opensaf 88000 Apr 15 13:38 opensaf_NCS_GLND_EVT_CKPT_INFO
-rw-r--r-- 1 opensaf opensaf 704008 Apr 15 13:38 opensaf_CPND_CHECKPOINT_INFO_131343
-rw-r--r-- 1 opensaf opensaf 79848 Apr 15 13:55 opensaf_safCkpt=active_replica_ckpt_name_1_sysgrou_131343_4
-rw-r--r-- 1 opensaf opensaf 79848 Apr 15 13:56 opensaf_safCkpt=active_replica_ckpt_name_1_sysgrou_131343_9
-rw-r--r-- 1 opensaf opensaf 79848 Apr 15 13:57 opensaf_safCkpt=active_replica_ckpt_name_1_sysgrou_131343_16
SLES-64BIT-SLOT1:~ # date;immfind|grep -i ckpt
Wed Apr 15 14:25:11 IST 2015
safApp=safCkptService
SLES-64BIT-SLOT1:~ #
When an attempt is made to create a checkpoint with the same name, the checkpoint service does not create a new replica in the shared memory. cpd, cpnd traces are attached.
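The manual `ls` check in this ticket can be repeated with a few lines of Python. This is only a sketch under assumptions: the `opensaf_safCkpt=` file-name prefix is taken from the listing in this ticket, and `checkpoint_replica_files` is a hypothetical helper name, not an OpenSAF utility.

```python
from pathlib import Path


def checkpoint_replica_files(shm_dir="/dev/shm", prefix="opensaf_safCkpt="):
    """Return the sorted names of checkpoint replica files found in shm_dir.

    The prefix is an assumption based on the file names shown in the ticket.
    A missing directory yields an empty list.
    """
    directory = Path(shm_dir)
    if not directory.is_dir():
        return []
    return sorted(
        entry.name
        for entry in directory.iterdir()
        if entry.is_file() and entry.name.startswith(prefix)
    )


if __name__ == "__main__":
    for name in checkpoint_replica_files():
        print(name)
```

In the situation described here, `immfind` reports no checkpoint objects (only `safApp=safCkptService`), so every file the helper returns would be a leftover replica.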
[tickets] [opensaf:tickets] #1306 AMF: notifications during various admin operations
- **Milestone**: 4.5.2 --> 4.6.2 --- ** [tickets:#1306] AMF: notifications during various admin operations** **Status:** assigned **Milestone:** 4.6.2 **Created:** Tue Apr 07, 2015 06:32 PM UTC by Srikanth R **Last Updated:** Tue May 26, 2015 05:57 AM UTC **Owner:** Praveen
Changeset : 6377 ( 4.6 FC)
Application / Model : Observed in 2n and NoRed models
Below are the different issues observed in the notifications generated by AMF during admin operations.
1) If the node group / node hosts the entire application and a lock operation is issued on the node group / node, the alarm and notifications are generated in the wrong order. First, two state change notifications (the SI moving to partially assigned state and to quiesced state) should be generated, and only then the alarm about the SI being unassigned. Instead, they are generated in this order:
=== Apr 3 22:28:12 - State Change ===
eventType = SA_NTF_OBJECT_STATE_CHANGE
notificationObject = "safSi=TWONSI1,safApp=TWONAPP"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.111 (0x6f)
additionalText = "The Assignment state of SI safSi=TWONSI1,safApp=TWONAPP changed"
sourceIndicator = SA_NTF_OBJECT_OPERATION
State ID = SA_AMF_ASSIGNMENT_STATE
Old State: SA_AMF_ASSIGNMENT_FULLY_ASSIGNED
New State: SA_AMF_ASSIGNMENT_PARTIALLY_ASSIGNED
=== Apr 3 22:28:12 - Alarm ===
eventType = SA_NTF_ALARM_PROCESSING
notificationObject = "safSi=TWONSI1,safApp=TWONAPP"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.5 (0x5)
additionalText = "SI designated by safSi=TWONSI1,safApp=TWONAPP has no current active assignments to any SU"
probableCause = SA_NTF_SOFTWARE_ERROR
perceivedSeverity = SA_NTF_SEVERITY_MAJOR
=== Apr 3 22:28:12 - State Change ===
eventType = SA_NTF_OBJECT_STATE_CHANGE
notificationObject = "safSu=SU1,safSg=SGONE,safApp=TWONAPP"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.110 (0x6e)
additionalText = "The HA state of SI safSi=TWONSI1,safApp=TWONAPP assigned to SU safSu=SU1,safSg=SGONE,safApp=TWONAPP changed"
- additionalInfo: 0 -
infoId = 2 infoType = 10
infoValue = "safSi=TWONSI1,safApp=TWONAPP"
sourceIndicator = SA_NTF_OBJECT_OPERATION
State ID = SA_AMF_HA_STATE
Old State:
New State: SA_AMF_HA_QUIESCED
In case of an SI lock operation, the two state change notifications and then the alarm are generated in the proper order.
2) For the lock and shutdown operations, the old state is not filled in when a state change notification is issued for an HA state change.
Old state (Active) is not filled in for the shutdown operation:
=== Apr 7 15:21:03 - State Change ===
eventType = SA_NTF_OBJECT_STATE_CHANGE
notificationObject = "safSu=Srikanth_nored_3,safSg=SG_Srikanth_nored,safApp=nored"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.110 (0x6e)
additionalText = "The HA state of SI safSi=Srikanth_nored_3,safApp=nored assigned to SU safSu=Srikanth_nored_3,safSg=SG_Srikanth_nored,safApp=nored changed"
- additionalInfo: 0 -
infoId = 2 infoType = 10
infoValue = "safSi=Srikanth_nored_3,safApp=nored"
sourceIndicator = SA_NTF_OBJECT_OPERATION
State ID = SA_AMF_HA_STATE
Old State:
New State: SA_AMF_HA_QUIESCING
Old state should be quiescing:
=== Apr 7 15:21:03 - State Change ===
eventType = SA_NTF_OBJECT_STATE_CHANGE
notificationObject = "safSu=Srikanth_nored_3,safSg=SG_Srikanth_nored,safApp=nored"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.110 (0x6e)
additionalText = "The HA state of SI safSi=Srikanth_nored_3,safApp=nored assigned to SU safSu=Srikanth_nored_3,safSg=SG_Srikanth_nored,safApp=nored changed"
- additionalInfo: 0 -
infoId = 2 infoType = 10
infoValue = "safSi=Srikanth_nored_3,safApp=nored"
sourceIndicator = SA_NTF_OBJECT_OPERATION
State ID = SA_AMF_HA_STATE
Old State:
New State: SA_AMF_HA_QUIESCED
3) An invalid extra notification is generated when an SG / SI is locked with a no-redundancy SU in assigned, in-service and enabled state.
=== Apr 7 14:53:46 - State Change ===
eventType = SA_NTF_OBJECT_STATE_CHANGE
notificationObject = "safSg=SG_Srikanth_nored,safApp=nored"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.103 (0x67)
additionalText = "Admin state of safSg=SG_Srikanth_nored,safApp=nored changed"
sourceIndicator = SA_NTF_MANAGEMENT_OPERATION
State ID = SA_AMF_ADMIN_STATE
Old State: SA_AMF_ADMIN_LOCKED
New State: SA_AMF_ADMIN_LOCKED
=== Apr 7 15:11:58 - State Change ===
eventType = SA_NTF_OBJECT_STATE_CHANGE
notificationObject = "safSi=Srikanth_nored_3,safApp=nored"
notifyingObject = "safApp=safAmfService"
notificationClassId = SA_NTF_VENDOR_ID_SAF.SA_SVC_AMF.104 (0x68)
additionalText = "Admin state of safSi=Srikanth_nored_3,safApp=nored changed"
sourceIndicator = SA_NTF_MANAGEMENT_OPERATION
State ID =
[tickets] [opensaf:tickets] #887 saLckFinalize api returns ERR_LIBRARY after Glnd restart.
- **Milestone**: 4.5.2 --> 4.6.2 --- ** [tickets:#887] saLckFinalize api returns ERR_LIBRARY after Glnd restart.** **Status:** unassigned **Milestone:** 4.6.2 **Created:** Tue May 06, 2014 10:39 AM UTC by Hrishikesh **Last Updated:** Mon Sep 28, 2015 05:45 AM UTC **Owner:** nobody **Attachments:** - [logs.tgz](https://sourceforge.net/p/opensaf/tickets/887/attachment/logs.tgz) (2.4 MB; application/x-compressed-tar) This ticket is replication of opensaf devel Ticket #3062 updated with latest logs. ChangeSet: 5142. Glsv API Calls return ERR_LIB after glsv node director restart SetUp:32bit Glsv App on 64bit SLES11 machine. Opensaf is running on 64bit machine and Glsv App with 32bit libraries. >From the logs its observed that GLND DECODE failed after ndrestart for a while. Lcknd trace: === May 6 14:31:17.435315 osaflcknd [14542:glnd_main.c:0039] TR GLSV:GLND:ON May 6 14:31:17.435324 osaflcknd [14542:glnd_api.c:0123] >> glnd_lib_req May 6 14:31:17.435330 osaflcknd [14542:glnd_api.c:0059] >> glnd_se_lib_create: pool id 0 May 6 14:31:17.435335 osaflcknd [14542:glnd_cb.c:0052] >> glnd_cb_create: pool_id 0 May 6 14:31:17.435351 osaflcknd [14542:glnd_mds.c:0117] >> glnd_mds_register May 6 14:31:17.435368 osaflcknd [14542:glnd_mds.c:0095] << glnd_mds_get_handle May 6 14:31:17.435721 osaflcknd [14542:glnd_mds.c:0228] >> glnd_mds_callback May 6 14:31:17.435746 osaflcknd [14542:glnd_mds.c:0435] >> glnd_mds_dec May 6 14:31:17.435780 osaflcknd [14542:glnd_mds.c:0271] << glnd_mds_callback May 6 14:31:17.435845 osaflcknd [14542:glnd_mds.c:0435] >> glnd_mds_dec May 6 14:31:17.435857 osaflcknd [14542:glnd_mds.c:1382] T2 GLND DEC FAILED May 6 14:31:17.435870 osaflcknd [14542:glnd_mds.c:0538] << glnd_mds_dec May 6 14:31:17.435879 osaflcknd [14542:glnd_mds.c:0267] T2 GLND mds callback process failed May 6 14:31:17.435890 osaflcknd [14542:glnd_mds.c:0271] << glnd_mds_callback May 6 14:31:17.435950 osaflcknd [14542:glnd_mds.c:0435] >> glnd_mds_dec May 6 14:31:17.435956 osaflcknd 
[14542:glnd_mds.c:1382] T2 GLND DEC FAILED May 6 14:31:17.436138 osaflcknd [14542:glnd_mds.c:0538] << glnd_mds_dec May 6 14:31:17.436143 osaflcknd [14542:glnd_mds.c:0267] T2 GLND mds callback process failed May 6 14:31:17.436151 osaflcknd [14542:glnd_mds.c:0271] << glnd_mds_callback May 6 14:31:17.436607 osaflcknd [14542:glnd_cb.c:0117] T1 GLND mds register success May 6 14:31:17.436612 osaflcknd [14542:glnd_amf.c:0152] >> glnd_amf_init May 6 14:31:17.436624 osaflcknd [14542:ava_api.c:0057] >> saAmfInitialize May 6 14:31:17.436635 osaflcknd [14542:ava_init.c:0311] >> ncs_ava_startup May 6 14:31:17.436645 osaflcknd [14542:ava_init.c:0078] >> ava_lib_req May 6 14:31:17.436651 osaflcknd [14542:ava_init.c:0123] >> ava_create May 6 14:31:17.436661 osaflcknd [14542:ava_init.c:0138] TR Component name = safComp=GLND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF May 6 14:31:17.436667 osaflcknd [14542:ava_init.c:0156] TR Created handle for the control block May 6 14:31:17.436674 osaflcknd [14542:ava_init.c:0160] TR Initialized the AVA control block lock May 6 14:31:17.436680 osaflcknd [14542:ava_init.c:0164] TR EDU Initialization success May 6 14:31:17.436742 osaflcknd [14542:ava_hdl.c:0060] >> ava_hdl_init May 6 14:31:17.436750 osaflcknd [14542:ava_hdl.c:0074] << ava_hdl_init May 6 14:31:17.436756 osaflcknd [14542:ava_init.c:0182] TR AVA Handles DB created successfully Lock app agent trace snippet: = May 6 15:36:04.981389 gla [17336:gla_mds.c:0742] << gla_mds_msg_sync_send May 6 15:36:04.981413 gla [17336:gla_api.c:0482] T2 GLA api lock finalize failed May 6 15:36:04.981430 gla [17336:gla_api.c:0491] << saLckFinalize: 'FAILURE' return value '2' May 6 15:36:04.982914 gla [17336:gla_api.c:0407] >> saLckFinalize: Called with Handle 637f80 === * During the execution of lock appl(involving all the api's) seg fault was observed and below is the snippet of it. Attachment has full backtrace for debugging. Core was generated by `/usr/lib64/opensaf/osaflcknd --tracemask=0x'. 
Program terminated with signal 11, Segmentation fault. #0 0x00405c05 in glnd_client_node_resource_add (client_info=0x0, res_info=0x639890) at glnd_client.c:227 227 glnd_client.c: No such file or directory. in glnd_client.c (gdb) bt #0 0x00405c05 in glnd_client_node_resource_add (client_info=0x0, res_info=0x639890) at glnd_client.c:227 #1 0x0040755b in glnd_process_gla_resource_open (glnd_cb=0x626f80, evt=0x63a380) at glnd_evt.c:634 #2 0x00406da3 in glnd_process_evt (cb=0x626f80, evt=0x63a380) at glnd_evt.c:358 #3 0x004034b7 in glnd_process_mbx (cb=0x626f80, mbx=0x626fb8) at glnd_api.c:162 #4 0x00403726 in glnd_main_process (mbx=0x626fb8) at glnd_api.c:242 #5 0x0040d240 in main (argc=2, argv=0x7fffd4cc90d8) at glnd_main.c:73 --- Sent from sourceforge.net because
[tickets] [opensaf:tickets] #872 osafdtmd asserts after connect with non member node
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#872] osafdtmd asserts after connect with non member node**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Wed Apr 23, 2014 06:45 AM UTC by Hans Feldt
**Last Updated:** Wed Jul 15, 2015 01:28 PM UTC
**Owner:** nobody

100% reproducible. By mistake I had OpenSAF started on my native system (named xubuntu-13 below). I then launched a virtual cluster, which keeps crashing. SC-1 in the virtual cluster stays up, but all other nodes keep crashing with the following assert:

Apr 23 08:35:27 SC-2 osafdtmd[352]: NO Established contact with 'xubuntu-13'
Apr 23 08:35:27 SC-2 osafdtmd[352]: dtm_node.c:108: dtm_process_node_info: Assertion '0' failed.
Apr 23 08:35:38 PL-3 osafdtmd[350]: NO Established contact with 'xubuntu-13'
Apr 23 08:35:38 PL-3 osafdtmd[350]: NO Established contact with 'SC-2'
Apr 23 08:35:38 PL-3 osafdtmd[350]: dtm_node.c:108: dtm_process_node_info: Assertion '0' failed.
[tickets] [opensaf:tickets] #916 lock-in of payload middleware su times-out and remains in ENABLED state
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#916] lock-in of payload middleware su times-out and remains in ENABLED state**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Tue May 20, 2014 07:08 AM UTC by surender khetavath
**Last Updated:** Wed Jul 15, 2015 01:25 PM UTC
**Owner:** nobody
**Attachments:**
- [logs.tgz](https://sourceforge.net/p/opensaf/tickets/916/attachment/logs.tgz) (122.2 kB; application/x-compressed-tar)

changeset : 5270

1) Bring up a 4-node cluster.
2) Lock and then lock-in the payload middleware SU, i.e. "safSu=PL-4,safSg=NoRed,safApp=OpenSAF".

Console output:

amf-adm lock-in safSu=PL-4,safSg=NoRed,safApp=OpenSAF
error - command timed out (alarm)

Syslog on PL-4:

May 20 12:19:59 PL-4 osafamfnd[5473]: NO 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' Presence State INSTANTIATED => TERMINATING
May 20 12:19:59 PL-4 osaflcknd[5509]: NO Received AMF component terminate callback, exiting
May 20 12:19:59 PL-4 osafckptnd[5518]: NO Received AMF component terminate callback, exiting
May 20 12:19:59 PL-4 osafimmnd[5455]: NO Received AMF component terminate callback, exiting
May 20 12:19:59 PL-4 osafamfnd[5473]: NO 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' Presence State TERMINATING => UNINSTANTIATED
May 20 12:19:59 PL-4 osafsmfnd[5483]: NO Received AMF component terminate callback, exiting
May 20 12:19:59 PL-4 osafmsgnd[5492]: ER Amf Terminate Callback called
May 20 12:19:59 PL-4 osafmsgnd[5492]: NO Received AMF component terminate callback, exiting

SU state:

safSu=PL-4,safSg=NoRed,safApp=OpenSAF
saAmfSUAdminState=LOCKED-INSTANTIATION(3)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=UNINSTANTIATED(1)
saAmfSUReadinessState=OUT-OF-SERVICE(1)
[tickets] [opensaf:tickets] #889 unknown: oi poll timeout differs during switchover and failover scenarios
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#889] unknown: oi poll timeout differs during switchover and failover scenarios**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Wed May 07, 2014 03:10 PM UTC by surender khetavath
**Last Updated:** Wed Jul 15, 2015 01:26 PM UTC
**Owner:** nobody
**Attachments:**
- [failure_logs.tgz](https://sourceforge.net/p/opensaf/tickets/889/attachment/failure_logs.tgz) (747.3 kB; application/x-compressed-tar)
- [success_logs.tgz](https://sourceforge.net/p/opensaf/tickets/889/attachment/success_logs.tgz) (739.4 kB; application/x-compressed-tar)

changeset : 5143.

Test, in a thread:
1) oiInit()
2) oiImplSet() & OiObjectImplSet on an object
3) oiselectionObjectGet()
4) poll() on the fd

In the main thread:
1) om init, ownerset
2) invoke controller failover/switchover
3) AdminOp(ONE_SECOND) on the object

If the poll timeout value is 40 seconds, the OI doesn't receive the AdminOp callback and poll times out. If the poll timeout value is increased to, say, 80 seconds, the OI gets the AdminOp callback.

How does it differ?
1) Is the IMM operation held until failover is completed?
2) Is the IMM operation held until the failed node re-joins?
3) The time to receive the callback, i.e. more than 40 seconds, is not acceptable for HA.

The same test using switchover succeeds, i.e. it receives the callback with a poll timeout under 20 seconds.

Two versions of logs are attached.
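The OI-side wait in the test above can be sketched roughly as follows. This is only an illustration of the poll-with-timeout step: a plain pipe stands in for the selection object fd (an assumption, no IMM library is involved), and `wait_for_callback` is a hypothetical helper name, not part of the reporter's test program.

```python
import os
import select


def wait_for_callback(fd, timeout_s):
    """Poll fd for readability; return True if it became readable in time."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    # poll() takes its timeout in milliseconds.
    events = poller.poll(int(timeout_s * 1000))
    return bool(events)


# Assumption: a pipe stands in for the fd obtained from the OI selection object.
read_fd, write_fd = os.pipe()

# Nothing has been written yet, so this models "poll times out".
ready_before = wait_for_callback(read_fd, 0.05)

# Writing a byte models the AdminOp callback becoming pending on the fd.
os.write(write_fd, b"x")
ready_after = wait_for_callback(read_fd, 0.05)

os.close(read_fd)
os.close(write_fd)
```

In the ticket's scenario the only variable is the timeout passed to the poll step (40 versus 80 seconds); the sketch uses 50 ms so that it finishes quickly.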
[tickets] [opensaf:tickets] #511 SU stucks in Terminatting state.
- **Milestone**: 4.5.2 --> never

---

** [tickets:#511] SU stucks in Terminatting state.**

**Status:** not-reproducible
**Milestone:** never
**Created:** Thu Jul 18, 2013 07:32 AM UTC by manu
**Last Updated:** Tue Oct 06, 2015 11:06 AM UTC
**Owner:** Nagendra Kumar
**Attachments:**
- [messages.tar.bz2](https://sourceforge.net/p/opensaf/tickets/511/attachment/messages.tar.bz2) (3.1 kB; application/x-bzip)
- [osafamfd.tar.bz2](https://sourceforge.net/p/opensaf/tickets/511/attachment/osafamfd.tar.bz2) (292.6 kB; application/x-bzip)
- [osafamfnd.tar.bz2](https://sourceforge.net/p/opensaf/tickets/511/attachment/osafamfnd.tar.bz2) (209.7 kB; application/x-bzip)

Change Set: 4325

Configuration: 2N redundancy model. 2 SUs, on SC-1 and PL-3, 1 SI.

Steps to reproduce:

1. Bring up the application with CSI-CSI dependencies, configured with multiple sponsors and multiple dependents.
2. UNLOCK-IN / UNLOCK the SUs one by one. Both SUs are UNLOCKED/ENABLED/INSTANTIATED/IN-SERVICE. CSI assignments happen correctly.
3. Perform a LOCK operation on the SI. The SI is LOCKED/UNASSIGNED.
4. Perform a controller switchover with "amf-adm si-swap safSi=SC-2N,safApp=OpenSAF". The switchover happens correctly. Both SUs are UNLOCKED/ENABLED/INSTANTIATED/IN-SERVICE.
5. Perform an UNLOCK operation on the SI. The SI becomes UNLOCKED/PARTIALLY_ASSIGNED.
6. SU1 gets stuck in TERMINATING state.

safSu=d_2n_1,safSg=SG_d_2n,safApp=2nApp
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=DISABLED(2)
saAmfSUPresenceState=TERMINATING(4)
saAmfSUReadinessState=OUT-OF-SERVICE(1)

safSu=d_2n_2,safSg=SG_d_2n,safApp=2nApp
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=IN-SERVICE(2)
[tickets] [opensaf:tickets] #484 Improper implementer prefix: MsgQueueService131343
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#484] Improper implementer prefix: MsgQueueService131343**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Fri Jul 05, 2013 08:04 AM UTC by Anders Bjornerstedt
**Last Updated:** Wed Jul 15, 2015 02:02 PM UTC
**Owner:** nobody

The MQSv creates implementers with names like:

MsgQueueService131343
MsgQueueService131599
MsgQueueService131855
etc.

Implementer names, AdminOwner names, class names and root object names in the IMM service are all GLOBAL name spaces. This is an open-source project. It is therefore important that components that register themselves in global name spaces use proper prefixing. This both avoids name collisions and allows identification of the component during troubleshooting (done by people other than the maintainer of that service).

As far as I know, the message queue service is the only service currently violating this.

An implementer name should have a prefix that eliminates the risk of naming conflicts and that makes it clear where it belongs. If the IMM implementer is part of a SAF standard service, then it should have the prefix "saf", like:

safAmfService
safSmfService
safCheckPointService
safLckService
safEvtService
safLogService
safMsgGrpService

If the implementer is part of an OpenSAF service that is not a standard SAF service, then it should have a prefix like "OpenSAF" or "osaf", like:

OpenSAFDtsvService

Because this was not caught and fixed early, there is now an upgrade problem. But that should not be too hard to solve. At the place where the OI sets up its implementer name and tries to set itself as class implementer for the relevant classes, it should:

1) Allocate two OI handles and set the implementer name to the old bad name in one and to the new good name in the other. If it fails to set either implementer name with ERR_EXIST, then it should behave the way it currently behaves when the implementer name is occupied.

2) For each class it is to be class implementer for:
Try to set class-implementer to the new good name (using the good handle). If this fails with ERR_EXIST then
(i) Try to set class-implementer to the old bad name (using the bad handle). This should succeed.
(ii) Clear the implementer name using saImmOiClassImplementerRelease(). This should succeed and is one of the rare occurrences where this API function is needed.
(iii) Set implementer to the new good name (using the good handle).

Repeat (2) for all classes used by the service.

I am raising the priority to major (from previously being minor) because:
- This is easy to fix.
- The current setup looks bad and sets a bad example.
- The current setup can cause confusion or uncertainty during troubleshooting or when just trying to understand the system.
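The two-handle migration procedure described in this ticket can be sketched against a toy in-memory model. Everything here is an assumption made for illustration: `FakeImm` and `ErrExist` are stand-ins, not the real IMM OI API, the class names `ClassA` and `ClassB` are placeholders, and `safMsgQueueService` is a hypothetical "good" name that the ticket does not specify.

```python
class ErrExist(Exception):
    """Stand-in for an ERR_EXIST return code."""


class FakeImm:
    """Toy model: one global implementer namespace plus class -> implementer map."""

    def __init__(self):
        self.connected = set()   # implementer names currently held by a handle
        self.class_impl = {}     # class name -> implementer name

    def implementer_set(self, name):
        if name in self.connected:
            raise ErrExist(name)
        self.connected.add(name)
        return name              # in this model a handle is just its name

    def class_implementer_set(self, handle, cls):
        current = self.class_impl.get(cls)
        if current not in (None, handle):
            raise ErrExist(cls)  # class is owned by a different implementer name
        self.class_impl[cls] = handle

    def class_implementer_release(self, handle, cls):
        if self.class_impl.get(cls) == handle:
            del self.class_impl[cls]


def migrate(imm, old_name, new_name, classes):
    # Step 1: two handles, one per name. ErrExist here propagates to the caller,
    # which models "behave as when the implementer name is occupied".
    old_h = imm.implementer_set(old_name)
    new_h = imm.implementer_set(new_name)
    # Step 2: per class, prefer the new name and fall back to take-over.
    for cls in classes:
        try:
            imm.class_implementer_set(new_h, cls)
        except ErrExist:
            imm.class_implementer_set(old_h, cls)      # (i) claim under old name
            imm.class_implementer_release(old_h, cls)  # (ii) clear old name
            imm.class_implementer_set(new_h, cls)      # (iii) set new name


imm = FakeImm()
# ClassA was left owned by the old name before the upgrade; ClassB is unowned.
imm.class_impl["ClassA"] = "MsgQueueService131343"
migrate(imm, "MsgQueueService131343", "safMsgQueueService", ["ClassA", "ClassB"])
```

After `migrate` runs, both classes map to the new name in the model, whether they started under the old name or with no implementer at all.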
[tickets] [opensaf:tickets] #473 AMF: SU rank ordering not followed at adm op SG unlock inst
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#473] AMF: SU rank ordering not followed at adm op SG unlock inst**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Mon Jun 24, 2013 12:32 PM UTC by Hans Feldt
**Last Updated:** Thu Aug 06, 2015 09:53 AM UTC
**Owner:** nobody

From spec 3.6.1.1:

"Ordered list of service units for a service group: for each service group, an ordered list of service units defines the rank of the service unit within the service group. This rank is configured by setting the saAmfSURank attribute of the saAmfSU object class (see Section 8.10). The rank is represented by a positive integer. The lower the integer value, the higher the rank. The size of the list is equal to the number of service units configured for the service group. This ordered list is used to specify the order in which service units are selected to be instantiated."

Instead, all the SUs are instantiated in one go. See sg_app_sg_admin_unlock_inst() in avd_sg.cc
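The ordering rule quoted from the spec (lower saAmfSURank value means higher rank, and therefore earlier instantiation) can be illustrated with a small sketch. The `ServiceUnit` type and `instantiation_order` function are hypothetical names for illustration only; they are not taken from avd_sg.cc.

```python
from dataclasses import dataclass


@dataclass
class ServiceUnit:
    name: str
    rank: int  # models saAmfSURank: lower integer value = higher rank


def instantiation_order(service_units):
    """Return SU names in the order they would be selected for instantiation."""
    # sorted() is stable, so SUs with equal rank keep their configured order.
    return [su.name for su in sorted(service_units, key=lambda su: su.rank)]


order = instantiation_order([
    ServiceUnit("SU3", 3),
    ServiceUnit("SU1", 1),
    ServiceUnit("SU2", 2),
])
```

Here `order` lists SU1 first and SU3 last, in contrast to the reported behaviour of instantiating all SUs in one go.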
[tickets] [opensaf:tickets] #525 when instantiating IMMND, start_daemon is not staring the IMMND for the first time.
- **Milestone**: 4.5.2 --> 4.6.2 --- ** [tickets:#525] when instantiating IMMND, start_daemon is not staring the IMMND for the first time.** **Status:** assigned **Milestone:** 4.6.2 **Created:** Fri Jul 26, 2013 02:49 PM UTC by Neelakanta Reddy **Last Updated:** Wed Jul 15, 2015 01:57 PM UTC **Owner:** Neelakanta Reddy **Attachments:** - [logs.tgz](https://sourceforge.net/p/opensaf/tickets/525/attachment/logs.tgz) (1.2 MB; application/x-compressed-tar) - [osafimmnd_SC1.bz2](https://sourceforge.net/p/opensaf/tickets/525/attachment/osafimmnd_SC1.bz2) (4.4 MB; application/x-bzip) Description: 2-controllers and 1 -payload with #501 fix 1. start the two controller SC-1 and SC-2, which is loaded from the PBE with 45K objects 2. start the payload 3. In the same time, issue the admin restart command at the active controller (SC-1) amf-adm restart safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF 4. There are logs added in imm clc-cli script, for instantiate() before and after start_daemon ul 26 17:30:43 Slot-3 osafamfnd[7167]: NO Admin restart requested for 'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' Jul 26 17:30:43 Slot-3 osafimmnd[7063]: NO Received AMF component terminate callback, exiting Jul 26 17:30:43 Slot-3 osafimmpbed: NO PBE received SIG_TERM, closing db handle Jul 26 17:30:43 Slot-3 osafimmpbed: IN IMM PBE process EXITING... Jul 26 17:30:43 Slot-3 osafimmnd: start /etc/opensaf/osafdir.conf, exiting. Jul 26 17:30:43 Slot-3 osafimmd[7048]: WA IMMND coordinator at 2010f apparently crashed => electing new coord Jul 26 17:30:43 Slot-3 osafimmd[7048]: NO New coord elected, resides at 2020f Jul 26 17:30:43 Slot-3 osafimmnd: end /etc/opensaf/osafdir.conf, exiting. 
Jul 26 17:30:51 Slot-3 dhclient: DHCPREQUEST on eth0 to 10.176.108.18 port 67 Jul 26 17:30:53 Slot-3 osafamfd[7152]: NO Re-initializing with IMM Jul 26 17:30:53 Slot-3 osafamfnd[7167]: NO Instantiation of 'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' failed Jul 26 17:30:53 Slot-3 osafamfnd[7167]: NO Reason: component registration timer expired Jul 26 17:30:53 Slot-3 osafimmnd: start /etc/opensaf/osafdir.conf, exiting. Jul 26 17:30:53 Slot-3 osafimmnd: end /etc/opensaf/osafdir.conf, exiting. Jul 26 17:30:53 Slot-3 osafimmnd[7592]: Started Jul 26 17:30:53 Slot-3 osafimmnd[7592]: NO Persistent Back-End capability configured, Pbe file:imm.db Jul 26 17:30:53 Slot-3 osafamfd[7152]: NO saImmOiAdminOperationResult for 30064771073 failed 9 Jul 26 17:30:53 Slot-3 osafimmd[7048]: NO New IMMND process is on ACTIVE Controller at 2010f Jul 26 17:30:53 Slot-3 osafimmnd[7592]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING Jul 26 17:30:53 Slot-3 osafimmnd[7592]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING Jul 26 17:30:53 Slot-3 osafimmnd[7592]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING Jul 26 17:30:53 Slot-3 osafimmd[7048]: WA IMMND on controller (not currently coord) requests sync Jul 26 17:30:53 Slot-3 osafimmnd[7592]: NO NODE STATE-> IMM_NODE_ISOLATED Jul 26 17:30:53 Slot-3 osafimmd[7048]: NO Node 2010f request sync sync-pid:7592 epoch:0 Jul 26 17:30:54 Slot-3 osafimmnd[7592]: NO NODE STATE-> IMM_NODE_W_AVAILABLE Jul 26 17:30:54 Slot-3 osafimmd[7048]: NO Successfully announced sync. 
New ruling epoch:13 Jul 26 17:30:54 Slot-3 osafimmnd[7592]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT Jul 26 17:31:04 Slot-3 dhclient: DHCPREQUEST on eth0 to 10.176.108.18 port 67 Jul 26 17:31:12 Slot-3 osafimmd[7048]: NO ACT: New Epoch for IMMND process at node 2020f old epoch: 12 new epoch:13 Jul 26 17:31:13 Slot-3 osafimmd[7048]: NO ACT: New Epoch for IMMND process at node 2030f old epoch: 0 new epoch:13 Jul 26 17:31:13 Slot-3 osafimmnd[7592]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 2171 5. when the above logs are analyzed, the amfnd is instantiating IMMND but the start_daemon for some reason is unable to start the immnd process. Jul 26 17:30:43 Slot-3 osafimmnd: start /etc/opensaf/osafdir.conf, exiting. Jul 26 17:30:43 Slot-3 osafimmd[7048]: WA IMMND coordinator at 2010f apparently crashed => electing new coord Jul 26 17:30:43 Slot-3 osafimmd[7048]: NO New coord elected, resides at 2020f Jul 26 17:30:43 Slot-3 osafimmnd: end /etc/opensaf/osafdir.conf, exiting. 6. After, 10 seconds when component registration timer expired then the amfnd tries to instantiate again and immnd got started. Jul 26 17:30:53 Slot-3 osafimmnd: start /etc/opensaf/osafdir.conf, exiting. Jul 26 17:30:53 Slot-3 osafimmnd: end /etc/opensaf/osafdir.conf, exiting. Jul 26 17:30:53 Slot-3 osafimmnd[7592]: Started 7. IMMND and IMMD traces has no logging when the amfnd tries to instantiate for first time. 8. when analyzed from IMM perspective: PL-3 sent the request for sync to SC-1 SC-1 IMMD sent SYN_REQ to both PL-3 and co-ordinator (to start sync) SC-1 immnd marked SyncRequested as true, but in same second the IMMND at SC-1
[tickets] [opensaf:tickets] #467 checkpoint with COLLOCATED flag forcing to register for arrival callback
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#467] checkpoint with COLLOCATED flag forcing to register for arrival callback**

**Status:** assigned
**Milestone:** 4.6.2
**Created:** Mon Jun 24, 2013 06:36 AM UTC by A V Mahesh (AVM)
**Last Updated:** Tue Aug 11, 2015 06:16 AM UTC
**Owner:** A V Mahesh (AVM)

I am using OpenSAF 4.0.0.

http://devel.opensaf.org/ticket/1866

I am running a simple AMF demo for counting which uses a checkpoint. My checkpoint creation flags are: SA_CKPT_CHECKPOINT_COLLOCATED | SA_CKPT_WR_ALL_REPLICAS. I tested it on a 2-node cluster (both target hardware and UML nodes).

The problem is that unless I register for the arrival callback, my standby component faults. AMF reports a healthcheck timeout.

I also tested with SA_CKPT_CHECKPOINT_COLLOCATED | SA_CKPT_WR_ACTIVE_REPLICA. I am facing the same issue.

If I remove the collocated flag, it works fine.
[tickets] [opensaf:tickets] #373 saAmfSGMaxActiveSIsperSU is not followed in the case of csiSetCallbackFailed scenarion in NWAY
- **Milestone**: 4.5.2 --> 4.6.2 --- ** [tickets:#373] saAmfSGMaxActiveSIsperSU is not followed in the case of csiSetCallbackFailed scenarion in NWAY** **Status:** accepted **Milestone:** 4.6.2 **Created:** Fri May 31, 2013 03:55 AM UTC by Nagendra Kumar **Last Updated:** Mon Apr 20, 2015 06:30 AM UTC **Owner:** Praveen **Attachments:** - [AppConfig-NWAY_1Spare.xml_maxSI](https://sourceforge.net/p/opensaf/tickets/373/attachment/AppConfig-NWAY_1Spare.xml_maxSI) (19.0 kB; text/xml) - [373.tgz](https://sourceforge.net/p/opensaf/tickets/373/attachment/373.tgz) (407.8 kB; application/x-compressed) Migrated from http://devel.opensaf.org/ticket/2362 Redundancy Model : NWAY Change set : 3049. Virtual machine 1) With the xml attached, brought up the configuration. 2) Performed lock,lock-in,unlock-in and unlock of the SG 3) SU2 hosted on SC-2 got one active assignment and one standby CSI assignment. SU3 hosted on PL-3 got one active assignment and one standby CSI assignment. 4) Performed admin shutdown on SU3 5) component in SU3 hosted on PL-3 faulted in quiescing callback went for reboot. Nov 24 16:18:01 SLES11-CONN-PC osafamfnd[5776]: 'safComp=AmfDemo?9,safSu=SU3,safSg=AmfDemo?,safApp=AmfDemo?' faulted due to 'csiSetcallbackFailed(12)' : Recovery is 'nodeFailover(5)' Nov 24 16:18:01 SLES11-CONN-PC osafamfnd[5776]: 'safSu=SU3,safSg=AmfDemo?,safApp=AmfDemo?' Presence State INSTANTIATED => TERMINATING 6) Already SU2 got one active assignment. As part of reassignment of SI's hosted on SU3, SU2 got more active assignmennt which should not happen. As maxActiveSIsPerSU is only 1, this assignment should not happen. SLES11-SLOT-1:/home/xml # /etc/init.d/opensafd status safSISU=safSu=SU2\,safSg=AmfDemo?\,safApp=AmfDemo?,safSi=AmfDemo?2,safApp=AmfDemo? saAmfSISUHAState=ACTIVE(1) safSISU=safSu=SU2\,safSg=AmfDemo?\,safApp=AmfDemo?,safSi=AmfDemo?,safApp=AmfDemo? 
saAmfSISUHAState=ACTIVE(1) Changed 18 months ago by srikanth ■attachment AppConfig-NWAY_1Spare.xml_maxSI added Changed 18 months ago by srikanth ¶ immdump for SG after PL-3 reboot : SLES11-SLOT-1:/home/xml # immlist safSg=AmfDemo?,safApp=AmfDemo? Name Type Value(s) safSg SA_STRING_T safSg=AmfDemo? saAmfSGType SA_NAME_T safVersion=4.0.0,safSgType=AmfDemo? (34) saAmfSGSuRestartProb SA_TIME_T saAmfSGSuRestartMax SA_UINT32_T saAmfSGSuHostNodeGroup SA_NAME_T safAmfNodeGroup=SCs,safAmfCluster=myAmfCluster (46) saAmfSGNumPrefStandbySUs SA_UINT32_T 2 (0x2) saAmfSGNumPrefInserviceSUs SA_UINT32_T 3 (0x3) saAmfSGNumPrefAssignedSUs SA_UINT32_T 3 (0x3) saAmfSGNumPrefActiveSUs SA_UINT32_T 3 (0x3) saAmfSGNumCurrNonInstantiatedSpareSUs SA_UINT32_T 0 (0x0) saAmfSGNumCurrInstantiatedSpareSUs SA_UINT32_T 0 (0x0) saAmfSGNumCurrAssignedSUs SA_UINT32_T 2 (0x2) saAmfSGMaxStandbySIsperSU SA_UINT32_T 2 (0x2) saAmfSGMaxActiveSIsperSU SA_UINT32_T 1 (0x1) saAmfSGCompRestartProb SA_TIME_T saAmfSGCompRestartMax SA_UINT32_T saAmfSGAutoRepair SA_UINT32_T 0 (0x0) saAmfSGAutoAdjustProb SA_TIME_T saAmfSGAutoAdjust SA_UINT32_T 0 (0x0) saAmfSGAdminState SA_UINT32_T 1 (0x1) SaImmAttrImplementerName? SA_STRING_T safAmfService SaImmAttrClassName? SA_STRING_T SaAmfSG SaImmAttrAdminOwnerName? SA_STRING_T Changed 18 months ago by ravisekhar ¶ ■status changed from new to accepted Changed 13 months ago by hafe ¶ I see nothing happening with this ticket although in accepted state for months. If status is not updated in short, I will change the milestone to "future" end of this week. Changed 13 months ago by ravisekhar ¶ ■milestone changed from 4.2.1 to future_releases --- Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. 
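The expected behaviour in ticket #373, that reassignment respects saAmfSGMaxActiveSIsperSU, can be sketched as a simple capacity check. This is a minimal illustration under assumptions: `reassign_active` is a hypothetical function, the SI name is a placeholder, and the real AMF director logic for the NWAY model is considerably more involved.

```python
def reassign_active(sis, candidate_sus, active_count, max_active_per_su):
    """Give each SI to the first candidate SU that still has active capacity.

    active_count maps SU name -> number of active SI assignments it already has.
    Returns (assignments, unassigned).
    """
    assignments = {}
    unassigned = []
    for si in sis:
        for su in candidate_sus:
            if active_count.get(su, 0) < max_active_per_su:
                active_count[su] = active_count.get(su, 0) + 1
                assignments[si] = su
                break
        else:
            # No SU has spare capacity: leave the SI unassigned rather than
            # exceeding the configured maximum.
            unassigned.append(si)
    return assignments, unassigned


# Scenario from the ticket: SU2 already holds one active assignment and
# the configured maximum is 1, so the SI displaced from SU3 must not land on SU2.
assignments, unassigned = reassign_active(
    sis=["SI_from_SU3"],
    candidate_sus=["SU2"],
    active_count={"SU2": 1},
    max_active_per_su=1,
)
```

With these inputs the displaced SI ends up in `unassigned` and SU2 keeps a single active assignment, which is the outcome the ticket says should have happened.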
[tickets] [opensaf:tickets] #392 payload node stuck in locked state.
- **Milestone**: 4.5.2 --> never --- ** [tickets:#392] payload node stuck in locked state.** **Status:** invalid **Milestone:** never **Created:** Fri May 31, 2013 05:10 AM UTC by Nagendra Kumar **Last Updated:** Mon Oct 05, 2015 11:53 AM UTC **Owner:** Nagendra Kumar **Attachments:** - [logs.tar](https://sourceforge.net/p/opensaf/tickets/392/attachment/logs.tar) (34.6 kB; application/x-gzip-compressed) - [AppConfig-npm_392.xml](https://sourceforge.net/p/opensaf/tickets/392/attachment/AppConfig-npm_392.xml) (23.1 kB; text/xml) Migrated from http://devel.opensaf.org/ticket/2578 Model : NPM ( 3+2) changeset : 3406 Configuration : 1App,1Sg,6sis,8sus,8comps,6csis SUs 1-6 are mapped to PL-4 and SU7-8 are mapped to PL-3 Scenario: Bring up the model, unlock-in and unlock the SUs. The initial assignments are as Active standby SI1 —-> SU1 Su4 SI2——> SU1 SU4 SI3——> SU2 SU4 SI4——> SU2 SU4 SI5——> SU3 SU5 SI6——> SU3 SU5 Now lock all the SIs except SI6.Then lock all the SUs except SU3. Unlock SI1.Now the assignments are as Active Standby SI1——-> SU6 SI6——-> SU6 Now lock the PL-4 as "amf-adm lock safAmfNode=PL-4,safAmfCluster=myAmfCluster" Now unlocking the PL-4 doesn't unlock it. 
SI states: safSi=SC-2N,safApp=OpenSAF saAmfSIAdminState=UNLOCKED(1) saAmfSIAssignmentState=FULLY_ASSIGNED(2) safSi=NoRed?1,safApp=OpenSAF saAmfSIAdminState=UNLOCKED(1) saAmfSIAssignmentState=FULLY_ASSIGNED(2) safSi=NoRed?2,safApp=OpenSAF saAmfSIAdminState=UNLOCKED(1) saAmfSIAssignmentState=FULLY_ASSIGNED(2) safSi=NoRed?3,safApp=OpenSAF saAmfSIAdminState=UNLOCKED(1) saAmfSIAssignmentState=FULLY_ASSIGNED(2) safSi=NoRed?4,safApp=OpenSAF saAmfSIAdminState=UNLOCKED(1) saAmfSIAssignmentState=FULLY_ASSIGNED(2) safSi=dummy_NplusM_1Norm_1,safApp=NpMApp saAmfSIAdminState=UNLOCKED(1) saAmfSIAssignmentState=UNASSIGNED(1) safSi=dummy_NplusM_1Norm_2,safApp=NpMApp saAmfSIAdminState=LOCKED(2) saAmfSIAssignmentState=UNASSIGNED(1) safSi=dummy_NplusM_1Norm_3,safApp=NpMApp saAmfSIAdminState=LOCKED(2) saAmfSIAssignmentState=UNASSIGNED(1) safSi=dummy_NplusM_1Norm_4,safApp=NpMApp saAmfSIAdminState=LOCKED(2) saAmfSIAssignmentState=UNASSIGNED(1) safSi=dummy_NplusM_1Norm_5,safApp=NpMApp saAmfSIAdminState=LOCKED(2) saAmfSIAssignmentState=UNASSIGNED(1) safSi=dummy_NplusM_1Norm_6,safApp=NpMApp saAmfSIAdminState=UNLOCKED(1) saAmfSIAssignmentState=UNASSIGNED(1) SU states : safSu=PL-3,safSg=NoRed?,safApp=OpenSAF saAmfSUAdminState=UNLOCKED(1) saAmfSUOperState=ENABLED(1) saAmfSUPresenceState=INSTANTIATED(3) saAmfSUReadinessState=IN-SERVICE(2) safSu=PL-4,safSg=NoRed?,safApp=OpenSAF saAmfSUAdminState=UNLOCKED(1) saAmfSUOperState=ENABLED(1) saAmfSUPresenceState=INSTANTIATED(3) saAmfSUReadinessState=IN-SERVICE(2) safSu=SC-1,safSg=2N,safApp=OpenSAF saAmfSUAdminState=UNLOCKED(1) saAmfSUOperState=ENABLED(1) saAmfSUPresenceState=INSTANTIATED(3) saAmfSUReadinessState=IN-SERVICE(2) safSu=SC-1,safSg=NoRed?,safApp=OpenSAF saAmfSUAdminState=UNLOCKED(1) saAmfSUOperState=ENABLED(1) saAmfSUPresenceState=INSTANTIATED(3) saAmfSUReadinessState=IN-SERVICE(2) safSu=SC-2,safSg=2N,safApp=OpenSAF saAmfSUAdminState=UNLOCKED(1) saAmfSUOperState=ENABLED(1) saAmfSUPresenceState=INSTANTIATED(3) 
saAmfSUReadinessState=IN-SERVICE(2) safSu=SC-2,safSg=NoRed?,safApp=OpenSAF saAmfSUAdminState=UNLOCKED(1) saAmfSUOperState=ENABLED(1) saAmfSUPresenceState=INSTANTIATED(3) saAmfSUReadinessState=IN-SERVICE(2) safSu=dummy_NplusM_1Norm_1,safSg=SG_dummy_npm,safApp=NpMApp saAmfSUAdminState=LOCKED(2) saAmfSUOperState=ENABLED(1) saAmfSUPresenceState=INSTANTIATED(3) saAmfSUReadinessState=OUT-OF-SERVICE(1) safSu=dummy_NplusM_1Norm_2,safSg=SG_dummy_npm,safApp=NpMApp saAmfSUAdminState=LOCKED(2) saAmfSUOperState=ENABLED(1) saAmfSUPresenceState=INSTANTIATED(3) saAmfSUReadinessState=OUT-OF-SERVICE(1) safSu=dummy_NplusM_1Norm_3,safSg=SG_dummy_npm,safApp=NpMApp saAmfSUAdminState=UNLOCKED(1) saAmfSUOperState=ENABLED(1) saAmfSUPresenceState=INSTANTIATED(3) saAmfSUReadinessState=OUT-OF-SERVICE(1) safSu=dummy_NplusM_1Norm_4,safSg=SG_dummy_npm,safApp=NpMApp saAmfSUAdminState=LOCKED(2) saAmfSUOperState=ENABLED(1) saAmfSUPresenceState=INSTANTIATED(3) saAmfSUReadinessState=OUT-OF-SERVICE(1) safSu=dummy_NplusM_1Norm_5,safSg=SG_dummy_npm,safApp=NpMApp saAmfSUAdminState=LOCKED(2) saAmfSUOperState=ENABLED(1) saAmfSUPresenceState=INSTANTIATED(3) saAmfSUReadinessState=OUT-OF-SERVICE(1) safSu=dummy_NplusM_1Norm_6,safSg=SG_dummy_npm,safApp=NpMApp saAmfSUAdminState=LOCKED(2) saAmfSUOperState=ENABLED(1) saAmfSUPresenceState=INSTANTIATED(3) saAmfSUReadinessState=OUT-OF-SERVICE(1) safSu=dummy_NplusM_1Norm_7,safSg=SG_dummy_npm,safApp=NpMApp saAmfSUAdminState=LOCKED(2)
[tickets] [opensaf:tickets] #399 amf: SU admin state not updated after doing controller switchover and admin lock of SU.
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#399] amf: SU admin state not updated after doing controller switchover and admin lock of SU.**

**Status:** assigned
**Milestone:** 4.6.2
**Created:** Fri May 31, 2013 05:21 AM UTC by Praveen
**Last Updated:** Wed Aug 12, 2015 09:11 AM UTC
**Owner:** Nagendra Kumar

Migrated from http://devel.opensaf.org/ticket/2879.

changeset : 3796, 4.2.2
model : NplusM

Initial configuration:

- SI equal distribution: saAmfSGNumPrefInserviceSUs=5 -a saAmfSGMaxActiveSIsperSU=2 -a saAmfSGMaxStandbySIsperSU=3 -a saAmfSGNumPrefActiveSUs=3 -a saAmfSGNumPrefStandbySUs=2 saAmfSGAutoAdjust=1
- 6 SIs in locked state: saAmfSIPrefActiveAssignments=1 -a saAmfSIPrefStandbyAssignments=1
- 5 SUs with the same SURank, set to 5. The admin state of each SU was locked-instantiation.
- SU1, SU4 and SU5 spawned on SC-1, SU2 on SC-2, SU3 on PL-4.

Steps:

1. Brought up the NplusM model with the above configuration.
2. Performed the unlock-instantiation operation on each SU (SU1 to SU5).
3. Performed the unlock operation on each SU (SU1 to SU5).
4. Performed unlock of each SI (SI1 to SI6). The SUSI assignments were equally distributed.
5. On SC-1, triggered a controller switchover from the command line and immediately, on SC-2, triggered the admin lock on SU1.

The controller switchover completed successfully, but the admin lock on SU1 failed with SA_AIS_ERR_TIMEOUT. A second attempt to lock SU1 failed with SA_AIS_ERR_NO_OP, and further retries kept failing with the same SA_AIS_ERR_NO_OP error. The amf-state su output showed the admin state of SU1 as UNLOCKED, so the admin state of SU1 was never changed.
Observed that all the SUSI assignments from SU1 got removed but the /var/log/messages was printing the below messages:- Oct 23 13:01:53 SLOT2 osafimmnd[7176]: Timeout on syncronous admin operation 1 Oct 23 13:03:47 SLOT2 osafamfd[7225]: Admin operation (2) has no effect on current state (2) Oct 23 13:06:15 SLOT2 osafamfd[7225]: Admin operation (2) has no effect on current state (2) safSu=d_NplusM_1Norm_1,safSg=SG_d_npm,safApp=NpMApp saAmfSUAdminState=UNLOCKED(1) saAmfSUOperState=ENABLED(1) saAmfSUPresenceState=INSTANTIATED(3) saAmfSUReadinessState=IN-SERVICE(2) safSu=d_NplusM_1Norm_2,safSg=SG_d_npm,safApp=NpMApp saAmfSUAdminState=UNLOCKED(1) saAmfSUOperState=ENABLED(1) saAmfSUPresenceState=INSTANTIATED(3) saAmfSUReadinessState=IN-SERVICE(2) safSu=d_NplusM_1Norm_3,safSg=SG_d_npm,safApp=NpMApp saAmfSUAdminState=UNLOCKED(1) saAmfSUOperState=ENABLED(1) saAmfSUPresenceState=INSTANTIATED(3) saAmfSUReadinessState=IN-SERVICE(2) safSu=d_NplusM_1Norm_4,safSg=SG_d_npm,safApp=NpMApp saAmfSUAdminState=UNLOCKED(1) saAmfSUOperState=ENABLED(1) saAmfSUPresenceState=INSTANTIATED(3) saAmfSUReadinessState=IN-SERVICE(2) safSu=d_NplusM_1Norm_5,safSg=SG_d_npm,safApp=NpMApp saAmfSUAdminState=UNLOCKED(1) saAmfSUOperState=ENABLED(1) saAmfSUPresenceState=INSTANTIATED(3) saAmfSUReadinessState=IN-SERVICE(2) safSISU=safSu=SC-1\,safSg=NoRed?\,safApp=OpenSAF,safSi=NoRed?2,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1) safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF saAmfSISUHAState=STANDBY(2) safSISU=safSu=SC-2\,safSg=NoRed?\,safApp=OpenSAF,safSi=NoRed?1,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1) safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1) safSISU=safSu=PL-3\,safSg=NoRed?\,safApp=OpenSAF,safSi=NoRed?4,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1) safSISU=safSu=PL-4\,safSg=NoRed?\,safApp=OpenSAF,safSi=NoRed?3,safApp=OpenSAF saAmfSISUHAState=ACTIVE(1) 
safSISU=safSu=d_NplusM_1Norm_4\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_6,safApp=NpMApp saAmfSISUHAState=ACTIVE(1) safSISU=safSu=d_NplusM_1Norm_2\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_3,safApp=NpMApp saAmfSISUHAState=STANDBY(2) safSISU=safSu=d_NplusM_1Norm_2\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_1,safApp=NpMApp saAmfSISUHAState=STANDBY(2) safSISU=safSu=d_NplusM_1Norm_2\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_2,safApp=NpMApp saAmfSISUHAState=STANDBY(2) safSISU=safSu=d_NplusM_1Norm_5\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_1,safApp=NpMApp saAmfSISUHAState=ACTIVE(1) safSISU=safSu=d_NplusM_1Norm_3\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_2,safApp=NpMApp saAmfSISUHAState=ACTIVE(1) safSISU=safSu=d_NplusM_1Norm_5\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_4,safApp=NpMApp saAmfSISUHAState=ACTIVE(1) safSISU=safSu=d_NplusM_1Norm_4\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_3,safApp=NpMApp saAmfSISUHAState=ACTIVE(1) safSISU=safSu=d_NplusM_1Norm_3\,safSg=SG_d_npm\,safApp=NpMApp,safSi=d_NplusM_1Norm_5,safApp=NpMApp saAmfSISUHAState=ACTIVE(1) Changed 7 months ago by shareef Same issue also observed with
[tickets] [opensaf:tickets] #314 AMF looses alarms and notifications during switch-over
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#314] AMF looses alarms and notifications during switch-over**

**Status:** accepted
**Milestone:** 4.6.2
**Created:** Fri May 24, 2013 08:34 AM UTC by Nagendra Kumar
**Last Updated:** Mon Apr 20, 2015 06:42 AM UTC
**Owner:** Praveen
**Attachments:**

- [osafamfd](https://sourceforge.net/p/opensaf/tickets/314/attachment/osafamfd) (5.7 MB; application/octet-stream)
- [messages](https://sourceforge.net/p/opensaf/tickets/314/attachment/messages) (41.9 kB; application/octet-stream)

Migrated from http://devel.opensaf.org/ticket/3051

Background: http://devel.opensaf.org/ticket/3028

If another node (payload) leaves the cluster in the middle of a switch-over, amfd logs this:

    Mar 8 10:18:21 SC-1 osafamfd[304]: ER sendStateChangeNotificationAvd: saNtfNotificationSend Failed (6)
    Mar 8 10:18:21 SC-1 osafamfd[304]: ER sendAlarmNotificationAvd: saNtfNotificationSend Failed (6)

These logs mean that amfd failed to send an alarm and a notification because NTF returned TRYAGAIN (in NOACTIVE state).

AMF needs to store the alarms/notifications produced in the NOACTIVE state and send them at the end of the switch-over. Alternatively, it could use a separate thread that can block forever (?) on TRYAGAIN.

The problem exists in all opensaf releases.
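The first option in ticket #314 (store what was produced in the NOACTIVE state, send it at the end of the switch-over) can be sketched roughly as below. This is only an illustrative Python model, not amfd code: the class `PendingNotifications`, its methods and the injected `send` callable are hypothetical names, and the only values taken from the AIS definitions are `SA_AIS_OK` (1) and `SA_AIS_ERR_TRY_AGAIN` (6, the code seen in the log above).

```python
from collections import deque

SA_AIS_OK = 1
SA_AIS_ERR_TRY_AGAIN = 6  # the "(6)" reported by saNtfNotificationSend in the log


class PendingNotifications:
    """Hypothetical buffer for notifications that NTF refused with TRY_AGAIN."""

    def __init__(self, send):
        self._send = send        # callable(notification) -> AIS return code
        self._pending = deque()  # FIFO, so the original ordering is preserved

    def submit(self, notification):
        """Try to send now; queue on TRY_AGAIN. Returns True if sent."""
        if self._pending:
            # Never overtake notifications that are already waiting.
            self._pending.append(notification)
            return False
        rc = self._send(notification)
        if rc == SA_AIS_ERR_TRY_AGAIN:
            self._pending.append(notification)
            return False
        return rc == SA_AIS_OK

    def flush(self):
        """Call at the end of the switch-over. Returns how many were sent."""
        sent = 0
        while self._pending:
            rc = self._send(self._pending[0])
            if rc == SA_AIS_ERR_TRY_AGAIN:
                break  # NTF still not ready; keep everything queued
            self._pending.popleft()  # non-retryable errors are dropped here
            if rc == SA_AIS_OK:
                sent += 1
        return sent
```

In this sketch `flush()` would be called once the new active role is established; whether a bounded queue or a dedicated blocking thread is preferable is left open, as in the ticket.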
[tickets] [opensaf:tickets] #1331 java clm : dispatchBlocking() APIs does not return SA_AIS_OK after finalizing the handle
- **Milestone**: 4.4.2 --> 4.6.2

---

** [tickets:#1331] java clm : dispatchBlocking() APIs does not return SA_AIS_OK after finalizing the handle**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Wed Apr 22, 2015 09:06 AM UTC by Sirisha Alla
**Last Updated:** Wed Apr 22, 2015 09:06 AM UTC
**Owner:** nobody

This ticket is a clone of devel ticket 1671.

When the dispatchBlocking() and dispatchBlocking(tmout) APIs are invoked in a thread and the handle is then finalized, the dispatchBlocking APIs should return SA_AIS_OK. Instead, an exception is raised.
[tickets] [opensaf:tickets] #1325 Unnecessary state change notification regarding osafntfimcnd during failover
- **Milestone**: 4.4.2 --> 4.6.2

---

** [tickets:#1325] Unnecessary state change notification regarding osafntfimcnd during failover**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Tue Apr 21, 2015 01:09 PM UTC by Srikanth R
**Last Updated:** Tue Apr 21, 2015 01:09 PM UTC
**Owner:** nobody

Changeset : 6377

An unnecessary and invalid notification is generated during the failover.

    === Apr 9 21:14:03 - State Change ===
    eventType = SA_NTF_OBJECT_STATE_CHANGE
    notificationObject = "osafntfimcnd"
    notifyingObject = "safApp=OpenSaf"
    notificationClassId = 32993.8.0 (0x0)
    sourceIndicator = SA_NTF_OBJECT_OPERATION
[tickets] [opensaf:tickets] #1320 configure_tipc script error due to $CORE_ID
- **Milestone**: 4.4.2 --> 4.6.2

---

** [tickets:#1320] configure_tipc script error due to $CORE_ID**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Mon Apr 20, 2015 09:34 PM UTC by Adrian Szwej
**Last Updated:** Mon Apr 20, 2015 09:34 PM UTC
**Owner:** nobody

During start of opensaf, /var/log/opensaf/nid.log shows this error message:

**/usr/local/lib/opensaf/configure_tipc: line 198: [: 1234: unary operator expected**

The script **/usr/local/lib/opensaf/configure_tipc** uses the $CORE_ID parameter, which does not seem to be set anywhere:

    configured_net_id=`tipc-config -netid | cut -d: -f2`
    opensaf_net_id=$CORE_ID
    if [ $configured_net_id != $opensaf_net_id ]; then
        logger -t opensaf -s "TIPC network ID not configured to OpenSAF requirements, exiting..."
        exit 1
    fi

Because $CORE_ID is empty and the operands are unquoted, the test expands to `[ 1234 != ]`, which is what produces the "unary operator expected" message.
[tickets] [opensaf:tickets] #945 AMF: allow creation of unlocked SUs when node is locked-inst
- **Milestone**: 4.5.FC --> never

---

** [tickets:#945] AMF: allow creation of unlocked SUs when node is locked-inst**

**Status:** invalid
**Milestone:** never
**Created:** Mon Jun 23, 2014 02:05 PM UTC by Hans Feldt
**Last Updated:** Mon Jun 30, 2014 07:43 AM UTC
**Owner:** nobody

A small change but important for the "cluster scale out" use case. A pre-requisite is that the service unit is mapped to a node (not node group)
[tickets] [opensaf:tickets] #1556 AMF : SU struck in instantiating state during adm su restart op ( component reg failure )
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#1556] AMF : SU struck in instantiating state during adm su restart op ( component reg failure )**

**Status:** accepted
**Milestone:** 4.6.2
**Created:** Thu Oct 22, 2015 06:44 AM UTC by Srikanth R
**Last Updated:** Thu Oct 22, 2015 09:42 AM UTC
**Owner:** Praveen
**Attachments:**

- [TwoN.sh](https://sourceforge.net/p/opensaf/tickets/1556/attachment/TwoN.sh) (9.7 kB; application/x-shellscript)

Changeset : 6901
Application : 2N, two SUs

Steps:

* Both SUs have full assignments.
* Issued a restart operation on the SU hosting the standby assignment. The first component in the SU did not register with AMF: the CLC CLI script exited with success, but the component never called saAmfComponentRegister.

    Oct 17 13:40:37 SYSTEST-PLD-1 osafamfnd[28402]: NO Admin Restart request for 'safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN'
    Oct 17 13:40:37 SYSTEST-PLD-1 osafamfnd[28402]: NO 'safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN' Presence State INSTANTIATED => RESTARTING
    Oct 17 13:40:37 SYSTEST-PLD-1 osafamfnd[28402]: NO 'safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN' Presence State RESTARTING => INSTANTIATING
    Oct 17 13:40:47 SYSTEST-PLD-1 osafamfnd[28402]: NO Instantiation of 'safComp=COMP1,safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN' failed
    Oct 17 13:40:47 SYSTEST-PLD-1 osafamfnd[28402]: NO Reason: component registration timer expired

Below is the state of the SU after the admin operation timed out.

    safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN
    saAmfSUAdminState=LOCKED(2)
    saAmfSUOperState=ENABLED(1)
    saAmfSUPresenceState=INSTANTIATING(2)
    saAmfSUReadinessState=OUT-OF-SERVICE(1)
[tickets] [opensaf:tickets] #1557 Comp fails in INSTANTIATION_FAILED because comp crashes after compRegistration timeout
- **Milestone**: 4.5.2 --> 4.6.2 --- ** [tickets:#1557] Comp fails in INSTANTIATION_FAILED because comp crashes after compRegistration timeout** **Status:** review **Milestone:** 4.6.2 **Labels:** INSTANTIATION_FAILED component registration **Created:** Fri Oct 23, 2015 02:17 AM UTC by Minh Hon Chau **Last Updated:** Wed Oct 28, 2015 02:46 AM UTC **Owner:** Minh Hon Chau **Attachments:** - [app3_twon2su1si.xml](https://sourceforge.net/p/opensaf/tickets/1557/attachment/app3_twon2su1si.xml) (10.5 kB; text/xml) - [amf_demo_script](https://sourceforge.net/p/opensaf/tickets/1557/attachment/amf_demo_script) (1.9 kB; application/octet-stream) - [log.tgz](https://sourceforge.net/p/opensaf/tickets/1557/attachment/log.tgz) (698.3 kB; application/x-compressed-tar) - [amf_demo.diff](https://sourceforge.net/p/opensaf/tickets/1557/attachment/amf_demo.diff) (2.0 kB; text/x-patch) Steps to reproduce: . Apply amf_demo.diff and build amf_demo, using the attached amf_demo_script as the CLC script . Run commands: . immcfg -f app3_twon2su1si.xml . echo 1 > /root/hu23992 . 
amf-adm unlock-in safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon Logs: Oct 23 12:47:19 PL-4 osafamfnd[421]: NO 'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' Presence State UNINSTANTIATED => INSTANTIATING Oct 23 12:47:19 PL-4 amf_demo_script: CLC-START: safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon Oct 23 12:47:22 PL-4 amf_demo[585]: 'safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' started Oct 23 12:47:26 PL-4 osafamfnd[421]: NO Instantiation of 'safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' failed Oct 23 12:47:26 PL-4 osafamfnd[421]: NO Reason: component registration timer expired Oct 23 12:47:26 PL-4 amf_demo_script: CLC-STOP: safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon Oct 23 12:47:27 PL-4 amf_demo[585]: Registered with AMF and HC started Oct 23 12:47:27 PL-4 amf_demo[585]: Health check 1 Oct 23 12:47:29 PL-4 amf_demo[585]: exiting (caught term signal) Oct 23 12:47:29 PL-4 osafamfnd[421]: NO 'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' component restart probation timer started (timeout: 100 ns) Oct 23 12:47:29 PL-4 osafamfnd[421]: NO Restarting a component of 'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' (comp restart count: 1) Oct 23 12:47:29 PL-4 osafamfnd[421]: NO 'safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' faulted due to 'avaDown' : Recovery is 'componentRestart' Oct 23 12:47:29 PL-4 amf_demo_script: CLC-STOP: safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon Oct 23 12:47:29 PL-4 amf_demo_script: CLC-START: safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon Oct 23 12:47:32 PL-4 amf_demo[628]: 'safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' started Oct 23 12:47:32 PL-4 amf_demo[628]: exiting (caught term signal) Oct 23 12:47:32 PL-4 osafamfnd[421]: WA 'safComp=AmfDemo,safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' Presence State INSTANTIATING => INSTANTIATION_FAILED Oct 23 12:47:32 PL-4 osafamfnd[421]: NO 
'safSu=SU4,safSg=AmfDemoTwon,safApp=AmfDemoTwon' Presence State INSTANTIATING => INSTANTIATION_FAILED

Trace is also attached.

Initial analysis:

. After the component times out in the component_registration phase, amfnd enters instantiating_fail, so the cleanup CLC is called.
. Then the component crashes and amfnd receives ava_mds_down. amfnd enters instantiating_fail for the component again, so another cleanup CLC is called.
. Eventually, when the two cleanup CLCs return, amfnd enters cleanup_success twice while the component is in the instantiating state.
. At the second cleanup_success, the retry_counter has reached retry_max, so the component falls into INSTANTIATION_FAILED.

As a first thought, amfnd should not enter instantiating_fail when the component crashes, since it is already handling instantiating_fail.
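The "first thought" in the analysis of #1557 amounts to ignoring a second failure event while the first one is still being cleaned up. The toy model below illustrates that idea only; it is a hypothetical Python sketch, not amfnd's state machine, and the names `ComponentInstantiation`, `cleanup_in_progress`, `retry_count` and `retry_max` are assumptions made for the example (the real retry accounting in amfnd may differ).

```python
class ComponentInstantiation:
    """Toy model of one component in the INSTANTIATING presence state."""

    def __init__(self, retry_max):
        self.retry_max = retry_max
        self.retry_count = 0
        self.cleanup_in_progress = False
        self.cleanups_started = 0
        self.state = "INSTANTIATING"

    def on_instantiation_failure(self, reason):
        """Handle a registration timeout or a component crash (avaDown).

        Returns True if a cleanup was started, False if the event was
        ignored because a failure is already being handled.
        """
        if self.cleanup_in_progress:
            return False
        self.cleanup_in_progress = True
        self.cleanups_started += 1
        return True

    def on_cleanup_success(self):
        """Count one retry per handled failure, never one per event."""
        if not self.cleanup_in_progress:
            return  # stray completion; nothing is being handled
        self.cleanup_in_progress = False
        self.retry_count += 1
        if self.retry_count >= self.retry_max:
            self.state = "INSTANTIATION_FAILED"
```

With `retry_max=2`, a registration timeout followed by a crash during cleanup consumes a single retry in this model, so the component stays in INSTANTIATING and can be instantiated again.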
[tickets] [opensaf:tickets] #1562 AMF : (NPM ) Standby assignments are done with out any active assignment
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#1562] AMF : (NPM ) Standby assignments are done with out any active assignment**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Fri Oct 23, 2015 01:59 PM UTC by Srikanth R
**Last Updated:** Fri Oct 23, 2015 01:59 PM UTC
**Owner:** nobody
**Attachments:**

- [1562.tgz](https://sourceforge.net/p/opensaf/tickets/1562/attachment/1562.tgz) (178.3 kB; application/x-compressed-tar)

Changeset : 6901

Setup : NPM application with 4 SUs hosted on PL-3 & PL-4 and 4 SIs. SU1 & SU3 are hosted on PL-3, SU2 & SU4 on PL-4.

Steps : After a series of operations on the NPM application, the assignments were as follows:

                | TestApp_SI1 | TestApp_SI2 | TestApp_SI3 | TestApp_SI4
    TestApp_SU1 | ACTIVE      | ACTIVE      |             |
    TestApp_SU2 |             |             | ACTIVE      | ACTIVE
    TestApp_SU3 | STANDBY     | STANDBY     | STANDBY     |
    TestApp_SU4 |             |             |             | STANDBY

After opensafd was stopped on PL-3, the assignments were as follows (column placement reconstructed from the syslog below):

                | TestApp_SI1 | TestApp_SI2 | TestApp_SI3 | TestApp_SI4
    TestApp_SU1 |             |             |             |
    TestApp_SU2 |             |             | ACTIVE      | ACTIVE
    TestApp_SU3 |             |             |             |
    TestApp_SU4 | STANDBY     | STANDBY     |             | STANDBY

Corresponding log in syslog on PL-4 :

    Oct 23 19:00:29 PAYLOAD-2 osafimmnd[8101]: NO Implementer disconnected 40 <0, 2010f> (MsgQueueService131855)
    Oct 23 19:00:29 PAYLOAD-2 osafamfnd[8120]: NO Assigning 'safSi=TestApp_SI1,safApp=TestApp_Npm' STANDBY to 'safSu=TestApp_SU4,safSg=TestApp_SG1,safApp=TestApp_Npm'
    Oct 23 19:00:29 PAYLOAD-2 osafamfnd[8120]: NO Assigning 'safSi=TestApp_SI2,safApp=TestApp_Npm' STANDBY to 'safSu=TestApp_SU4,safSg=TestApp_SG1,safApp=TestApp_Npm'
    Oct 23 19:00:29 PAYLOAD-2 osafamfnd[8120]: NO Assigned 'safSi=TestApp_SI2,safApp=TestApp_Npm' STANDBY to 'safSu=TestApp_SU4,safSg=TestApp_SG1,safApp=TestApp_Npm'
    Oct 23 19:00:29 PAYLOAD-2 osafamfnd[8120]: NO Assigned 'safSi=TestApp_SI1,safApp=TestApp_Npm' STANDBY to 'safSu=TestApp_SU4,safSg=TestApp_SG1,safApp=TestApp_Npm'
    Oct 23 19:00:32 PAYLOAD-2 kernel: [ 7785.128227] TIPC: Resetting link <1.1.4:eth3-1.1.3:eth3>, peer not responding

Attached are amfd.state and amfd traces from the active controller, the amfnd trace from the payload hosting SU2 & SU4, and the NPM configuration.
[tickets] [opensaf:tickets] #1560 AMF : NG admin state should be validated during creation
- **Milestone**: 4.6.1 --> 4.6.2

---

** [tickets:#1560] AMF : NG admin state should be validated during creation**

**Status:** review
**Milestone:** 4.6.2
**Created:** Fri Oct 23, 2015 07:20 AM UTC by Srikanth R
**Last Updated:** Thu Oct 29, 2015 05:22 AM UTC
**Owner:** Praveen

Changeset : 6901

While creating a node group, the admin state value should be validated. Currently, an invalid admin state for the node group is accepted:

    immcfg -c SaAmfNodeGroup safAmfNodeGroup=TestNG,safAmfCluster=myAmfCluster -a saAmfNGNodeList=safAmfNode=SC-1,safAmfCluster=myAmfCluster -a saAmfNGAdminState=5

    CONTROLLER-1:~ # amf-state ng
    safAmfNodeGroup=TestNG,safAmfCluster=myAmfCluster
    saAmfNGAdminState=UNKNOWN(5)
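The kind of check requested in #1560 can be illustrated with a short range validation. This is a hedged Python sketch rather than the AMF director's actual validation code: the helper `validate_ng_admin_state` is a hypothetical name, the enum mirrors the AIS `SaAmfAdminStateT` values (1 to 4), and which of those values should be permitted at creation time is a policy question the ticket does not settle.

```python
from enum import IntEnum


class SaAmfAdminState(IntEnum):
    """Values of SaAmfAdminStateT as defined by the AIS AMF specification."""
    UNLOCKED = 1
    LOCKED = 2
    LOCKED_INSTANTIATION = 3
    SHUTTING_DOWN = 4


def validate_ng_admin_state(value):
    """Return the enum member, or raise ValueError for anything else.

    Hypothetical helper: a creation callback could reject the request
    with an error instead of storing an UNKNOWN(5) state.
    """
    try:
        return SaAmfAdminState(int(value))
    except (ValueError, TypeError):
        raise ValueError(f"invalid saAmfNGAdminState: {value!r}") from None
```

With such a check in place, the `saAmfNGAdminState=5` value from the immcfg command above would be refused at creation time.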
[tickets] [opensaf:tickets] #1529 Node rebooted as saImmOiInitialize_2 failed during middleware active assignment
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#1529] Node rebooted as saImmOiInitialize_2 failed during middleware active assignment**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Thu Oct 08, 2015 07:53 AM UTC by Chani Srivastava
**Last Updated:** Fri Oct 09, 2015 10:54 AM UTC
**Owner:** nobody
**Attachments:**

- [SC1_syslog.txt](https://sourceforge.net/p/opensaf/tickets/1529/attachment/SC1_syslog.txt) (436.4 kB; text/plain)
- [SC2_syslog.txt](https://sourceforge.net/p/opensaf/tickets/1529/attachment/SC2_syslog.txt) (425.6 kB; text/plain)
- [1529.tgz](https://sourceforge.net/p/opensaf/tickets/1529/attachment/1529.tgz) (586.3 kB; application/x-compressed-tar)

Setup: Changeset-6901. Invoked continuous failovers on a 4-node cluster with 2 controllers and 2 payloads. All nodes have 64-bit architecture. 2PBE enabled with 25K objects.

Issue Observed: Cluster reset occurred on invoking continuous failovers.

Attachments: Attaching syslogs for SC-1 and SC-2. Traces for immnd and immd can be shared separately if required.

Steps:

* Initially SC-1 is active and SC-2 standby
* A test script invoked failover by killing osafclmd on SC1
* SC-2 became active

    Oct 7 18:23:32 OSAF-SC1 root: killing osafclmd from invoke_failover.sh
    Oct 7 19:25:20 OSAF-SC2 osafamfd[2191]: NO FAILOVER StandBy --> Active

* On the new active controller, saImmOiInitialize_2 failed

    Oct 7 19:25:22 OSAF-SC2 osafntfimcnd[2735]: ER ntfimcn_imm_init saImmOiInitialize_2 failed SA_AIS_ERR_TIMEOUT (5)
    Oct 7 19:25:22 OSAF-SC2 osafntfimcnd[2735]: ER ntfimcn_imm_init() Fail
    Oct 7 19:25:22 OSAF-SC2 osafimmnd[2131]: NO Implementer connected: 333 (safLckService) <299, 2020f>
    Oct 7 19:25:22 OSAF-SC2 osafimmnd[2131]: NO Implementer connected: 334 (safEvtService) <298, 2020f>
    Oct 7 19:25:23 OSAF-SC2 osafntfimcnd[2738]: ER ntfimcn_imm_init saImmOiInitialize_2 failed SA_AIS_ERR_TIMEOUT (5)
    Oct 7 19:25:23 OSAF-SC2 osafntfimcnd[2738]: ER ntfimcn_imm_init() Fail
    Oct 7 19:25:23 OSAF-SC2 osafimmnd[2131]: WA MDS Send Failed
Oct 7 19:25:23 OSAF-SC2 osafimmnd[2131]: WA Error code 2 returned for message type 4 - ignoring * Other services also fail to initialize with IMM on new active controller..i.e. SC-2 * And finally SMF had csi set timeout * SC-2 went for reboot and hence the entire cluster reset, as SC-2 is the only active controller at the time Oct 7 19:25:51 OSAF-SC2 osafamfnd[2205]: NO 'safComp=SMF,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'csiSetcallbackTimeout' : Recovery is 'nodeFailfast' Oct 7 19:25:51 OSAF-SC2 osafamfnd[2205]: ER safComp=SMF,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:csiSetcallbackTimeout Recovery is:nodeFailfast Oct 7 19:25:51 OSAF-SC2 osafamfnd[2205]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60 Oct 7 19:25:51 OSAF-SC2 opensaf_reboot: Rebooting local node; timeout=60 --- Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/ To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.-- ___ Opensaf-tickets mailing list Opensaf-tickets@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #1465 Don't send alarm "SI has no current active assignments" if node is locked
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#1465] Don't send alarm "SI has no current active assignments" if node is locked**

**Status:** review
**Milestone:** 4.6.2
**Created:** Fri Aug 28, 2015 03:01 PM UTC by hano
**Last Updated:** Tue Sep 15, 2015 01:24 PM UTC
**Owner:** hano

In a cloud environment, scale-in is done with node shutdown, lock and opensafd stop. For a M/W No Redundancy SI, an 'SI Unassigned' alarm is raised when opensafd stop is performed, because M/W SI assignments are not affected by the node lock/shutdown. This alarm is to be avoided. A patch has been sent out in which the alarm is not sent if the node is shut down or locked and the redundancy model is no-red.

The alarm is also not wanted for No Redundancy application SIs at node shutdown; version 3 of the patch has been sent out to cover this.
[tickets] [opensaf:tickets] #1400 systemd problems installing on debian jessie
- **Milestone**: 4.5.2 --> 4.6.2 --- ** [tickets:#1400] systemd problems installing on debian jessie** **Status:** unassigned **Milestone:** 4.6.2 **Created:** Thu Jul 02, 2015 08:23 PM UTC by Charles Stuart Johnson **Last Updated:** Wed Jul 15, 2015 12:52 PM UTC **Owner:** nobody After installing all the requested packages through successive installs of OpenSAF on debian 8.0.0 and 8.1.0, got block by a bug when using this command: cd /data/projects/opensaf/opensaf-staging && ./bootstrap.sh && ./configure --disable-tipc --disable-ais-plm --enable-java && make && sudo make install Here's what I got: sh: 6: qmake-qt4: not found autoreconf: Entering directory `.' autoreconf: configure.ac: not using Gettext autoreconf: running: aclocal -I m4 autoreconf: configure.ac: tracing autoreconf: configure.ac: adding subdirectory contrib/plmc to autoreconf autoreconf: Entering directory `contrib/plmc' autoreconf: running: libtoolize --copy libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, `.'. libtoolize: copying file `./ltmain.sh' libtoolize: putting macros in AC_CONFIG_MACRO_DIR, `m4'. libtoolize: copying file `m4/libtool.m4' libtoolize: copying file `m4/ltoptions.m4' libtoolize: copying file `m4/ltsugar.m4' libtoolize: copying file `m4/ltversion.m4' libtoolize: copying file `m4/lt~obsolete.m4' autoreconf: running: /usr/bin/autoconf autoreconf: running: /usr/bin/autoheader autoreconf: running: automake --add-missing --copy --no-force configure.ac:39: installing './compile' configure.ac:20: installing './config.guess' configure.ac:20: installing './config.sub' configure.ac:25: installing './install-sh' configure.ac:25: installing './missing' lib/utils/Makefile.am: installing './depcomp' autoreconf: Leaving directory `contrib/plmc' libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, `.'. libtoolize: copying file `./ltmain.sh' libtoolize: putting macros in AC_CONFIG_MACRO_DIR, `m4'. 
libtoolize: copying file `m4/libtool.m4' libtoolize: copying file `m4/ltoptions.m4' libtoolize: copying file `m4/ltsugar.m4' libtoolize: copying file `m4/ltversion.m4' libtoolize: copying file `m4/lt~obsolete.m4' configure.ac:27: installing './compile' configure.ac:20: installing './config.guess' configure.ac:20: installing './config.sub' configure.ac:25: installing './install-sh' configure.ac:25: installing './missing' java/ais_api_impl_native/Makefile.am: installing './depcomp' python/pyosaf/Makefile.am:21: installing './py-compile' autoreconf: Leaving directory `.' abort: no repository found in '/data/projects/opensaf/opensaf-staging' (.hg not found)! checking build system type... x86_64-unknown-linux-gnu checking host system type... x86_64-unknown-linux-gnu checking target system type... x86_64-unknown-linux-gnu checking for a BSD-compatible install... /usr/bin/install -c checking whether build environment is sane... yes checking for a thread-safe mkdir -p... /bin/mkdir -p checking for gawk... no checking for mawk... mawk checking whether make sets $(MAKE)... yes checking whether make supports nested variables... yes checking whether make supports nested variables... (cached) yes checking for style of include used by make... GNU checking for gcc... gcc checking whether the C compiler works... yes checking for C compiler default output file name... a.out checking for suffix of executables... checking whether we are cross compiling... no checking for suffix of object files... o checking whether we are using the GNU C compiler... yes checking whether gcc accepts -g... yes checking for gcc option to accept ISO C89... none needed checking whether gcc understands -c and -o together... yes checking dependency style of gcc... gcc3 checking how to run the C preprocessor... gcc -E checking for grep that handles long lines and -e... /bin/grep checking for egrep... /bin/grep -E checking for ANSI C header files... yes checking for sys/types.h... 
yes checking for sys/stat.h... yes checking for stdlib.h... yes checking for string.h... yes checking for memory.h... yes checking for strings.h... yes checking for inttypes.h... yes checking for stdint.h... yes checking for unistd.h... yes checking minix/config.h usability... no checking minix/config.h presence... no checking for minix/config.h... no checking whether it is safe to define __EXTENSIONS__... yes checking whether to build with rpath enabled... yes checking how to print strings... printf checking for a sed that does not truncate output... /bin/sed checking for fgrep... /bin/grep -F checking for ld used by gcc... /usr/bin/ld checking if the linker (/usr/bin/ld) is GNU ld... yes checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B checking the name lister (/usr/bin/nm -B) interface... BSD nm checking whether ln -s works... yes checking the maximum length of command line arguments... 1572864 checking whether the shell understands some XSI constructs... yes checking whether the shell understands "+="... yes checking how to
[tickets] [opensaf:tickets] #1421 log: not check special characters from saLogStreamFileName value
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#1421] log: not check special characters from saLogStreamFileName value**

**Status:** accepted
**Milestone:** 4.6.2
**Created:** Thu Jul 16, 2015 09:35 AM UTC by Vu Minh Nguyen
**Last Updated:** Mon Aug 03, 2015 11:25 AM UTC
**Owner:** Vu Minh Nguyen

Logsv does not validate whether the saLogStreamFileName value contains special characters. See the saLogStreamFileName attribute line in the output below.

> # immlist safLgStrCfg=str9,safApp=safLogService
> Name                             Type         Value(s)
> safLgStrCfg                      SA_STRING_T  safLgStrCfg=str9
> saLogStreamSeverityFilter        SA_UINT32_T  30 (0x1e)
> saLogStreamPathName              SA_STRING_T  .
> saLogStreamNumOpeners            SA_UINT32_T  1 (0x1)
> saLogStreamMaxLogFileSize        SA_UINT64_T  500 (0x4c4b40)
> saLogStreamMaxFilesRotated       SA_UINT32_T  4 (0x4)
> saLogStreamLogFullHaltThreshold  SA_UINT32_T  75 (0x4b)
> saLogStreamLogFullAction         SA_UINT32_T  3 (0x3)
> saLogStreamLogFileFormat         SA_STRING_T
> saLogStreamFixedLogRecordSize    SA_UINT32_T  150 (0x96)
> saLogStreamFileName              SA_STRING_T  \/ a bc . txt
> saLogStreamCreationTimestamp     SA_TIME_T    1437031872934978000 (0x13f15cdfed3421d0, Thu Jul 16 08:31:12 2015)
> logStreamDiscardedCounter        SA_UINT64_T  0 (0x0)
> SaImmAttrImplementerName         SA_STRING_T  safLogService
> SaImmAttrClassName               SA_STRING_T  SaLogStreamConfig
> SaImmAttrAdminOwnerName          SA_STRING_T

As a result, logsv fails to create the cfg/log files. The trace log shows the following error messages:

> Jul 16 8:31:13.068090 osaflogd [417:lgs_util.c:0106] TR lgs_create_config_file_h - Config file path "/repl_opensaf/saflog/./\/ a bc . txt.cfg"
> Jul 16 8:31:13.068997 osaflogd [417:lgs_filehdl.c:0170] >> create_config_file_hdl
> Jul 16 8:31:13.069422 osaflogd [417:lgs_filehdl.c:0172] TR create_config_file_hdl - file_path "/repl_opensaf/saflog/./\/ a bc . txt.cfg"
> Jul 16 8:31:13.074774 osaflogd [417:lgs_filehdl.c:0182] NO Could not open '/repl_opensaf/saflog/./\/ a bc . txt.cfg' - No such file or directory
> Jul 16 8:31:13.075243 osaflogd [417:lgs_filehdl.c:0232] << create_config_file_hdl: rc = -1
> Jul 16 8:31:13.075975 osaflogd [417:lgs_util.c:0166] << lgs_create_config_file_h: rc = -1
> Jul 16 8:31:13.079080 osaflogd [417:lgs_stream.c:0347] TR log_initiate_stream_files - lgs_create_config_file_h() FAIL
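The missing check described in ticket #1421 could look roughly like the sketch below. It is only an illustration in Python, not the logsv code: the function name `is_valid_stream_file_name` is hypothetical, and the set of allowed characters is an assumption, since the ticket does not say which characters count as special.

```python
import re

# Assumption: only letters, digits, underscore, dot and hyphen are acceptable.
# The ticket does not define the allowed set; adjust as needed.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def is_valid_stream_file_name(name):
    """Return True if name is non-empty, is not '.' or '..',
    and consists only of the allowed characters."""
    if name in (".", ".."):
        return False
    return _SAFE_NAME.fullmatch(name) is not None
```

With this allow-list, the value from the ticket (`\/ a bc . txt`) is rejected because it contains a backslash, a slash and spaces, while a plain name such as `str9` passes.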
[tickets] [opensaf:tickets] #1410 pyosaf: Invalid exception used in ImmObject (object.py)
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#1410] pyosaf: Invalid exception used in ImmObject (object.py)**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Fri Jul 10, 2015 10:11 AM UTC by Johan Mårtensson
**Last Updated:** Wed Jul 15, 2015 12:46 PM UTC
**Owner:** nobody

ImmObject uses an invalid way to raise exceptions:

>>> a = ImmObject('NonExistingClass')
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/local/lib/python2.7/dist-packages/pyosaf/utils/immom/object.py", line 63, in __init__
    raise
TypeError: exceptions must be old-style classes or derived from BaseException, not NoneType
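The traceback in ticket #1410 ends in a bare `raise`, which is only valid while an exception is being handled. A minimal sketch of the usual remedy is to raise a named exception instance instead. Everything below is hypothetical and not taken from pyosaf: `ClassNotFoundError`, `KNOWN_CLASSES` and `ImmObjectSketch` are illustrative stand-ins.

```python
class ClassNotFoundError(Exception):
    """Hypothetical exception type; not part of pyosaf."""


# Stand-in for a lookup of the class description in IMM (assumption).
KNOWN_CLASSES = {"SaLogStreamConfig"}


class ImmObjectSketch:
    """Toy stand-in for ImmObject, only to show the raise pattern."""

    def __init__(self, class_name):
        if class_name not in KNOWN_CLASSES:
            # A bare `raise` needs an active exception; name the error instead.
            raise ClassNotFoundError(
                "IMM class %r does not exist" % class_name)
        self.class_name = class_name
```

A caller can then catch `ClassNotFoundError` and read the class name from the message, rather than seeing an unrelated `TypeError`.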
[tickets] [opensaf:tickets] #928 base: Selection object fails due to re-cycled file descriptor
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#928] base: Selection object fails due to re-cycled file descriptor**

**Status:** accepted
**Milestone:** 4.6.2
**Created:** Wed May 28, 2014 07:48 AM UTC by Anders Widell
**Last Updated:** Wed Jul 15, 2015 01:25 PM UTC
**Owner:** nobody

A case has been seen where syslog gets filled with thousands of messages like the one below:

May 3 15:37:48 SC-1 osaflogd[7643]: ncs_sel_obj_rmv_ind: recv failed - Socket operation on non-socket

Probably the wrong file descriptor is being used when this happens. Looking at the code, there are some obvious improvements that can be made:

* Whenever the file descriptors raise_obj and/or rmv_obj are closed, the file descriptors in the data structure should be overwritten with -1 to indicate that they are no longer valid. Relying on subsequent system calls to fail with EBADF is not a good idea, since the file descriptor may be re-cycled. This might be what happened in the syslog entry above.
* The function ncs_sel_obj_rmv_ind() should check whether either file descriptor is less than zero and, if so, return immediately without trying to operate on the file descriptors. It may log to syslog in this case, but to avoid spamming the log it should log only once. This can be achieved by, for example, logging if the file descriptor is -1 and then changing it to -2 so that the next call does not log to syslog.
* If, after implementing the changes suggested above, recv() still fails for any reason other than EAGAIN, EWOULDBLOCK or EINTR, we should call osaf_abort() to generate a core dump. Errors like "socket operation on non-socket" are an indication of a bug.
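The three suggestions in ticket #928 can be sketched as follows. This is a Python illustration of the pattern only, not the C code in OpenSAF: the class `SelObj`, its method names and the `messages` list (standing in for syslog) are assumptions made for the example, and letting an unexpected `OSError` propagate stands in for calling osaf_abort().

```python
import socket

INVALID_FD = -1         # descriptor closed, condition not yet reported
INVALID_FD_LOGGED = -2  # descriptor closed, condition already reported once


class SelObj:
    """Toy selection object backed by a socket pair."""

    def __init__(self):
        self._rmv_sock, self._raise_sock = socket.socketpair()
        self._rmv_sock.setblocking(False)
        self.rmv_obj = self._rmv_sock.fileno()
        self.raise_obj = self._raise_sock.fileno()
        self.messages = []  # stand-in for syslog

    def raise_ind(self):
        self._raise_sock.send(b"A")

    def destroy(self):
        self._rmv_sock.close()
        self._raise_sock.close()
        # Overwrite the stored descriptors so a recycled fd is never reused.
        self.rmv_obj = INVALID_FD
        self.raise_obj = INVALID_FD

    def rmv_ind(self):
        """Drain pending indications; return their count, or -1 if closed."""
        if self.rmv_obj < 0 or self.raise_obj < 0:
            if INVALID_FD in (self.rmv_obj, self.raise_obj):
                self.messages.append("rmv_ind called on closed object")
                # Switch to -2 so later calls stay silent.
                self.rmv_obj = INVALID_FD_LOGGED
                self.raise_obj = INVALID_FD_LOGGED
            return -1
        count = 0
        while True:
            try:
                data = self._rmv_sock.recv(1)
            except (BlockingIOError, InterruptedError):
                break  # EAGAIN/EWOULDBLOCK/EINTR: nothing more to read now
            # Any other OSError propagates; that plays the role of abort here.
            if not data:
                break
            count += 1
        return count
```

Calling `rmv_ind()` repeatedly after `destroy()` returns -1 each time but appends only one message, which is the log-once behaviour the ticket asks for.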
[tickets] [opensaf:tickets] #682 LOG: New Active reboots when coordinator IMMND is killed in the middle of switchover
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#682] LOG: New Active reboots when coordinator IMMND is killed in the middle of switchover**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Fri Dec 20, 2013 05:27 AM UTC by Sirisha Alla
**Last Updated:** Mon Aug 03, 2015 11:31 AM UTC
**Owner:** nobody
**Attachments:**

- [logs.tar.bz2](https://sourceforge.net/p/opensaf/tickets/682/attachment/logs.tar.bz2) (4.3 MB; application/x-bzip)
- [tic682.tgz](https://sourceforge.net/p/opensaf/tickets/682/attachment/tic682.tgz) (208.3 kB; application/x-compressed-tar)

The issue is observed on changeset 4733 + #220 patches corresponding to cs 4741 and cs 4742. The test setup is 4 nodes of SLES 64bit VMs. The setup is single PBE enabled, loaded with 25k objects.

SC-2(SLES-64BIT-SLOT2) is Active and IMMND coordinator is hosted on SC-1(SLES-64BIT-SLOT1). Controller Switchover is initiated and immnd is killed on SC-1. SC-1 went for reboot because of the csi set callback timeout of logd.

/var/log/messages of SC-1 and SC-2 corresponding to the above mentioned steps:

SC-2:

Dec 19 17:21:36 SLES-64BIT-SLOT2 osafamfd[3609]: NO safSi=SC-2N,safApp=OpenSAF Swap initiated
Dec 19 17:21:36 SLES-64BIT-SLOT2 osafamfnd[3619]: NO Assigning 'safSi=SC-2N,safApp=OpenSAF' QUIESCED to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Dec 19 17:21:36 SLES-64BIT-SLOT2 osafimmnd[3554]: NO Implementer disconnected 18 <320, 2020f> (safMsgGrpService)
Dec 19 17:21:36 SLES-64BIT-SLOT2 osafimmnd[3554]: NO implementer for class 'SaSmfCampaign' is released => class extent is UNSAFE
Dec 19 17:21:36 SLES-64BIT-SLOT2 osafimmnd[3554]: NO Implementer disconnected 22 <319, 2020f> (safEvtService)
Dec 19 17:21:36 SLES-64BIT-SLOT2 osafimmnd[3554]: NO Implementer disconnected 23 <3, 2020f> (safLogService)
Dec 19 17:21:36 SLES-64BIT-SLOT2 osafimmnd[3554]: NO implementer for class 'OpenSafSmfConfig' is released => class extent is UNSAFE
Dec 19 17:21:36 SLES-64BIT-SLOT2 osafimmnd[3554]: NO implementer for class 'SaSmfSwBundle' is released => class extent is UNSAFE
Dec 19 17:21:36 SLES-64BIT-SLOT2 osafimmnd[3554]: NO Implementer disconnected 24 <298, 2020f> (safSmfService)
Dec 19 17:21:37 SLES-64BIT-SLOT2 osafimmnd[3554]: NO Implementer disconnected 20 <303, 2020f> (safLckService)

SC-1:

Dec 19 17:21:38 SLES-64BIT-SLOT1 osafimmnd[3498]: NO Implementer disconnected 18 <0, 2020f> (safMsgGrpService)
Dec 19 17:21:38 SLES-64BIT-SLOT1 osafimmnd[3498]: NO implementer for class 'SaSmfCampaign' is released => class extent is UNSAFE
Dec 19 17:21:38 SLES-64BIT-SLOT1 osafimmnd[3498]: NO Implementer disconnected 22 <0, 2020f> (safEvtService)
Dec 19 17:21:38 SLES-64BIT-SLOT1 osafimmnd[3498]: NO Implementer disconnected 23 <0, 2020f> (safLogService)
Dec 19 17:21:38 SLES-64BIT-SLOT1 osafimmnd[3498]: NO implementer for class 'OpenSafSmfConfig' is released => class extent is UNSAFE
Dec 19 17:21:38 SLES-64BIT-SLOT1 osafimmnd[3498]: NO implementer for class 'SaSmfSwBundle' is released => class extent is UNSAFE
Dec 19 17:21:38 SLES-64BIT-SLOT1 osafimmnd[3498]: NO Implementer disconnected 24 <0, 2020f> (safSmfService)
Dec 19 17:21:39 SLES-64BIT-SLOT1 osafimmnd[3498]: NO Implementer disconnected 20 <0, 2020f> (safLckService)
Dec 19 17:21:39 SLES-64BIT-SLOT1 osafimmnd[3498]: NO Implementer disconnected 19 <0, 2020f> (safCheckPointService)
Dec 19 17:21:39 SLES-64BIT-SLOT1 osafimmnd[3498]: NO Implementer disconnected 21 <0, 2020f> (safClmService)
Dec 19 17:21:39 SLES-64BIT-SLOT1 osafimmpbed: WA PBE lost contact with parent IMMND - Exiting
Dec 19 17:21:39 SLES-64BIT-SLOT1 osafamfnd[3578]: NO 'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
Dec 19 17:21:39 SLES-64BIT-SLOT1 osafntfimcnd[3829]: ER saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE (9)
Dec 19 17:21:39 SLES-64BIT-SLOT1 osafamfd[3565]: NO Re-initializing with IMM
Dec 19 17:21:39 SLES-64BIT-SLOT1 osafimmd[3488]: NO IMMND coord at 2020f
..
Dec 19 17:21:49 SLES-64BIT-SLOT1 osafimmnd[3953]: NO Implementer connected: 40 (OpenSafImmPBE) <0, 2020f>
Dec 19 17:21:49 SLES-64BIT-SLOT1 osafamfd[3565]: NO Finished re-initializing with IMM
Dec 19 17:21:50 SLES-64BIT-SLOT1 osafimmnd[3953]: NO PBE-OI established on other SC. Dumping incrementally to file imm.db
Dec 19 17:23:40 SLES-64BIT-SLOT1 osafamfnd[3578]: NO 'safComp=LOG,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'csiSetcallbackTimeout' : Recovery is 'nodeFailfast'
Dec 19 17:23:40 SLES-64BIT-SLOT1 osafamfnd[3578]: ER safComp=LOG,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:csiSetcallbackTimeout Recovery is:nodeFailfast
Dec 19 17:23:40 SLES-64BIT-SLOT1 osafamfnd[3578]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
Dec 19 17:23:40 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node; timeout=60

When LOGD trace is examined there is no information at that point of time for the
[tickets] [opensaf:tickets] #665 java: Missing calls to ReleaseIntArrayElements
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#665] java: Missing calls to ReleaseIntArrayElements**

**Status:** accepted
**Milestone:** 4.6.2
**Created:** Tue Dec 17, 2013 12:22 PM UTC by Anders Widell
**Last Updated:** Wed Jul 15, 2015 01:45 PM UTC
**Owner:** Anders Widell

In the file j_ais_socketUtil.c there are calls to GetIntArrayElements(), but no corresponding calls to ReleaseIntArrayElements(). Because of this, the garbage collector may not be able to reclaim the memory.
[tickets] [opensaf:tickets] #648 NTF IMCN: Reinitialize IMM API if OiImplementer set timeout
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#648] NTF IMCN: Reinitialize IMM API if OiImplementer set timeout**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Wed Dec 04, 2013 02:26 PM UTC by elunlen
**Last Updated:** Tue Sep 15, 2015 07:01 AM UTC
**Owner:** nobody

In imcn init: if ERR_EXIST is returned, re-initialize the IMM API before changing the name. If the OiImplementerSet API times out, also re-initialize.
[tickets] [opensaf:tickets] #689 rollback of campaign fails due to object not found
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#689] rollback of campaign fails due to object not found**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Fri Dec 20, 2013 01:16 PM UTC by surender khetavath
**Last Updated:** Wed Jul 15, 2015 01:43 PM UTC
**Owner:** nobody
**Attachments:**

- [sc1_logs.tgz](https://sourceforge.net/p/opensaf/tickets/689/attachment/sc1_logs.tgz) (22.3 MB; application/x-compressed-tar)

changeset : 4733
model : 2n
configuration : 1SG,5SUs,5SIs,SU1 to SU5 has 3comps each. 3CSIs in each SI
si-si deps configured as SI1<-SI2<-SI3<-SI4
SaAmfCSIAttribute is set for all the CSIs.
All SIs are initially in locked state and SUs are in lock-in.
SU1 mapped to SC-1
SU2 mapped to SC-2
SU3-mapped to PL-3
SU4-SU5 mapped to PL-4

Test: A campaign is modelled to include one more SG with 2SUs having one component in each SU and 2SIs with 1 CSI in each SI

Rollback fails with the error below, ERR_NOT_EXIST, but the object actually exists.

osafsmfd log shows

Dec 20 18:37:19.382514 osafsmfd [20589:SmfUpgradeAction.cc:0584] ER SmfImmCcbAction::rollback failed to rollback CCB smfRollbackElement=ccb_0002,smfRollbackElement=ProcInit,safSmfProc=AddNewSG,safSmfCampaign=Campaign_4,safApp=safSmfService, rc=SA_AIS_ERR_NOT_EXIST (12)

immlist of object :

immlist smfRollbackElement=ccb_0002,smfRollbackElement=ProcInit,safSmfProc=AddNewSG,safSmfCampaign=Campaign_4,safApp=safSmfService
Name                      Type         Value(s)
smfRollbackElement        SA_STRING_T  smfRollbackElement=ccb_0002
SaImmAttrImplementerName  SA_STRING_T  safSmfProc=AddNewSG
SaImmAttrClassName        SA_STRING_T  OpenSafSmfRollbackElement
SaImmAttrAdminOwnerName   SA_STRING_T

/var/log/messages show

Dec 20 18:37:19 SC-1 osafsmfd[20589]: NO PROC: Rollback of procedure init actions
Dec 20 18:37:19 SC-1 osafsmfd[20589]: NO Execution of IMM operation failed, rc=SA_AIS_ERR_NOT_EXIST (12)
Dec 20 18:37:19 SC-1 osafsmfd[20589]: ER Rollback ccb operations failed for smfRollbackElement=ccb_0002,smfRollbackElement=ProcInit,safSmfProc=AddNewSG,safSmfCampaign=Campaign_4,safApp=safSmfService, rc=SA_AIS_ERR_NOT_EXIST (12)
Dec 20 18:37:19 SC-1 osafsmfd[20589]: ER SmfImmCcbAction::rollback failed to rollback CCB smfRollbackElement=ccb_0002,smfRollbackElement=ProcInit,safSmfProc=AddNewSG,safSmfCampaign=Campaign_4,safApp=safSmfService, rc=SA_AIS_ERR_NOT_EXIST (12)
Dec 20 18:37:19 SC-1 osafsmfd[20589]: NO SmfProcStateExecuting::rollbackInit: rollback of init action 2 failed, rc=SA_AIS_ERR_NOT_EXIST (12)
Dec 20 18:37:19 SC-1 osafsmfd[20589]: NO CAMP: Procedure safSmfProc=AddNewSG returned ROLLBACKFAILED
[tickets] [opensaf:tickets] #326 amf: proxied SU's presence state hangs at INSTANTIATING state.
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#326] amf: proxied SU's presence state hangs at INSTANTIATING state.**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Fri May 24, 2013 09:34 AM UTC by Praveen
**Last Updated:** Thu Aug 06, 2015 10:26 AM UTC
**Owner:** nobody

Migrated from http://devel.opensaf.org/ticket/2213.

setup: 1 controller
Model observed: TwoN
Configuration of proxy : 1 App, 1SG, 1SU, 1 proxy comps
Configuration of proxied : 1App, 1SG, 1SU, 1 proxied component with saAmfCtCompCategory=12

The proxy code is modelled to respond to amf with ERR_FAILED_OP inside the SaAmfProxiedComponentInstantiateCallback() API.
By default, the SUs of proxy and proxied are in locked-instantiation state.

Scenario:
Bring up the proxy and proxied configuration. Do unlock-in and unlock of the proxy. The proxy should be up and running, and the proxied registration should be successful. Now do unlock-in of the proxied SU. The console output is shown below.

console text:

amf-adm unlock-in safSu=SU_pxd,safSg=SG_pxd,safApp=pxd_App
error - saImmOmAdminOperationInvoke_2 FAILED: SA_AIS_ERR_TIMEOUT (5)

Retrying again gives the below output.

SLES11-SLOT-2:/home/surender/amf # amf-adm unlock-in safSu=SU_pxd,safSg=SG_pxd,safApp=pxd_App
error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: SA_AIS_ERR_TRY_AGAIN (6)
SLES11-SLOT-2:/home/surender/amf # amf-adm unlock-in safSu=SU_pxd,safSg=SG_pxd,safApp=pxd_App
error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: SA_AIS_ERR_TRY_AGAIN (6)

/var/log/messages output for above op's:

Oct 11 15:13:15 SLES11-SLOT-2 osafamfnd[3852]: saAmfCtDefQuiescingCompleteTimeout for 'safVersion=4.0.0,safCompType=Comp_nored' initialized with saAmfCtDefCallbackTimeout
Oct 11 15:13:15 SLES11-SLOT-2 osafamfnd[3852]: 'safSu=SU_mycomp,safSg=SG_mycomp,safApp=mycompApp' Presence State UNINSTANTIATED => INSTANTIATING
Oct 11 15:13:16 SLES11-SLOT-2 osafamfnd[3852]: 'safSu=SU_mycomp,safSg=SG_mycomp,safApp=mycompApp' Presence State INSTANTIATING => INSTANTIATED
Oct 11 15:13:16 SLES11-SLOT-2 osafamfnd[3852]: saAmfCtDefQuiescingCompleteTimeout for 'safVersion=4.0.0,safCompType=Comp_pxd_basetype' initialized with saAmfCtDefCallbackTimeout
Oct 11 15:13:41 SLES11-SLOT-2 osafamfnd[3852]: 'safSu=SU_pxd,safSg=SG_pxd,safApp=pxd_App' Presence State UNINSTANTIATED => INSTANTIATING
Oct 11 15:15:55 SLES11-SLOT-2 osafamfd[3711]: Admin operation is already going
Oct 11 15:15:58 SLES11-SLOT-2 osafamfd[3711]: Admin operation is already going

SU states of proxy and proxied:

safSu=SU_mycomp,safSg=SG_mycomp,safApp=mycompApp
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=IN-SERVICE(2)

safSu=SU_pxd,safSg=SG_pxd,safApp=pxd_App
saAmfSUAdminState=LOCKED(2)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATING(2)
saAmfSUReadinessState=OUT-OF-SERVICE(1)

Comp state of proxy and proxied:

safComp=mycomp,safSu=SU_mycomp,safSg=SG_mycomp,safApp=mycompApp
saAmfCompOperState=ENABLED(1)
saAmfCompPresenceState=INSTANTIATED(3)
saAmfCompReadinessState=IN-SERVICE(2)

safComp=Comp_pxd,safSu=SU_pxd,safSg=SG_pxd,safApp=pxd_App
saAmfCompOperState=DISABLED(2)
saAmfCompPresenceState=INSTANTIATING(2)
saAmfCompReadinessState=OUT-OF-SERVICE(1)

Here the proxied comp is in DISABLED state, but its SU is in ENABLED state. Also, the proxied comp waits in INSTANTIATING state indefinitely.
[tickets] [opensaf:tickets] #246 cpsv: Section create fails with random return values when mulitple processes try to create sections in the same checkpoint 70 node setup.
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#246] cpsv: Section create fails with random return values when mulitple processes try to create sections in the same checkpoint 70 node setup.**

**Status:** assigned
**Milestone:** 4.6.2
**Created:** Thu May 16, 2013 06:37 AM UTC by A V Mahesh (AVM)
**Last Updated:** Mon Aug 10, 2015 07:25 AM UTC
**Owner:** A V Mahesh (AVM)

from http://devel.opensaf.org/ticket/2386

Changeset: 3065
Setup: 70 node SLES11 VM setup

2 applications per node are running on a 70 node setup. Collocated checkpoint is created. After active replica is set from one process, section create with section id as GENERATED_SECTION_ID is invoked from rest of the processes. But the section create fails with ERR_EXIST, ERR_TIMEOUT, ERR_TRY_AGAIN.

/var/log/messages for the two controllers will be shared.
[tickets] [opensaf:tickets] #239 cpsv : section create returns ERR_EXIST after few try agains on 70 node cluster
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#239] cpsv : section create returns ERR_EXIST after few try agains on 70 node cluster**

**Status:** assigned
**Milestone:** 4.6.2
**Created:** Thu May 16, 2013 06:19 AM UTC by A V Mahesh (AVM)
**Last Updated:** Tue Aug 11, 2015 06:19 AM UTC
**Owner:** A V Mahesh (AVM)

From http://devel.opensaf.org/ticket/3042

This is seen on a 70 SLES VM setup. One checkpoint application runs on each node.

1) Checkpoint Application on active controller creates an asynchronous collocated checkpoint. The applications on other nodes open the same checkpoint
2) Replica is set active on active controller and section is created
3) Section create API returns TRY_AGAIN a few times and then returns ERR_EXIST.

When the application gets TRY_AGAIN, the section should not be created in the checkpoint. This is not always reproducible.

snippet from test journal:

520|0 15 00130961 1 21| FAILED : Section 11 created in active colloc ckpt
520|0 15 00130961 1 22| Return Value : SA_AIS_ERR_TRY_AGAIN
520|0 15 00130961 1 23|
520|0 15 00130961 1 24| Try again count : 8
520|0 15 00130961 1 25|
520|0 15 00130961 1 26| FAILED : Section 11 created in active colloc ckpt
520|0 15 00130961 1 27| Return Value : SA_AIS_ERR_EXIST

Attaching CPD and CPND traces of both the controllers
[tickets] [opensaf:tickets] #272 checkpoint overwrite returns timeout when controllers are running with different compatible versions
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#272] checkpoint overwrite returns timeout when controllers are running with different compatible versions**

**Status:** assigned
**Milestone:** 4.6.2
**Created:** Fri May 17, 2013 11:40 AM UTC by Sirisha Alla
**Last Updated:** Tue Aug 11, 2015 06:17 AM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- [logs.tar.gz](https://sourceforge.net/p/opensaf/tickets/272/attachment/logs.tar.gz) (175.5 kB; application/x-gzip)

The issue is seen on an OEL6.4 TCP setup. Changeset being used is 4241 with patches 2794 and 3117.

Active controller (SC-1) is running version 4.3 while standby controller (SC-2) is running cs3533 (4.2.x).

A non-collocated checkpoint replica is created on the active controller. A section is created in the checkpoint. The Write and Read APIs are successful, but the overwrite API returns timeout for 5 seconds, after which the application times out and exits. No ckptnd or agent crashes were observed.

When the same application is run on SC-2, it runs without any error.

Attaching the journal and the traces of ckptnd and ckptd on both the controllers.
[tickets] [opensaf:tickets] #159 AVSv to handle NTF Send TRY_AGAIN scenarios
- **Milestone**: 4.5.2 --> 4.6.2

---

** [tickets:#159] AVSv to handle NTF Send TRY_AGAIN scenarios**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Tue May 14, 2013 04:08 AM UTC by Nagendra Kumar
**Last Updated:** Fri Aug 07, 2015 10:20 AM UTC
**Owner:** nobody

Migrated from http://devel.opensaf.org/ticket/967

AVSv to handle NTF Send TRY_AGAIN scenarios. While analysing ticket #954 (unstable test setup), a lot of notification send failures were observed when ntf had returned try again. Try-again handling should be in place for AVSV notifications.

Changed 3 years ago by mathi

- component changed from unknown to AvSv

Changed 3 years ago by murthy

- milestone changed from PL 3.0.2 to 4.0.0-RC1

Changed 3 years ago by hafe

- priority changed from major to minor

I haven't seen any need for TRY-AGAIN handling of the NTF interface in AMF. Since the NTF API is now only used from amfd, it is in total control of when it can use NTF. If NTF would be used from amfnd it would require TRY-AGAIN, but that is not the case now. Lowering this prio.

Changed 2 years ago by jfournier

- milestone changed from 4.0.RC1 to 4.0.1
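For reference, the try-again handling that ticket #159 asks for usually takes the shape of a bounded retry loop. The sketch below is a generic Python illustration, not AMF or NTF code: `send_with_retry`, its parameters and the `send` callable are hypothetical, and the numeric values are assumed to follow SaAisErrorT (the value 6 for TRY_AGAIN matches the console output quoted in ticket #326).

```python
import time

SA_AIS_OK = 1             # assumed SaAisErrorT value
SA_AIS_ERR_TRY_AGAIN = 6  # matches "SA_AIS_ERR_TRY_AGAIN (6)" in ticket #326


def send_with_retry(send, max_attempts=10, delay_s=0.1, sleep=time.sleep):
    """Call send() until it returns something other than TRY_AGAIN
    or max_attempts (>= 1) is reached. Returns (rc, attempts_used)."""
    rc = SA_AIS_ERR_TRY_AGAIN
    for attempt in range(1, max_attempts + 1):
        rc = send()
        if rc != SA_AIS_ERR_TRY_AGAIN:
            return rc, attempt
        if attempt < max_attempts:
            sleep(delay_s)  # back off before the next attempt
    return rc, max_attempts
```

The `sleep` parameter is injectable so the loop can be exercised without real delays. Whether such a loop is needed at all is the point hafe disputes in the ticket history above.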
[tickets] [opensaf:tickets] #68 failover didnot succeed and cluster got reset due to MDS problems.
- **Milestone**: 4.5.2 --> never

---

** [tickets:#68] failover didnot succeed and cluster got reset due to MDS problems.**

**Status:** not-reproducible
**Milestone:** never
**Created:** Sat May 11, 2013 05:22 PM UTC by surender khetavath
**Last Updated:** Tue Sep 08, 2015 04:57 AM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- [logs.tgz](https://sourceforge.net/p/opensaf/tickets/68/attachment/logs.tgz) (16.2 MB; application/x-compressed-tar)
- [AppConfig-2N-68.xml](https://sourceforge.net/p/opensaf/tickets/68/attachment/AppConfig-2N-68.xml) (23.1 kB; text/xml)

Changeset : 4241 with 2794&3117 patch
Model : TwoN
configuration: 1App,1SG,4SUs with 3comps each and 5SIs with 3CSIs each
Transport : TCP/ipv6-linklocal
PBE enabled.

scenario:
sc1 was active and sc2 standby. The active SU on sc1 was shut down and the component was made to reject the quiescing assignment. The component was restarted 10 times (compRestartMax=10) and the fault then escalated to nodefailover following a suFailover. sc-2 did not become active and eventually rebooted, causing a cluster reset.

syslog on sc-1:

May 11 21:24:49 sc-1 osafimmnd[4683]: WA Error code 2 returned for message type 21 - ignoring
May 11 21:24:49 sc-1 osafamfnd[4790]: NO Received reboot order, ordering reboot now!
May 11 21:24:49 sc-1 osafamfnd[4790]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Received reboot order
May 11 21:24:49 sc-1 opensaf_reboot: Rebooting local node
May 11 21:24:49 sc-1 osafimmnd[4683]: WA MESSAGE:5319 OUT OF ORDER my highest processed:5317, exiting
May 11 21:24:49 sc-1 osafimmpbed: WA PBE lost contact with parent IMMND - Exiting
May 11 21:24:49 sc-1 osafntfimcnd[4734]: ER saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE (9)
May 11 21:24:49 sc-1 osafimmd[4668]: WA IMMND coordinator at 2010f apparently crashed => electing new coord
May 11 21:24:49 sc-1 osafimmd[4668]: ER Failed to find candidate for new IMMND coordinator
May 11 21:24:49 sc-1 osafimmd[4668]: ER Active IMMD has to restart the IMMSv. All IMMNDs will restart
May 11 21:24:49 sc-1 osafimmd[4668]: ER IMM RELOAD => ensure cluster restart by IMMD exit at both SCs, exiting

syslog on sc-2:

May 11 21:24:49 sc-2 osafimmd[3894]: WA IMMD not re-electing coord for switch-over (si-swap) coord at (2010f)
May 11 21:24:49 sc-2 osafntfimcnd[3969]: NO exiting on signal 15
May 11 21:24:49 sc-2 osafsmfd[4052]: ER amf_active_state_handler oi activate FAILED
May 11 21:24:49 sc-2 osafamfnd[4023]: NO 'safComp=SMF,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'csiSetcallbackFailed' : Recovery is 'nodeFailfast'
May 11 21:24:49 sc-2 osafamfnd[4023]: ER safComp=SMF,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:csiSetcallbackFailed Recovery is:nodeFailfast
May 11 21:24:49 sc-2 osafamfnd[4023]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast
May 11 21:24:49 sc-2 osafmsgd[4216]: ER mqd_imm_declare_implementer failed: err = 14
May 11 21:24:49 sc-2 osafckptd[4202]: ER cpd immOiImplmenterSet failed with err = 14
May 11 21:24:49 sc-2 opensaf_reboot: Rebooting local node
[tickets] [opensaf:tickets] #1157 MDS: IMMD coredumps in MDS BCAST send (TCP with MCAST_ADDR)
- **Milestone**: 4.5.0 --> never

---

** [tickets:#1157] MDS: IMMD coredumps in MDS BCAST send (TCP with MCAST_ADDR)**

**Status:** duplicate
**Milestone:** never
**Created:** Tue Oct 07, 2014 12:57 AM UTC by Adrian Szwej
**Last Updated:** Fri Oct 10, 2014 06:00 PM UTC
**Owner:** nobody
**Attachments:**

- [immd.core](https://sourceforge.net/p/opensaf/tickets/1157/attachment/immd.core) (25.2 kB; application/octet-stream)

Changeset: **4.6.M0 - 6009:b2ddaa23aae4**

When starting ~50 Linux containers, IMMD coredumps, resulting in a cluster reset.

Communication is TCP. dtmd.conf configuration is:

DTM_SOCK_SND_RCV_BUF_SIZE=65536
DTM_CLUSTER_ID=1
DTM_NODE_IP=172.17.1.42
DTM_MCAST_ADDR=224.0.0.6

BatchSize reduced to 4096

opensafImm=opensafImm,safApp=safImmService
Name                     Type         Value(s)
opensafImmSyncBatchSize  SA_UINT32_T  4096 (0x1000)

When node PL-51 joins the cluster, the following messages are seen in the syslog:

Oct 6 00:35:57 SC-1 osafdtmd[1028]: NO Established contact with 'PL-51'
Oct 6 00:35:57 SC-1 osafimmd[1063]: NO Extended intro from node 2330f
Oct 6 00:35:57 SC-1 osafimmd[1063]: NO Node 2330f request sync sync-pid:79 epoch:0
Oct 6 00:35:58 SC-1 osafimmnd[1072]: NO Announce sync, epoch:292
Oct 6 00:35:58 SC-1 osafimmnd[1072]: NO SERVER STATE: IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
Oct 6 00:35:58 SC-1 osafimmnd[1072]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Oct 6 00:35:58 SC-1 osafimmd[1063]: NO Successfully announced sync. New ruling epoch:292
Oct 6 00:35:58 SC-1 osafimmloadd: NO Sync starting
Oct 6 00:36:00 SC-1 osafimmd[1063]: MDTM unsent message is more!=200
Oct 6 00:36:00 SC-1 osafimmnd[1072]: WA Director Service in NOACTIVE state - fevs replies pending:9 fevs highest processed:20037
Oct 6 00:36:00 SC-1 osafamfnd[1143]: NO 'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Oct 6 00:36:00 SC-1 osafamfnd[1143]: ER safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Oct 6 00:36:00 SC-1 osafamfnd[1143]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
Oct 6 00:36:00 SC-1 opensaf_reboot: Rebooting local node; timeout=60
Oct 6 00:36:00 SC-1 osafimmnd[1072]: NO No IMMD service => cluster restart, exiting

There is a coredump generated:

core_1412555760.osafimmd.1063